Hidden State Inference in the Midbrain Dopamine System
Basic Information
- Grant number: 9526911
- Principal investigator: Clara Kwon Starkweather
- Amount: $44,800
- Host institution:
- Host institution country: United States
- Project category:
- Fiscal year: 2017
- Funding country: United States
- Project period: 2017-07-01 to 2019-04-30
- Status: Completed
- Source:
- Keywords: Address, Algorithms, Animals, Anxiety, Area, Auditory Hallucination, Basic Science, Belief, Brain, Brain Diseases, Brain region, Computer Simulation, Conditioned Stimulus, Cues, Data, Data Display, Dependence, Dopamine, Electrophysiology (science), Ensure, Environment, Exhibits, Functional disorder, Heart, Learning, Length, Mathematics, Medial, Mental Depression, Methods, Midbrain structure, Modeling, Mus, Neurons, Outcome, Pathology, Pattern, Phase, Play, Positive Reinforcements, Prefrontal Cortex, Probability, Psychological reinforcement, Ramp, Regulation, Rewards, Role, Schizophrenia, Sensory, Shapes, Signal Transduction, Specific qualifier value, Specificity, Stimulus, Symptoms, Testing, Therapeutic, Time, Traction, Translational Research, Ursidae Family, addiction, base, classical conditioning, dopamine system, dopaminergic neuron, effective therapy, experience, experimental study, insight, neuropsychiatric disorder, neuropsychiatry, novel, optogenetics, sensory stimulus, theories, time interval, treatment strategy
Project Summary/Abstract
Midbrain dopamine neurons are thought to drive associative learning by signaling reward prediction error (RPE): actual minus expected reward. Based on dopamine RPE signaling, computational and empirical studies have produced detailed models of how reinforcement learning could be implemented in the brain. In particular, the temporal difference (TD) learning model has been a cornerstone in understanding how dopamine RPEs could drive associative learning. Classically, TD learning imparts value to features that serially track elapsed time relative to observable stimuli. In the real world, however, sensory stimuli provide ambiguous information about the hidden state of the environment, leading to the proposal that TD learning might instead operate over an inferred distribution of hidden states (a ‘belief state’).
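As a point of reference, the classical scheme can be written down in a few lines. The sketch below is our illustration, not part of the proposal: tabular TD(0) over a "complete serial compound" representation, in which one feature marks each time step elapsed since the cue. Trial length, reward delay, learning rate, and discount factor are all assumed values.

```python
import numpy as np

# Illustrative sketch of classical TD(0) over a "complete serial compound":
# one temporal feature per step since cue onset. All parameters are
# assumptions chosen for the example, not values from the proposal.
T = 20             # time steps per trial (cue at t = 0)
reward_time = 10   # reward delivered 10 steps after the cue
alpha, gamma = 0.1, 0.98

w = np.zeros(T)    # one cached value per temporal feature

def run_trial(w):
    """One conditioning trial; returns the per-step TD errors (model RPEs)."""
    rpe = np.zeros(T)
    for t in range(T - 1):
        r = 1.0 if t == reward_time else 0.0
        # Features are one-hot in time, so the value at step t is just w[t]
        delta = r + gamma * w[t + 1] - w[t]   # TD error: actual minus expected
        w[t] += alpha * delta
        rpe[t] = delta
    return rpe

for _ in range(500):
    rpe = run_trial(w)

# After learning, the TD error at reward delivery shrinks toward zero and
# value propagates back to cue onset (w[0] approaches gamma ** reward_time),
# mirroring the textbook account of dopamine RPE dynamics in conditioning.
print(round(rpe[reward_time], 3), round(w[0], 3))
```

The limitation flagged in the abstract is visible here: the representation assumes the learner always knows exactly which time step it occupies relative to an observable cue, with no ambiguity about the underlying state.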
Although this hypothesis has gained traction in theories of reinforcement learning, empirical evidence is lacking. To test this hypothesis, in Aim 1 dopamine neurons will be recorded while mice perform one of two novel classical conditioning tasks. In both tasks, the timing of reward delivery relative to the conditioned stimulus is varied across trials. In the first task, reward is always given; in the second task, reward is occasionally omitted. Preliminary data reveal a striking difference in dopamine signaling between these two tasks, which is well explained by a model that incorporates the animal’s intra-trial inference that reward may have been omitted in the second task. These preliminary results provide evidence in favor of an associative learning rule that combines cached values with hidden state inference.
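A learning rule of this kind, cached values combined with hidden state inference, can be sketched as linear TD over a belief state. The toy model below is our own illustration under assumed task statistics (reward delays uniform over 5 to 15 steps after the cue, a 10% omission probability in the second task); the microstate features, parameters, and function names are ours, not the applicants' model. The hidden variable is whether reward is still pending, the belief over it is updated from the absence of reward, and the weight vector holds the cached values.

```python
import numpy as np

# Illustrative belief-state TD sketch (assumptions, not the applicants' model).
# Hidden variable: is reward still pending, or was it omitted? Elapsed time
# since the cue is observable; the belief weights the cached values.
rng = np.random.default_rng(0)
times = np.arange(5, 16)     # assumed possible reward delays after the cue
T = times.max() + 2          # steps simulated per trial
alpha, gamma = 0.1, 0.95

def belief_pending(t, p_omit):
    """P(reward still pending at step t | no reward at steps < t)."""
    p_time = (1 - p_omit) / len(times)      # uniform prior over delays
    pending = p_time * np.sum(times >= t)   # delay hypotheses still alive
    total = pending + p_omit                # omission is the other survivor
    return pending / total if total > 0 else 0.0

def features(t, p_omit):
    """One (time, pending) and one (time, omitted) microstate per step."""
    b = np.zeros(2 * T)
    p = belief_pending(t, p_omit)
    b[t], b[T + t] = p, 1.0 - p
    return b

def run_task(p_omit, n_trials=3000):
    w = np.zeros(2 * T)                     # cached values, one per microstate
    rpe_at_reward = np.zeros(T)
    for _ in range(n_trials):
        omit = rng.random() < p_omit
        r_time = rng.choice(times)
        for t in range(T - 1):
            r = 1.0 if (not omit and t == r_time) else 0.0
            b = features(t, p_omit)
            # Reward ends the trial, so the successor value is then zero
            b_next = np.zeros(2 * T) if r else features(t + 1, p_omit)
            delta = r + gamma * (w @ b_next) - (w @ b)   # TD error = model RPE
            w += alpha * delta * b
            if r:
                rpe_at_reward[t] = delta
                break
    return rpe_at_reward[times]

# Model RPE at reward delivery as a function of delay, task 1 vs. task 2.
# With p_omit = 0 the pending belief stays at 1 and the model reduces to the
# serial-compound scheme; with omissions, the belief decays late in the
# trial, so the two tasks yield different RPE patterns at reward.
print(np.round(run_task(p_omit=0.0), 2))   # task 1: always rewarded
print(np.round(run_task(p_omit=0.1), 2))   # task 2: 10% omissions
```

This is the sense in which the rule is "cached value plus inference": the weights are ordinary TD values, and only the state representation, the belief, carries the inference.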
Aim 2 then seeks to understand which cortical regions shape hidden state inference in the dopamine system. This Aim will consist of cortical electrophysiology (Aim 2a) and chemogenetic cortical inactivation (Aim 2b) as mice perform the classical conditioning tasks described above.

The results of this proposal will provide critical experimental data toward understanding how reinforcement learning is actually implemented in the brain, with broad relevance to both basic and translational science. In the healthy brain, robust reinforcement learning ensures that animals can maximize rewards within their environments. In the diseased brain, reinforcement learning may also play an important role. For instance, addiction has been cast as an example of maladaptive and destructive reinforcement learning, and aberrant dopamine signaling in schizophrenia is thought to underlie the reinforcement of ‘positive’ symptoms such as auditory hallucinations. Therefore, examining the regulation of dopamine signaling and constructing a more accurate model of reinforcement learning are of great importance for understanding both the healthy and the diseased brain.
Project Outcomes
- Journal articles: 0
- Monographs: 0
- Research awards: 0
- Conference papers: 0
- Patents: 0
Similar Overseas Grants

Reconstruction algorithms for time-domain diffuse optical tomography imaging of small animals
- Grant number: RGPIN-2015-05926
- Fiscal year: 2019
- Funding amount: $44,800
- Project category: Discovery Grants Program - Individual

Reconstruction algorithms for time-domain diffuse optical tomography imaging of small animals
- Grant number: RGPIN-2015-05926
- Fiscal year: 2018
- Funding amount: $44,800
- Project category: Discovery Grants Program - Individual

Reconstruction algorithms for time-domain diffuse optical tomography imaging of small animals
- Grant number: RGPIN-2015-05926
- Fiscal year: 2017
- Funding amount: $44,800
- Project category: Discovery Grants Program - Individual

Reconstruction algorithms for time-domain diffuse optical tomography imaging of small animals
- Grant number: RGPIN-2015-05926
- Fiscal year: 2016
- Funding amount: $44,800
- Project category: Discovery Grants Program - Individual

Event detection algorithms in decision support for animals health surveillance
- Grant number: 385453-2009
- Fiscal year: 2015
- Funding amount: $44,800
- Project category: Collaborative Research and Development Grants

Algorithms to generate designs of potency experiments that use far fewer animals
- Grant number: 8810865
- Fiscal year: 2015
- Funding amount: $44,800
- Project category:

Reconstruction algorithms for time-domain diffuse optical tomography imaging of small animals
- Grant number: RGPIN-2015-05926
- Fiscal year: 2015
- Funding amount: $44,800
- Project category: Discovery Grants Program - Individual

Event detection algorithms in decision support for animals health surveillance
- Grant number: 385453-2009
- Fiscal year: 2013
- Funding amount: $44,800
- Project category: Collaborative Research and Development Grants

Development of population-level algorithms for modelling genomic variation and its impact on cellular function in animals and plants
- Grant number: FT110100972
- Fiscal year: 2012
- Funding amount: $44,800
- Project category: ARC Future Fellowships

Advanced computational algorithms for brain imaging studies of freely moving animals
- Grant number: DP120103813
- Fiscal year: 2012
- Funding amount: $44,800
- Project category: Discovery Projects