Hidden State Inference in the Midbrain Dopamine System
Basic Information
- Grant number: 9526911
- Principal investigator: Clara Kwon Starkweather
- Amount: $44,800
- Host institution:
- Host institution country: United States
- Project category:
- Fiscal year: 2017
- Funding country: United States
- Project period: 2017-07-01 to 2019-04-30
- Status: Completed
- Source:
- Keywords: Address, Algorithms, Animals, Anxiety, Area, Auditory Hallucination, Basic Science, Belief, Brain, Brain Diseases, Brain region, Computer Simulation, Conditioned Stimulus, Cues, Data, Data Display, Dependence, Dopamine, Electrophysiology (science), Ensure, Environment, Exhibits, Functional disorder, Heart, Learning, Length, Mathematics, Medial, Mental Depression, Methods, Midbrain structure, Modeling, Mus, Neurons, Outcome, Pathology, Pattern, Phase, Play, Positive Reinforcements, Prefrontal Cortex, Probability, Psychological reinforcement, Ramp, Regulation, Rewards, Role, Schizophrenia, Sensory, Shapes, Signal Transduction, Specific qualifier value, Specificity, Stimulus, Symptoms, Testing, Therapeutic, Time, Traction, Translational Research, Ursidae Family, addiction, base, classical conditioning, dopamine system, dopaminergic neuron, effective therapy, experience, experimental study, insight, neuropsychiatric disorder, neuropsychiatry, novel, optogenetics, sensory stimulus, theories, time interval, treatment strategy
Project Summary/Abstract
Midbrain dopamine neurons are thought to drive associative learning by signaling reward prediction error (RPE): actual minus expected reward. Based on dopamine RPE signaling, computational and empirical studies have produced detailed models of how reinforcement learning could be implemented in the brain. In particular, the temporal difference (TD) learning model has been a cornerstone in understanding how dopamine RPEs could drive associative learning. Classically, TD learning imparts value to features that serially track elapsed time relative to observable stimuli. In the real world, however, sensory stimuli provide ambiguous information about the hidden state of the environment, leading to the proposal that TD learning might instead operate over an inferred distribution of hidden states (a ‘belief state’).
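As a point of reference, the classical scheme can be written down in a few lines. The sketch below is our illustration, not part of the proposal: tabular TD(0) over a "complete serial compound" representation, in which one feature marks each time step elapsed since the cue. Trial length, reward delay, learning rate, and discount factor are all assumed values.

```python
import numpy as np

# Illustrative sketch of classical TD(0) over a "complete serial compound":
# one temporal feature per step since cue onset. All parameters are
# assumptions chosen for the example, not values from the proposal.
T = 20             # time steps per trial (cue at t = 0)
reward_time = 10   # reward delivered 10 steps after the cue
alpha, gamma = 0.1, 0.98

w = np.zeros(T)    # one cached value per temporal feature

def run_trial(w):
    """One conditioning trial; returns the per-step TD errors (model RPEs)."""
    rpe = np.zeros(T)
    for t in range(T - 1):
        r = 1.0 if t == reward_time else 0.0
        # Features are one-hot in time, so the value at step t is just w[t]
        delta = r + gamma * w[t + 1] - w[t]   # TD error: actual minus expected
        w[t] += alpha * delta
        rpe[t] = delta
    return rpe

for _ in range(500):
    rpe = run_trial(w)

# After learning, the TD error at reward delivery shrinks toward zero and
# value propagates back to cue onset (w[0] approaches gamma ** reward_time),
# mirroring the textbook account of dopamine RPE dynamics in conditioning.
print(round(rpe[reward_time], 3), round(w[0], 3))
```

The limitation flagged in the abstract is visible here: the representation assumes the learner always knows exactly which time step it occupies relative to an observable cue, with no ambiguity about the underlying state.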
Although this hypothesis has gained traction in theories of reinforcement learning, empirical evidence is lacking. To test this hypothesis, in Aim 1 dopamine neurons will be recorded while mice perform one of two novel classical conditioning tasks. In both tasks, the timing of reward delivery relative to the conditioned stimulus is varied across trials. In the first task, reward is always given; in the second task, reward is occasionally omitted. Preliminary data reveal a striking difference in dopamine signaling between these two tasks, which is well explained by a model that incorporates the animal’s intra-trial inference that reward may have been omitted in the second task. These preliminary results provide evidence in favor of an associative learning rule that combines cached values with hidden state inference.
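A learning rule of this kind, cached values combined with hidden state inference, can be sketched as linear TD over a belief state. The toy model below is our own illustration under assumed task statistics (reward delays uniform over 5 to 15 steps after the cue, a 10% omission probability in the second task); the microstate features, parameters, and function names are ours, not the applicants' model. The hidden variable is whether reward is still pending, the belief over it is updated from the absence of reward, and the weight vector holds the cached values.

```python
import numpy as np

# Illustrative belief-state TD sketch (assumptions, not the applicants' model).
# Hidden variable: is reward still pending, or was it omitted? Elapsed time
# since the cue is observable; the belief weights the cached values.
rng = np.random.default_rng(0)
times = np.arange(5, 16)     # assumed possible reward delays after the cue
T = times.max() + 2          # steps simulated per trial
alpha, gamma = 0.1, 0.95

def belief_pending(t, p_omit):
    """P(reward still pending at step t | no reward at steps < t)."""
    p_time = (1 - p_omit) / len(times)      # uniform prior over delays
    pending = p_time * np.sum(times >= t)   # delay hypotheses still alive
    total = pending + p_omit                # omission is the other survivor
    return pending / total if total > 0 else 0.0

def features(t, p_omit):
    """One (time, pending) and one (time, omitted) microstate per step."""
    b = np.zeros(2 * T)
    p = belief_pending(t, p_omit)
    b[t], b[T + t] = p, 1.0 - p
    return b

def run_task(p_omit, n_trials=3000):
    w = np.zeros(2 * T)                     # cached values, one per microstate
    rpe_at_reward = np.zeros(T)
    for _ in range(n_trials):
        omit = rng.random() < p_omit
        r_time = rng.choice(times)
        for t in range(T - 1):
            r = 1.0 if (not omit and t == r_time) else 0.0
            b = features(t, p_omit)
            # Reward ends the trial, so the successor value is then zero
            b_next = np.zeros(2 * T) if r else features(t + 1, p_omit)
            delta = r + gamma * (w @ b_next) - (w @ b)   # TD error = model RPE
            w += alpha * delta * b
            if r:
                rpe_at_reward[t] = delta
                break
    return rpe_at_reward[times]

# Model RPE at reward delivery as a function of delay, task 1 vs. task 2.
# With p_omit = 0 the pending belief stays at 1 and the model reduces to the
# serial-compound scheme; with omissions, the belief decays late in the
# trial, so the two tasks yield different RPE patterns at reward.
print(np.round(run_task(p_omit=0.0), 2))   # task 1: always rewarded
print(np.round(run_task(p_omit=0.1), 2))   # task 2: 10% omissions
```

This is the sense in which the rule is "cached value plus inference": the weights are ordinary TD values, and only the state representation, the belief, carries the inference.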
Aim 2 then seeks to understand which cortical regions shape hidden state inference in the dopamine system. This Aim will consist of cortical electrophysiology (Aim 2a) and chemogenetic cortical inactivation (Aim 2b) as mice perform the classical conditioning tasks described above.

The results of this proposal will provide critical experimental data toward understanding how reinforcement learning is actually implemented in the brain, with broad relevance to both basic and translational science. In the healthy brain, robust reinforcement learning ensures that animals can maximize rewards within their environments. In the diseased brain, reinforcement learning may also play an important role. For instance, addiction has been cast as an example of maladaptive and destructive reinforcement learning, and aberrant dopamine signaling in schizophrenia is thought to underlie the reinforcement of ‘positive’ symptoms such as auditory hallucinations. Therefore, examining the regulation of dopamine signaling and constructing a more accurate model of reinforcement learning are of great importance for understanding both the healthy and the diseased brain.
Project Outcomes
- Journal articles: 0
- Monographs: 0
- Research awards: 0
- Conference papers: 0
- Patents: 0
Similar Overseas Grants

Reconstruction algorithms for time-domain diffuse optical tomography imaging of small animals
- Grant number: RGPIN-2015-05926
- Fiscal year: 2019
- Funding amount: $44,800
- Project category: Discovery Grants Program - Individual

Reconstruction algorithms for time-domain diffuse optical tomography imaging of small animals
- Grant number: RGPIN-2015-05926
- Fiscal year: 2018
- Funding amount: $44,800
- Project category: Discovery Grants Program - Individual

Reconstruction algorithms for time-domain diffuse optical tomography imaging of small animals
- Grant number: RGPIN-2015-05926
- Fiscal year: 2017
- Funding amount: $44,800
- Project category: Discovery Grants Program - Individual

Reconstruction algorithms for time-domain diffuse optical tomography imaging of small animals
- Grant number: RGPIN-2015-05926
- Fiscal year: 2016
- Funding amount: $44,800
- Project category: Discovery Grants Program - Individual

Event detection algorithms in decision support for animals health surveillance
- Grant number: 385453-2009
- Fiscal year: 2015
- Funding amount: $44,800
- Project category: Collaborative Research and Development Grants

Algorithms to generate designs of potency experiments that use far fewer animals
- Grant number: 8810865
- Fiscal year: 2015
- Funding amount: $44,800
- Project category:

Reconstruction algorithms for time-domain diffuse optical tomography imaging of small animals
- Grant number: RGPIN-2015-05926
- Fiscal year: 2015
- Funding amount: $44,800
- Project category: Discovery Grants Program - Individual

Event detection algorithms in decision support for animals health surveillance
- Grant number: 385453-2009
- Fiscal year: 2013
- Funding amount: $44,800
- Project category: Collaborative Research and Development Grants

Development of population-level algorithms for modelling genomic variation and its impact on cellular function in animals and plants
- Grant number: FT110100972
- Fiscal year: 2012
- Funding amount: $44,800
- Project category: ARC Future Fellowships

Advanced computational algorithms for brain imaging studies of freely moving animals
- Grant number: DP120103813
- Fiscal year: 2012
- Funding amount: $44,800
- Project category: Discovery Projects