Distributional value coding and reinforcement learning in the brain
Basic Information
- Grant Number: 10539251
- Principal Investigator:
- Amount: $34,500
- Host Institution:
- Host Institution Country: United States
- Project Category:
- Fiscal Year: 2021
- Funding Country: United States
- Project Period: 2021-08-01 to 2024-07-31
- Project Status: Completed
- Source:
- Keywords: Affect, Algorithms, Anatomy, Animals, Anxiety, Architecture, Behavior, Behavioral, Bernoulli Distribution, Bipolar Disorder, Brain, Brain region, Calcium, Cells, Code, Collaborations, Color, Complex, Corpus striatum structure, Dopamine, Dopamine D2 Receptor, Environment, Event, Future, Gambling, Grain, Heterogeneity, Image, Individual, Laboratories, Lead, Learning, Logic, Machine Learning, Measures, Mental Depression, Mental disorders, Modification, Molecular, Mus, Nature, Neurons, Odors, Outcome, Pattern, Performance, Population, Population Theory, Predictive Value, Probability, Psychological reinforcement, RL13, Resolution, Rewards, Role, Scheme, Shapes, Silicon, Stimulus, Structure, System, Tail, Testing, Ventral Striatum, Ventral Tegmental Area, Water, Work, addiction, burden of illness, cell type, classical conditioning, density, design, dopamine system, dopaminergic neuron, drug of abuse, improved, in vivo, in vivo two-photon imaging, maladaptive behavior, neural circuit, neuromechanism, optimism, receptor, relating to nervous system, theories, two photon microscopy, two-photon
Project Summary
ABSTRACT
Making predictions about future rewards in the environment, and taking actions to obtain those rewards, is critical
for survival. When these predictions are overly optimistic — for example, in the case of gambling addiction — or
overly pessimistic — as in anxiety and depression — maladaptive behavior can result and present a significant
disease burden. A fundamental challenge for making reward predictions is that the world is inherently stochastic,
and events on the tails of a distribution need not reflect the average. Therefore, it may be useful to predict not
only the mean, but also the complete probability distribution of upcoming rewards. Indeed, recent advances in
machine learning have demonstrated that making this shift from the average reward to the complete reward
distribution can dramatically improve performance in complex task domains. Despite its apparent complexity,
such “distributional reinforcement learning” can be achieved computationally with a remarkably simple and
biologically plausible learning rule. A recent study found that the structure of dopamine neuron activity may be
consistent with distributional reinforcement learning, but it is unknown whether additional neuronal circuitry is
involved — most notably the ventral striatum (VS) and orbitofrontal cortex (OFC), both of which receive dopamine
input and are thought to represent anticipated reward, also called “value”. Here, we propose to investigate
whether value coding in these downstream regions is consistent with distributional reinforcement learning. In
particular, we will record from these brain regions while mice perform classical conditioning with odors and water
rewards. In the first task, we will hold the mean reward constant while changing the reward variance or higher-
order moments, and ask whether neurons in the VS and OFC represent information over and above the mean,
consistent with distributional reinforcement learning. In principle, this should enable us to decode the complete
reward distribution purely from neural activity. In the second task, we will present mice with a panel of odors
predicting the same reward amount with differing probabilities. The simplicity of these Bernoulli distributions will
allow us to compare longstanding theories of population coding in the brain — that is, how probability distributions
can be instantiated in neural activity to guide behavior. In addition to high-density silicon probe recordings, we
will perform two-photon calcium imaging in these tasks to assess whether genetically and molecularly distinct
subpopulations of neurons in the striatum contribute differentially to distributional reinforcement learning. Finally,
we will combine these recordings with simultaneous imaging of dopamine dynamics in the striatum to ask how
dopamine affects striatal activity in vivo. Together, these studies will help clarify dopamine’s role in learning
distributions of reward, as well as its dysregulation in addiction, anxiety, depression, and bipolar disorder.
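To make the learning rule referenced above concrete: in distributional reinforcement learning, a population of value predictors updates with asymmetric learning rates for positive versus negative reward prediction errors, and each predictor converges to a different expectile of the reward distribution (the mechanism described in the "recent study" cited above, Dabney et al., Nature, 2020). The sketch below is illustrative only, not part of the proposed experiments; the cell count, learning rates, and Bernoulli task parameters are assumptions chosen for clarity.

```python
# Minimal sketch of a distributional TD-style learning rule (illustrative
# assumptions throughout; not the grant's actual analysis code).
import numpy as np

rng = np.random.default_rng(0)

n_cells = 9                                       # hypothetical value predictors
alpha_pos = np.linspace(0.002, 0.018, n_cells)    # learning rate for positive errors
alpha_neg = alpha_pos[::-1]                       # learning rate for negative errors
tau = alpha_pos / (alpha_pos + alpha_neg)         # asymmetry: the expectile each cell targets
V = np.zeros(n_cells)                             # learned value estimates

# Bernoulli task, as in the second proposed experiment: a cue predicts a
# fixed reward size delivered with probability p.
p, reward_size = 0.5, 1.0
for _ in range(200_000):
    r = reward_size if rng.random() < p else 0.0
    delta = r - V                                 # per-cell reward prediction errors
    # Asymmetric update: positive and negative errors are scaled differently,
    # so each cell settles at its tau-th expectile rather than at the mean.
    V += np.where(delta > 0, alpha_pos, alpha_neg) * delta

# Optimistic cells (tau > 0.5) converge above the mean reward (p * reward_size);
# pessimistic cells (tau < 0.5) converge below it. For a Bernoulli(0.5) reward
# of size 1, the tau-th expectile equals tau, so learned values track tau.
for t, v in zip(tau, V):
    print(f"tau = {t:.2f}  ->  learned value = {v:.3f}")
```

Decoding runs this mapping in reverse: given the set of learned values and asymmetries across a population, the full reward distribution can in principle be imputed, which is the kind of readout the first task proposes to attempt from VS and OFC activity.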
Project Outcomes
- Journal Articles: 0
- Monographs: 0
- Research Awards: 0
- Conference Papers: 0
- Patents: 0
Other Grants by Adam Stanley Lowet
Distributional Value Coding and Reinforcement Learning in the Brain
- Grant Number: 10668487
- Fiscal Year: 2021
- Funding Amount: $34,500
- Project Category:

Distributional value coding and reinforcement learning in the brain
- Grant Number: 10311130
- Fiscal Year: 2021
- Funding Amount: $34,500
- Project Category:
Similar Overseas Grants
DMS-EPSRC: Asymptotic Analysis of Online Training Algorithms in Machine Learning: Recurrent, Graphical, and Deep Neural Networks
- Grant Number: EP/Y029089/1
- Fiscal Year: 2024
- Funding Amount: $34,500
- Project Category: Research Grant

CAREER: Blessing of Nonconvexity in Machine Learning - Landscape Analysis and Efficient Algorithms
- Grant Number: 2337776
- Fiscal Year: 2024
- Funding Amount: $34,500
- Project Category: Continuing Grant

CAREER: From Dynamic Algorithms to Fast Optimization and Back
- Grant Number: 2338816
- Fiscal Year: 2024
- Funding Amount: $34,500
- Project Category: Continuing Grant

CAREER: Structured Minimax Optimization: Theory, Algorithms, and Applications in Robust Learning
- Grant Number: 2338846
- Fiscal Year: 2024
- Funding Amount: $34,500
- Project Category: Continuing Grant

CRII: SaTC: Reliable Hardware Architectures Against Side-Channel Attacks for Post-Quantum Cryptographic Algorithms
- Grant Number: 2348261
- Fiscal Year: 2024
- Funding Amount: $34,500
- Project Category: Standard Grant

CRII: AF: The Impact of Knowledge on the Performance of Distributed Algorithms
- Grant Number: 2348346
- Fiscal Year: 2024
- Funding Amount: $34,500
- Project Category: Standard Grant

CRII: CSR: From Bloom Filters to Noise Reduction Streaming Algorithms
- Grant Number: 2348457
- Fiscal Year: 2024
- Funding Amount: $34,500
- Project Category: Standard Grant

EAGER: Search-Accelerated Markov Chain Monte Carlo Algorithms for Bayesian Neural Networks and Trillion-Dimensional Problems
- Grant Number: 2404989
- Fiscal Year: 2024
- Funding Amount: $34,500
- Project Category: Standard Grant

CAREER: Efficient Algorithms for Modern Computer Architecture
- Grant Number: 2339310
- Fiscal Year: 2024
- Funding Amount: $34,500
- Project Category: Continuing Grant

CAREER: Improving Real-world Performance of AI Biosignal Algorithms
- Grant Number: 2339669
- Fiscal Year: 2024
- Funding Amount: $34,500
- Project Category: Continuing Grant