权益分类	功能权益	普通用户	{{item.name}}会员
{{category.name}}	{{benefitItem.name}}

Distributional value coding and reinforcement learning in the brain

大脑中的分布值编码和强化学习

基本信息

批准号：
10539251
负责人：
Adam Stanley Lowet
金额：
$ 3.45万
依托单位：
HARVARD MEDICAL SCHOOL
依托单位国家：
美国
项目类别：
财政年份：
2021
资助国家：
美国
起止时间：
2021-08-01 至 2024-07-31
项目状态：
已结题

来源：
https://reporter.nih.gov/project-details/10539251
关键词：
Affect Algorithms Anatomy Animals Anxiety Architecture Behavior Behavioral Bernoulli Distribution Bipolar Disorder Brain Brain region Calcium Cells Code Collaborations Color Complex Corpus striatum structure Dopamine Dopamine D2 Receptor Environment Event Future Gambling Grain Heterogeneity Image Individual Laboratories Lead Learning Logic Machine Learning Measures Mental Depression Mental disorders Modification Molecular Mus Nature Neurons Odors Outcome Pattern Performance Population Population Theory Predictive Value Probability Psychological reinforcement RL13 Resolution Rewards Role Scheme Shapes Silicon Stimulus Structure System Tail Testing Ventral Striatum Ventral Tegmental Area Water Work addiction burden of illness cell type classical conditioning density design dopamine system dopaminergic neuron drug of abuse improved in vivo in vivo two-photon imaging maladaptive behavior neural circuit neuromechanism optimism receptor relating to nervous system theories two photon microscopy two-photon

项目摘要

ABSTRACT Making predictions about future rewards in the environment, and taking actions to obtain those rewards, is critical for survival. When these predictions are overly optimistic — for example, in the case of gambling addiction — or overly pessimistic — as in anxiety and depression — maladaptive behavior can result and present a significant disease burden. A fundamental challenge for making reward predictions is that the world is inherently stochastic, and events on the tails of a distribution need not reflect the average. Therefore, it may be useful to predict not only the mean, but also the complete probability distribution of upcoming rewards. Indeed, recent advances in machine learning have demonstrated that making this shift from the average reward to the complete reward distribution can dramatically improve performance in complex task domains. Despite its apparent complexity, such “distributional reinforcement learning” can be achieved computationally with a remarkably simple and biologically plausible learning rule. A recent study found that the structure of dopamine neuron activity may be consistent with distributional reinforcement learning, but it is unknown whether additional neuronal circuity is involved — most notably the ventral striatum (VS) and orbitofrontal cortex (OFC), both of which receive dopamine input and are thought to represent anticipated reward, also called “value”. Here, we propose to investigate whether value coding in these downstream regions is consistent with distributional reinforcement learning. In particular, we will record from these brain regions while mice perform classical conditioning with odors and water rewards. In the first task, we will hold the mean reward constant while changing the reward variance or higher- order moments, and ask whether neurons in the VS and OFC represent information over and above the mean, consistent with distributional reinforcement learning. In principle, this should enable us to decode the complete reward distribution purely from neural activity. In the second task, we will present mice with a panel of odors predicting the same reward amount with differing probabilities. The simplicity of these Bernoulli distributions will allow us to compare longstanding theories of population coding in the brain — that is, how probability distributions can be instantiated in neural activity to guide behavior. In addition to high-density silicon probe recordings, we will perform two-photon calcium imaging in these tasks to assess whether genetically and molecularly distinct subpopulations of neurons in the striatum contribute differentially to distributional reinforcement learning. Finally, we will combine these recordings with simultaneous imaging of dopamine dynamics in the striatum to ask how dopamine affects striatal activity in vivo. Together, these studies will help clarify dopamine’s role in learning distributions of reward, as well as its dysregulation in addiction, anxiety, depression, and bipolar disorder.

摘要