Distributional Reinforcement Learning in the Brain

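Basic Information

Project number:
10709775
Principal investigator:
Jan Drugowitsch
Amount:
$581,200
Host institution:
HARVARD UNIVERSITY
Host institution country:
United States
Project category:
Fiscal year:
2020
Funding country:
United States
Project period:
2020-04-15 to 2025-05-31
Project status:
Not yet concluded

Source:
https://reporter.nih.gov/project-details/10709775
Keywords:
Distributional Reinforcement Learning Brain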

Project Summary

The field of artificial intelligence (AI) has recently made remarkable advances, yielding new and improved algorithms and network architectures that have proven empirically efficient in silico. These advances raise new questions in neurobiology: are these new algorithms used in the brain? The present study focuses on a new algorithm developed in the field of reinforcement learning (RL), called distributional RL, which outperforms other state-of-the-art RL algorithms and is regarded as a major advancement in RL. In environments in which rewards are probabilistic with respect to their occurrence and size, traditional RL algorithms have focused on learning to predict a single quantity, the average over all potential rewards. Distributional RL, by contrast, learns to predict the entire distribution over rewards (or values) by employing multiple value predictors that together encode all possible levels of future reward concurrently. Remarkably, theoretical work has shown that a class of distributional RL, called ‘quantile distributional RL’, can arise out of a simple modification of traditional RL that introduces structured variability in dopamine reward prediction error (RPE) signals. This project sets out to test the hypothesis that the brain uses distributional RL to predict future rewards.

Aim 1 will explore the characteristics of distributional RL theoretically and make predictions that allow for testing distributional RL in the brain. Theoretical investigations and simulations will be used to determine how value representations in distributional RL differ from pre-existing population coding schemes for representing probability distributions (probabilistic population codes, distributed distributional codes, etc.). Aim 2 will examine the activity of neurons that are thought to signal RPEs and reward expectation and test various predictions of distributional RL. Specifically, the activity of dopamine neurons in the ventral tegmental area and of neurons in the ventral striatum and orbitofrontal cortex will be compared to key predictions of distributional RL. Aim 3 will use optogenetic manipulation to causally demonstrate the relationship between RPE signals and distributional codes.
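
The contrast the summary draws between a single averaged prediction and a bank of value predictors can be illustrated with a small simulation. The following is a minimal sketch, not the project's model: it assumes one common formulation of quantile-style learning in which each predictor responds only to the sign of its RPE and scales positive and negative errors asymmetrically. The function names, the uniform reward distribution, the step sizes, and the asymmetry levels `taus` are all illustrative assumptions.

```python
import random


def learn_mean(rewards, alpha=0.01):
    """Traditional RL-style predictor: tracks the average reward."""
    value = 0.0
    for r in rewards:
        delta = r - value        # reward prediction error (RPE)
        value += alpha * delta   # same gain for positive and negative RPEs
    return value


def learn_quantiles(rewards, taus, step=0.02):
    """Bank of predictors with asymmetric responses to the sign of the RPE.

    Predictor i steps up by step * tau when the reward exceeds its prediction
    and down by step * (1 - tau) when the reward falls short, so it settles
    near the tau-quantile of the reward distribution.
    """
    values = [0.0] * len(taus)
    for r in rewards:
        for i, tau in enumerate(taus):
            delta = r - values[i]            # per-predictor RPE
            if delta > 0:
                values[i] += step * tau
            elif delta < 0:
                values[i] -= step * (1.0 - tau)
    return values


def sample_rewards(n, seed=0):
    """Hypothetical task: reward size drawn uniformly from [0, 10]."""
    rng = random.Random(seed)
    return [rng.uniform(0.0, 10.0) for _ in range(n)]


if __name__ == "__main__":
    taus = [0.1, 0.3, 0.5, 0.7, 0.9]   # assumed asymmetry levels
    rewards = sample_rewards(20_000)
    print("mean predictor:", round(learn_mean(rewards), 2))
    for tau, v in zip(taus, learn_quantiles(rewards, taus)):
        print(f"tau={tau:.1f} -> prediction {v:.2f}")
```

Under the assumed uniform reward distribution, the `tau`-quantile is `10 * tau`, so the bank of predictors should spread out across the reward range while the mean predictor stays near the middle. The only difference between the two learners is the asymmetric scaling of positive and negative errors, which is the kind of structured variability in RPE signals the summary refers to.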
