A theoretical framework for probabilistic reinforcement learning in the basal ganglia

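Basic information

Award number: 10687830
Principal investigator: Samuel J Gershman
Amount: $535,600
Institution: HARVARD MEDICAL SCHOOL
Institution country: United States
Project category:
Fiscal year: 2019
Funding country: United States
Project period: 2019-08-15 to 2024-07-31
Project status: Completed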

Project abstract

According to the standard reinforcement learning framework, the basal ganglia implement estimation of long-term future reward and the control of actions to maximize that reward. Dopamine (DA) plays a central role by providing the learning signal, the reward prediction error (RPE), that guides updates to both reward predictions and the action policy. Despite its success, the reinforcement learning framework has been challenged from several directions. Some studies suggest that DA encodes reward predictions themselves rather than reward prediction errors; others suggest that DA may invigorate action selection independently of its contribution to learning. A major goal of this project is to develop a reinforcement learning theory of basal ganglia function that addresses these challenges and, more broadly, presents a unifying view of how learning, probabilistic inference, and action selection work together to produce adaptive behavior.

Our theoretical innovation has three components. First, we argue that cortical inputs to the striatum encode a probability distribution over hidden states, known as the belief state. Second, we argue that striatal projection neurons transform this input through a set of basis functions whose purpose is to facilitate reward prediction; the synaptic weights that parametrize these predictions are updated by the DA RPE signal. Third, we argue that action selection circuits in the dorsal striatum use probabilistic information about rewards to implement uncertainty-guided exploration.
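The abstract names three computational ingredients without committing to specific equations. The Python sketch below illustrates one textbook instance of each, under assumptions of our own rather than the project's: a Bayesian filter over a discrete set of hidden states for the belief state; hypothetical Gaussian radial basis functions over belief space for the striatal transform (the abstract does not say which basis is used); linear TD(0) learning driven by the RPE δ = r + γV(b′) − V(b) for the weight update; and Thompson sampling over Gaussian reward posteriors as one example of uncertainty-guided exploration (upper-confidence-bound rules would be an equally valid choice). All function names (`update_belief`, `rbf_features`, `td_update`, `GaussianArm`, `thompson_choice`, `run_bandit`) and parameter values are illustrative.

```python
import math
import random


def update_belief(belief, transition, likelihood, obs):
    """Bayesian filter over hidden states.

    b'(s') is proportional to P(obs | s') * sum_s P(s' | s) * b(s), where
    transition[s][s2] = P(s2 | s) and likelihood[s2][obs] = P(obs | s2).
    """
    n = len(belief)
    predicted = [sum(belief[s] * transition[s][s2] for s in range(n)) for s2 in range(n)]
    unnorm = [likelihood[s2][obs] * predicted[s2] for s2 in range(n)]
    z = sum(unnorm)
    return [u / z for u in unnorm]


def rbf_features(belief, centers, width=0.5):
    """Hypothetical basis set: one Gaussian bump per center in belief space."""
    feats = []
    for c in centers:
        d2 = sum((b - ci) ** 2 for b, ci in zip(belief, c))
        feats.append(math.exp(-d2 / (2 * width ** 2)))
    return feats


def value(weights, feats):
    """Linear reward prediction V(b) = sum_k w_k * phi_k(b)."""
    return sum(w * f for w, f in zip(weights, feats))


def td_update(weights, feats, next_feats, reward, gamma=0.9, alpha=0.1, terminal=False):
    """One TD(0) step; returns (new_weights, rpe).

    rpe = r + gamma * V(b') - V(b); each weight moves by alpha * rpe * phi_k(b).
    """
    v_next = 0.0 if terminal else value(weights, next_feats)
    rpe = reward + gamma * v_next - value(weights, feats)
    return [w + alpha * rpe * f for w, f in zip(weights, feats)], rpe


class GaussianArm:
    """Gaussian posterior over one action's mean reward (known noise variance)."""

    def __init__(self, prior_mean=0.0, prior_var=1.0, noise_var=1.0):
        self.mean, self.var, self.noise_var = prior_mean, prior_var, noise_var

    def update(self, reward):
        gain = self.var / (self.var + self.noise_var)  # Kalman gain
        self.mean += gain * (reward - self.mean)
        self.var *= 1.0 - gain  # posterior uncertainty shrinks with each sample


def thompson_choice(arms, rng):
    """Uncertainty-guided choice: sample each posterior, take the best sample."""
    samples = [rng.gauss(a.mean, math.sqrt(a.var)) for a in arms]
    return max(range(len(arms)), key=lambda i: samples[i])


def run_bandit(true_means, n_trials=500, seed=0):
    """Simulate Thompson sampling with unit-variance Gaussian rewards; return choice counts."""
    rng = random.Random(seed)
    arms = [GaussianArm() for _ in true_means]
    counts = [0] * len(true_means)
    for _ in range(n_trials):
        i = thompson_choice(arms, rng)
        arms[i].update(rng.gauss(true_means[i], 1.0))
        counts[i] += 1
    return counts


if __name__ == "__main__":
    T = [[0.9, 0.1], [0.1, 0.9]]  # hidden state tends to persist
    L = [[0.8, 0.2], [0.2, 0.8]]  # each state usually emits its own cue
    print("belief after cue 0:", update_belief([0.5, 0.5], T, L, obs=0))
    print("choice counts:", run_bandit([0.0, 1.0]))
```

Running the file prints the filtered belief (observing cue 0 shifts belief toward state 0) and the Thompson-sampling choice counts, which should concentrate on the higher-mean arm. The three pieces are kept separate for readability; how the project itself couples belief, basis functions, and exploration is not specified in the abstract.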
