Decision dynamics during a continuous-time foraging task: a reinforcement learning approach

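Basic Information

Grant number:
10373999
Principal investigator:
Benjamin Nicolaas Ballintyn
Amount:
Approximately $5,200 (listed as 0.52 × $10,000)
Host institution:
BRANDEIS UNIVERSITY
Host institution country:
United States
Project category:
Fiscal year:
2020
Funding country:
United States
Start and end dates:
2020-04-01 to 2022-04-15
Project status:
Completed

Source:
https://reporter.nih.gov/project-details/10373999
Keywords: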
Algorithms Animal Behavior Animals Behavior Behavioral Behavioral Model Brain Breathing Complement Computer Models Consumption Data Decision Making Decision Modeling Development Economics Electrophysiology (science) Energy Intake Environment Evolution Family Food Free Will Future Goals Hour Human Individual Knowledge Learning Learning Disorders Life Location Measurement Measures Memory Methods Modeling Natural Selections Nature Palate Pathology Performance Policies Probability Process Psychological reinforcement Rattus Research Resources Rewards Sampling Self-control as a personality trait Source Stimulus Structure System Testing Thirst Time Travel Update Wait Time Weight addiction base behavior prediction cognitive process decision making algorithm discounting experimental study improved indexing insight learning algorithm learning strategy motor disorder neural circuit neurophysiology pressure relating to nervous system reward circuitry success theories

Project Summary

It is likely that evolution has strongly shaped the neural circuitry of the reward systems to optimize performance in the many tasks involved in foraging for resources, a critical part of every animal's life. This proposition was the inspiration for the development of “optimal” foraging theories, such as the marginal value theorem (MVT), which analytically derive the foraging behavior (sequences of choices) that maximizes the long-term rate of reward, usually considered to be energy intake. While these analytic theories have had some success in describing animal behavior, they rely on strict assumptions about the environment that do not hold in many natural situations, and they are not flexible enough to generalize to more complicated environments or other tasks. Therefore, the end goal of this project is to understand which of a family of general-purpose decision (reinforcement-learning) algorithms is most likely to be employed by the brain to solve value-based tasks, and to use this knowledge to predict under what circumstances these algorithms will lead to optimal or suboptimal behavior. With this project, I will improve our understanding of animal decision processes in these more natural environments by performing a foraging experiment that is continuous in time and violates many of the assumptions that prior analytical theories of foraging rely on. Rats motivated by thirst will be allowed to sample freely from two or three (palatable or aversive) tastant options (“patches”) in an open field and, critically, will be allowed to direct their encounters with the options, something that past experiments have lacked. Measurements of licking (consumption) behavior at each of the tastant options will allow me to measure the decision dynamics of the rat over several 1-hour sessions.
In particular, I will measure how the sampling times at each option correlate with the values of the alternatives to gain insight into how rats combine the values of available options to make decisions. As a complement to this behavioral task, I will simulate a set of reinforcement learning agents that vary in the rules used for learning action values, choosing actions, and planning actions. By quantitatively comparing the decision behavior of these artificial agents to that obtained from rats, I will determine which of the simulated agents best reproduces the rat behavior, giving insight into the decision algorithms used by rats and providing a direction for future electrophysiological recordings during this task. Importantly, this comparison of animal behavior with that produced by artificial agents will allow me to assess how close to “optimal” rat behavior is and, in the cases where it is suboptimal, to provide quantitative explanations for why it is so.
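
As a rough illustration of what one simulated agent of this kind could look like, the sketch below pairs a delta-rule update for action values with softmax action selection over three tastant options. It is not taken from the project: the function names, the reward means, the learning rate `alpha`, the `temperature`, and the discrete-step simplification (the actual task is continuous in time) are all assumptions chosen for illustration.

```python
import math
import random


def softmax(values, temperature):
    """Convert action values into choice probabilities."""
    m = max(values)  # subtract the max for numerical stability
    exps = [math.exp((v - m) / temperature) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def simulate_agent(rewards, alpha=0.1, temperature=0.5, steps=2000, seed=0):
    """Delta-rule value learning with softmax choice over tastant options.

    rewards: hypothetical mean reward per option
             (palatable > 0, aversive < 0); assumed values, not data.
    Returns the learned values and the number of visits to each option.
    """
    rng = random.Random(seed)
    q = [0.0] * len(rewards)      # learned value of each option
    visits = [0] * len(rewards)   # how often each option was sampled
    for _ in range(steps):
        probs = softmax(q, temperature)
        choice = rng.choices(range(len(rewards)), weights=probs)[0]
        reward = rng.gauss(rewards[choice], 0.1)  # noisy outcome
        q[choice] += alpha * (reward - q[choice])  # delta-rule update
        visits[choice] += 1
    return q, visits


# Three hypothetical options: palatable, mildly palatable, aversive.
q, visits = simulate_agent([1.0, 0.3, -0.5])
```

Varying the update rule or swapping softmax for another choice rule would give other members of the agent family; the visit counts per option play the role that sampling times play in the behavioral measurements.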
