RI: Small: Combining Reinforcement Learning and Deep Learning Methods to Address High-Dimensional Perception, Partial Observability and Delayed Reward
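Basic Information
- Award number: 1526059
- Principal investigator:
- Amount: $499,900
- Host institution:
- Host institution country: United States
- Project type: Standard Grant
- Fiscal year: 2015
- Funding country: United States
- Duration: 2015-09-01 to 2020-08-31
- Project status: Completed
- Source:
- Keywords:
Project Abstract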
Consider the problem faced by a machine agent that has to interact with some dynamical environment to achieve some goals. Concretely, imagine an agent engaged in a virtual competition as a human would be. It can see the screen, which is composed of many moving objects. At any time, it can choose one of a dozen or so actions. Its action controls one of the objects on the screen, but it is often not clear which one. Every so often an evaluation of the competition is given. At some point the competition ends. How should such an agent choose actions or, more importantly, how can we build agents that can learn to compete, i.e., achieve high scores, through trial and error? In this project, methods will be developed and evaluated to build such agents. The above problem is an instance of what is called a reinforcement learning (RL) problem. Such problems abound in sequential decision-making settings. Applications in industry include factory optimization, robotics, and chronic disease management (to list but three diverse domains of interest). Like many of these RL problems, Atari games (used as a testbed here to evaluate learning strategies) have three characteristics of interest to this project. First, they generate high-dimensional images, and so the agent faces a difficult perception problem. Second, they often have deeply delayed rewards; i.e., actions have long-term consequences. For example, losing a resource may not cost anything at the moment of loss, but could lead to very high losses much later when that resource is critically necessary. Third, they have deep partial observability; i.e., to compete effectively one often has to remember the deep past. For example, a location encountered far back in the past may become valuable much later because a critical resource becomes available at that time, and the agent would have to find its way back to that location to use the resource.
It is proposed to address these three challenges respectively with new neural network architectures for predicting the consequences of actions, new methods for intrinsically motivating agents even when reward is delayed, and new recurrent neural network architectures to remember the past effectively. Success of the proposed work is expected to significantly expand the scope of application of reinforcement learning. Finally, Atari games will be used instead of, say, factory optimization as an evaluation domain because they are readily available. They will be used to draw the interest of high-school and under-represented undergraduate students to the complex ideas underlying the proposed work; their fun visualizations will allow them to be integrated into teaching in the PIs' classes, and the games vary in the degree of difficulty along the three challenge dimensions, allowing more effective control of the evaluations.
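The interaction the abstract describes (observe, pick one of about a dozen actions, receive an occasional evaluation, stop when the competition ends) can be sketched as a short loop. The following is only an illustrative toy, not the project's method or any Atari interface: `DelayedRewardEnv`, `run_episode`, `make_random_policy`, and `memory_policy` are hypothetical names, the environment is invented for the example, and no learning takes place. It shows delayed reward (the score arrives only at the final step) and partial observability (the useful information is visible only at the first step, so the agent must remember it).

```python
import random


class DelayedRewardEnv:
    """Hypothetical toy environment: reward arrives only when the episode ends."""

    def __init__(self, horizon=20, n_actions=12, seed=0):
        self.horizon = horizon
        self.n_actions = n_actions
        self.rng = random.Random(seed)

    def reset(self):
        self.t = 0
        self.hits = 0
        self.target = self.rng.randrange(self.n_actions)
        # The target is revealed only in the first observation,
        # so acting well later requires remembering it.
        return self.target

    def step(self, action):
        self.t += 1
        if action == self.target:
            self.hits += 1
        done = self.t >= self.horizon
        # Delayed reward: zero on every step except the last one.
        reward = float(self.hits) if done else 0.0
        observation = -1  # uninformative after the first step
        return observation, reward, done


def run_episode(env, policy):
    """Run one episode and return the total reward collected."""
    obs = env.reset()
    memory = obs  # what the agent keeps from the first observation
    total, done = 0.0, False
    while not done:
        action = policy(obs, memory)
        obs, reward, done = env.step(action)
        total += reward
    return total


def make_random_policy(n_actions, seed=1):
    """Policy that ignores everything and picks a uniformly random action."""
    rng = random.Random(seed)
    return lambda obs, memory: rng.randrange(n_actions)


def memory_policy(obs, memory):
    """Policy that replays the target remembered from the first step."""
    return memory


if __name__ == "__main__":
    print("memory policy:", run_episode(DelayedRewardEnv(), memory_policy))
    print("random policy:", run_episode(DelayedRewardEnv(), make_random_policy(12)))
```

In this toy, a policy that remembers the first observation collects the maximum score, while a memoryless random policy generally does not; a learning agent would have to discover that behavior from the single end-of-episode score.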
Project Outcomes
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other grants by Satinder Baveja
RI: Small: Reinforcement Learning with Predictive State Representations
- Award number: 1319365
- Fiscal year: 2013
- Award amount: $499,900
- Project type: Continuing Grant
EAGER: On the Optimal Rewards Problem
- Award number: 1148668
- Fiscal year: 2011
- Award amount: $499,900
- Project type: Standard Grant
SHB: Medium: Collaborative Research: Novel Computational Techniques for Cardiovascular Risk Stratification
- Award number: 1064948
- Fiscal year: 2011
- Award amount: $499,900
- Project type: Standard Grant
RI: Medium: Building Flexible, Robust, and Autonomous Agents
- Award number: 0905146
- Fiscal year: 2009
- Award amount: $499,900
- Project type: Standard Grant
Flexible State Representations in Reinforcement Learning
- Award number: 0413004
- Fiscal year: 2005
- Award amount: $499,900
- Project type: Continuing Grant
Collaborative Research: Intrinsically Motivated Learning in Artificial Agents
- Award number: 0432027
- Fiscal year: 2004
- Award amount: $499,900
- Project type: Continuing Grant
Exploiting Structure in Reinforcement Learning Problems
- Award number: 9711753
- Fiscal year: 1997
- Award amount: $499,900
- Project type: Continuing Grant
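Similar NSFC Grants
Forensic application of circadian-rhythmic small RNAs in estimating bloodstain deposition time
- Award number:
- Approval year: 2024
- Award amount: CNY 0
- Project type: Provincial/municipal project
Mechanism by which tRNA-derived small RNA upregulates the YBX1/CCL5 pathway in bortezomib-induced chronic pain
- Award number: n/a
- Approval year: 2022
- Award amount: CNY 100,000
- Project type: Provincial/municipal project
Small RNA regulation of the type I-F CRISPR-Cas adaptive immune response and its molecular mechanism
- Award number: 32000033
- Approval year: 2020
- Award amount: CNY 240,000
- Project type: Young Scientists Fund
Mechanisms by which small RNAs regulate the biocontrol function of Bacillus amyloliquefaciens FZB42
- Award number: 31972324
- Approval year: 2019
- Award amount: CNY 580,000
- Project type: General Program
Mechanisms by which Streptococcus mutans small RNAs link LuxS quorum sensing and biofilm formation
- Award number: 81900988
- Approval year: 2019
- Award amount: CNY 210,000
- Project type: Young Scientists Fund
Function and mechanism of key gut bacterial small RNAs in the onset and progression of Crohn's disease
- Award number: 31870821
- Approval year: 2018
- Award amount: CNY 560,000
- Project type: General Program
Molecular mechanism of pigeon milk secretion analyzed by small RNA sequencing
- Award number: 31802058
- Approval year: 2018
- Award amount: CNY 260,000
- Project type: Young Scientists Fund
Pathogenic mechanism of rice grassy stunt virus regulated by small RNA-mediated DNA methylation
- Award number: 31772128
- Approval year: 2017
- Award amount: CNY 600,000
- Project type: General Program
Immune regulatory mechanism of acupuncture treatment for Hashimoto's thyroiditis based on small RNA-seq
- Award number: 81704176
- Approval year: 2017
- Award amount: CNY 200,000
- Project type: Young Scientists Fund
Regulation of small RNA biosynthesis by rice OsSGS3 and OsHEN1 and its modulation of disease resistance
- Award number: 91640114
- Approval year: 2016
- Award amount: CNY 850,000
- Project type: Major Research Plan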
Similar Overseas Grants
Design of future low Earth orbit small satellites by combining aerodynamic force and solar radiation pressure
- Award number: 22J13958
- Fiscal year: 2022
- Award amount: $499,900
- Project type: Grant-in-Aid for JSPS Fellows
Small area estimation, combining data from multiple sources, and inference from non-probability samples
- Award number: RGPIN-2019-06181
- Fiscal year: 2021
- Award amount: $499,900
- Project type: Discovery Grants Program - Individual
Combining Camu-Camu prebiotics and anti-programmed cell death protein-1 to improve gut microbiome and clinical outcomes for patients with non-small cell lung cancer
- Award number: 466902
- Fiscal year: 2021
- Award amount: $499,900
- Project type: Studentship Programs
Small area estimation, combining data from multiple sources, and inference from non-probability samples
- Award number: RGPIN-2019-06181
- Fiscal year: 2020
- Award amount: $499,900
- Project type: Discovery Grants Program - Individual
RI: Small: Exploring Rationale behind Visual Understanding: Combining Attention and Reasoning
- Award number: 1908711
- Fiscal year: 2019
- Award amount: $499,900
- Project type: Standard Grant
III: Small: Combining Stochastics and Numerics for Improved Scalable Matrix Computations
- Award number: 1815054
- Fiscal year: 2018
- Award amount: $499,900
- Project type: Standard Grant
Building Better Antioxidants: Virtual Screening, Synthesis, and Characterization of Multifunctional Small Molecules Combining Nrf2 Pathway Activation and Direct Antioxidant Activity
- Award number: 10360130
- Fiscal year: 2018
- Award amount: $499,900
- Project type:
Combining information from independent surveys, small area estimation of complex parameters, analysis of complex survey data
- Award number: 8856-2013
- Fiscal year: 2017
- Award amount: $499,900
- Project type: Discovery Grants Program - Individual
CIF: Small: Combining Information Theoretic Security and Stochastic Control to Study Advanced Persistent Threats
- Award number: 1617889
- Fiscal year: 2016
- Award amount: $499,900
- Project type: Standard Grant
Combining information from independent surveys, small area estimation of complex parameters, analysis of complex survey data
- Award number: 8856-2013
- Fiscal year: 2016
- Award amount: $499,900
- Project type: Discovery Grants Program - Individual