CAREER: Learning from demonstrations and beyond -- consolidating imitation and reinforcement learning
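Basic Information
- Award number: 2238979
- Principal investigator: Guni Sharon
- Amount: $584,500
- Institution:
- Institution country: United States
- Program type: Continuing Grant
- Fiscal year: 2023
- Funding country: United States
- Period: 2023-06-01 to 2028-05-31
- Status: Ongoing
- Source:
- Keywords: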
Project Abstract
Recent advances in deep reinforcement learning (RL) hold unprecedented potential for automating and optimizing the control of real-world tasks such as autonomous driving, traffic management, medical procedures, robotic manufacturing, and energy management. Unfortunately, RL algorithms commonly exhibit unstable and/or inefficient learning, which limits their applicability. Seeking to address this critical concern, this CAREER project leverages imitation learning (IL), or behavior copying, which is better understood and typically more stable. The project targets the unification of IL and RL into a holistic paradigm that can safely and effectively learn from, and outperform, existing solutions. It will address outstanding knowledge gaps in both types of learning through a novel curriculum decomposition of tasks, in which simplified demonstrations are used to bootstrap the learner's behavior. The project will also foster education and outreach activities. Specifically, it will enhance undergraduate STEM training by exposing students to scientific research and knowledge-discovery processes relating to safety-critical AI applications through an original multidisciplinary undergraduate engineering program. Moreover, it will facilitate a unique K-12 outreach activity within a large minority (Hispanic/Latino) community in Bryan, TX. The project will support and advance an existing research collaboration with an industrial partner in defense technology; this collaboration, in turn, is expected to advance US national defense.
This project will form the basis for a new research thrust in ML: one that combines IL and RL into a holistic, robust, and safe learning framework. It will define and prove a no-regret bound on the training process within the Markov decision process (MDP) formalization. The approach is to reduce an IL problem to an RL problem that includes a domain-independent curriculum-learning trajectory. The resulting algorithms and solutions are expected to achieve state-of-the-art performance in complex control domains and to deepen theoretical understanding of their potential and limitations. Specifically, the research seeks to prove conditions that guarantee policy convergence and monotonic improvement during training. Moreover, the project will develop domain-specific adaptations to, and analyses of, real-world applications (autonomous driving and robotics testbeds) while providing stable and efficient RL from demonstrations.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
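The abstract's central idea, bootstrapping an RL learner from demonstrations through a curriculum, can be illustrated with a small sketch. The Python below is a hypothetical toy, not the project's algorithm: it assumes a 6-state chain MDP, an online demonstrator that always moves right (rather than a fixed dataset of demonstrations), and a success-gated schedule in which the learner's per-step share of control (`alpha`) grows from 0 to 1. The names `train_phased`, `demonstrator`, and `GOAL` are illustrative only. Because Q-learning is off-policy, transitions generated by the demonstrator still train the learner's value estimates.

```python
import random

# Toy, hypothetical setting (not from the project): a 1-D chain of states
# 0..GOAL. Action 1 moves right, action 0 moves left (clamped at 0).
# Entering GOAL yields reward 1 and ends the episode.
GOAL = 5
ACTIONS = (0, 1)


def step(state, action):
    """Deterministic chain dynamics; returns (next_state, reward, done)."""
    nxt = state + 1 if action == 1 else max(state - 1, 0)
    done = nxt == GOAL
    return nxt, (1.0 if done else 0.0), done


def demonstrator(state):
    """Assumed demonstrator policy: always move right, toward the goal."""
    return 1


def greedy(q, state, rng):
    """Greedy action with respect to q, breaking ties uniformly at random."""
    best = max(q[state])
    return rng.choice([a for a in ACTIONS if q[state][a] == best])


def train_phased(episodes=300, n_phases=10, lr=0.5, gamma=0.9,
                 epsilon=0.1, max_steps=20, seed=0):
    """Tabular Q-learning under a hypothetical control-handoff curriculum.

    At each step the learner chooses the action with probability
    alpha = phase / n_phases; otherwise the demonstrator chooses it.
    The phase advances after every episode that reaches the goal, so
    control moves gradually from demonstrator to learner (alpha -> 1).
    Returns the Q-table and the final alpha.
    """
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(GOAL + 1)]
    phase = 0
    for _ in range(episodes):
        alpha = phase / n_phases
        state, done = 0, False
        for _ in range(max_steps):
            if rng.random() < alpha:
                # Learner acts: epsilon-greedy on its own Q-table.
                if rng.random() < epsilon:
                    action = rng.choice(ACTIONS)
                else:
                    action = greedy(q, state, rng)
            else:
                action = demonstrator(state)
            nxt, reward, done = step(state, action)
            # Q-learning is off-policy, so demonstrator steps are
            # valid training data for the learner's Q-table too.
            target = reward if done else reward + gamma * max(q[nxt])
            q[state][action] += lr * (target - q[state][action])
            state = nxt
            if done:
                break
        if done and phase < n_phases:
            phase += 1  # success: give the learner a larger share of control
    return q, phase / n_phases


q, alpha = train_phased()
policy = [greedy(q, s, random.Random(0)) for s in range(GOAL)]
print(alpha, policy)
```

Gating the handoff on success keeps each phase only slightly harder than the previous one; a fixed per-episode schedule would be simpler, at the cost of possibly handing over control before the learner is ready.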
Project Outputs
Journal articles (5)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Comparison between popular Genetic Algorithm (GA)-based tool and Covariance Matrix Adaptation - Evolutionary Strategy (CMA-ES) for optimizing indoor daylight
- DOI: 10.26868/25222708.2023.1218
- Published: 2023
- Journal:
- Impact factor: 0
- Authors: Anis, Manal; Pendurkar, Sumedh; Yi, Yun Kyu; Sharon, Guni
- Corresponding author: Sharon, Guni
The (Un)Scalability of Informed Heuristic Function Estimation in NP-Hard Search Problems
- DOI:
- Published: 2023
- Journal:
- Impact factor: 0
- Authors: Sumedh Pendurkar; Taoan Huang; Brendan Juba; Jiapeng Zhang; Sven Koenig; Guni Sharon
- Corresponding author: Guni Sharon
Bilevel Entropy based Mechanism Design for Balancing Meta in Video Games
- DOI: 10.5555/3545946.3598887
- Published: 2023
- Journal:
- Impact factor: 0
- Authors: Sumedh Pendurkar; Chris Chow; Luo Jie; Guni Sharon
- Corresponding author: Sumedh Pendurkar; Chris Chow; Luo Jie; Guni Sharon
Task Phasing: Automated Curriculum Learning from Demonstrations
- DOI: 10.1609/icaps.v33i1.27235
- Published: 2023
- Journal:
- Impact factor: 0
- Authors: Bajaj, Vaibhav; Sharon, Guni; Stone, Peter
- Corresponding author: Stone, Peter
Other publications by Guni Sharon
Curriculum Generation for Learning Guiding Functions in State-Space Search Algorithms
- DOI: 10.1609/socs.v17i1.31546
- Published: 2024
- Journal:
- Impact factor: 2.3
- Authors: Sumedh Pendurkar; Levi H.S. Lelis; Nathan R Sturtevant; Guni Sharon
- Corresponding author: Guni Sharon
Technical Report: Hybrid Autonomous Intersection Management
- DOI: 10.48550/arxiv.2204.07704
- Published: 2022
- Journal:
- Impact factor: 0
- Authors: Aaron Parks; Guni Sharon
- Corresponding author: Guni Sharon
Traffic Optimization For a Mixture of Self-interested and Compliant Agents
- DOI:
- Published: 2017
- Journal:
- Impact factor: 0
- Authors: Guni Sharon; Michael Albert; Tarun Rambha; S. Boyles; P. Stone
- Corresponding author: P. Stone
An Assessment of Autonomous Vehicles: Traffic Impacts and Infrastructure Needs—Final Report
- DOI:
- Published: 2017
- Journal:
- Impact factor: 0
- Authors: K. Kockelman; S. Boyles; P. Stone; Daniel J. Fagnant; Rahul Patel; M. Levin; Guni Sharon; M. Simoni; Michael Albert; Hagen Fritz; Rebecca Hutchinson; P. Bansal; Gleb B. Domnenko; P. Bujanovic; Bumsik Kim; Elham Pourrahmani; Sudesh Agrawal; Tianxin Li; Josiah P. Hanna; Aqshems Nichols; Jia Li
- Corresponding author: Jia Li
Socially Optimal Non-discriminatory Restrictions for Continuous-Action Games
- DOI: 10.1609/aaai.v37i10.26375
- Published: 2023
- Journal:
- Impact factor: 8.7
- Authors: M.; Guni Sharon
- Corresponding author: Guni Sharon
Similar National Natural Science Foundation of China (NSFC) grants
Scalable Learning and Optimization: High-dimensional Models and Online Decision-Making Strategies for Big Data Analysis
- Award number:
- Year approved: 2024
- Amount:
- Program type: Collaborative Innovation Research Team
Understanding structural evolution of galaxies with machine learning
- Award number: n/a
- Year approved: 2022
- Amount: CNY 100,000
- Program type: Provincial/municipal-level project
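Constrained dynamic multi-objective Q-learning evolutionary allocation of human-machine hybrid crowd-sensing tasks for coal mine safety
- Award number:
- Year approved: 2022
- Amount: CNY 300,000
- Program type: Young Scientists Fund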
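Short-term online Q-learning cooperative control mechanism for intelligent munition formations considering leader-munition failure
- Award number: 62003314
- Year approved: 2020
- Amount: CNY 240,000
- Program type: Young Scientists Fund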
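Research on e-learning resource recommendation methods integrating contextual tensor decomposition
- Award number: 61902016
- Year approved: 2019
- Amount: CNY 240,000
- Program type: Young Scientists Fund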
Research on Spiking-Transfer learning methods with temporal transfer capability
- Award number: 61806040
- Year approved: 2018
- Amount: CNY 200,000
- Program type: Young Scientists Fund
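Research on Deep-learning-based dynamic recognition technology for glacier monitoring in the Sanjiangyuan (Three-River Source) region
- Award number: 51769027
- Year approved: 2017
- Amount: CNY 380,000
- Program type: Regional Science Fund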
Research on Spiking-Deep Learning methods with temporal processing capability
- Award number: 61573081
- Year approved: 2015
- Amount: CNY 640,000
- Program type: General Program
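Automatic generation and optimization of large-scale personalized e-learning process models based on directed hypergraphs
- Award number: 61572533
- Year approved: 2015
- Amount: CNY 660,000
- Program type: General Program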
Research on learner emotional compensation methods in E-Learning
- Award number: 61402392
- Year approved: 2014
- Amount: CNY 260,000
- Program type: Young Scientists Fund
Similar overseas grants
Collaborative Research: RI: Medium: Superhuman Imitation Learning from Heterogeneous Demonstrations
- Award number: 2312955
- Fiscal year: 2023
- Amount: $584,500
- Program type: Standard Grant
FMitF: Track I: Program Synthesis for Robot Learning from Demonstrations
- Award number: 2319471
- Fiscal year: 2023
- Amount: $584,500
- Program type: Standard Grant
Collaborative Research: RI: Medium: Superhuman Imitation Learning from Heterogeneous Demonstrations
- Award number: 2312956
- Fiscal year: 2023
- Amount: $584,500
- Program type: Standard Grant
Learning robot navigation and manipulation from demonstrations
- Award number: 2601734
- Fiscal year: 2021
- Amount: $584,500
- Program type: Studentship
NRI: FND: Robust Learning of Sequential Motion from Human Demonstrations to Enable Robot-Guided Exercise Training
- Award number: 1830597
- Fiscal year: 2019
- Amount: $584,500
- Program type: Standard Grant
Efficient Robot Learning of Contact-Rich Tasks from Non-Expert Demonstrations
- Award number: 2297064
- Fiscal year: 2019
- Amount: $584,500
- Program type: Studentship
The role of the head direction cell circuit in behavioural demonstrations of direction learning
- Award number: RGPIN-2015-04997
- Fiscal year: 2019
- Amount: $584,500
- Program type: Discovery Grants Program - Individual
The role of the head direction cell circuit in behavioural demonstrations of direction learning
- Award number: 526654-2018
- Fiscal year: 2018
- Amount: $584,500
- Program type: University Undergraduate Student Research Awards
The role of the head direction cell circuit in behavioral demonstrations of direction learning
- Award number: 526664-2018
- Fiscal year: 2018
- Amount: $584,500
- Program type: University Undergraduate Student Research Awards
CRII: RI: Towards Learning Skills from First Person Demonstrations
- Award number: 1755895
- Fiscal year: 2018
- Amount: $584,500
- Program type: Standard Grant