
CAREER: Learning from demonstrations and beyond -- consolidating imitation and reinforcement learning

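Basic information

Award number: 2238979
Principal investigator: Guni Sharon
Amount: approximately $584,500
Host institution: Texas A&M Engineering Experiment Station
Host institution country: United States
Award type: Continuing Grant
Fiscal year: 2023
Funding country: United States
Period: 2023-06-01 to 2028-05-31
Project status: Active (not yet concluded)

Source: https://www.nsf.gov/awardsearch/showAward?AWD_ID=2238979&HistoricalAwards=false
Keywords: CAREER Learning demonstrations beyond consolidating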

Project abstract

Recent advancements in deep reinforcement learning (RL) hold unprecedented potential for automating and optimizing control of real-world tasks such as autonomous driving, traffic management, medical procedures, robotic manufacturing, and energy management. Unfortunately, RL algorithms commonly exhibit unstable and/or inefficient learning, which limits their applicability. To address this critical concern, this CAREER project leverages imitation learning (IL), or behavior copying, which is better understood and typically more stable. The project targets the unification of IL and RL into a holistic paradigm that can safely and effectively learn from, and outperform, existing solutions. It will address outstanding knowledge gaps in both types of learning through a novel curriculum decomposition of the tasks, in which simplified demonstrations are used to bootstrap the learner's behavior.

The project will also foster education and outreach activities. Specifically, it will enhance undergraduate STEM training by exposing students to scientific research and knowledge discovery processes relating to safety-critical AI applications through an original multidisciplinary undergraduate engineering program. Moreover, it will facilitate a unique K-12 outreach activity within a large minority (Hispanic/Latino) community (Bryan, TX). The project will support and advance an existing research collaboration with an industrial partner in the context of defense technology. This collaboration, in turn, is expected to advance US national defense.

This project will form the basis for a new research thrust in ML, one that combines IL and RL toward a holistic, robust, and safe learning framework. It will define and prove a no-regret bound on the training process within the Markov Decision Process formalization. The approach is to reduce an IL problem to an RL one that includes a domain-independent curriculum-learning trajectory. The resulting algorithms and solutions are expected to achieve state-of-the-art performance in complex control domains and to deepen theoretical understanding of their potential and limitations. Specifically, the research seeks to prove conditions guaranteeing policy convergence and monotonic improvement during training. Moreover, the project will develop domain-specific adaptation to and analysis of real-world applications (autonomous driving and robotics testbeds) while providing stable and efficient RL from demonstrations.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
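
For readers unfamiliar with the terms, the no-regret and monotonic-improvement properties named in the abstract are commonly stated as below. This is the standard textbook form, not necessarily the formulation the project adopts; the symbols (T training iterations, policies \pi_t, an optimal policy \pi^*, value function V, start state s_0) are assumptions introduced here for illustration.

```latex
% Cumulative regret of the policy sequence \pi_1, ..., \pi_T
% relative to an optimal policy \pi^* (assumed notation).
\[
  \mathrm{Regret}(T) \;=\; \sum_{t=1}^{T}
    \Bigl( V^{\pi^{*}}(s_0) - V^{\pi_t}(s_0) \Bigr)
\]
% "No-regret": average regret vanishes as training proceeds.
\[
  \lim_{T \to \infty} \frac{\mathrm{Regret}(T)}{T} \;=\; 0
\]
% "Monotonic improvement": no update makes the policy worse.
\[
  V^{\pi_{t+1}}(s) \;\ge\; V^{\pi_t}(s) \quad \text{for all states } s
\]
```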

Project outputs

Journal papers: 5
Monographs: 0
Research awards: 0
Conference papers: 0
Patents: 0

Comparison between popular Genetic Algorithm (GA)-based tool and Covariance Matrix Adaptation - Evolutionary Strategy (CMA-ES) for optimizing indoor daylight

DOI: 10.26868/25222708.2023.1218
Year: 2023
Journal: Proceedings of Building Simulation 2023: 18th Conference of IBPSA
Impact factor: 0
Authors: Anis, Manal; Pendurkar, Sumedh; Yi, Yun Kyu; Sharon, Guni
Corresponding author: Sharon, Guni

The (Un)Scalability of Informed Heuristic Function Estimation in NP-Hard Search Problems

DOI:
Year: 2023
Journal: Transactions on Machine Learning Research
Impact factor: 0
Authors: Sumedh Pendurkar; Taoan Huang; Brendan Juba; Jiapeng Zhang; Sven Koenig; Guni Sharon
Corresponding author: Guni Sharon

Bilevel Entropy based Mechanism Design for Balancing Meta in Video Games

DOI: 10.5555/3545946.3598887
Year: 2023
Journal:
Impact factor: 0
Authors: Sumedh Pendurkar; Chris Chow; Luo Jie; Guni Sharon
Corresponding authors: Sumedh Pendurkar; Chris Chow; Luo Jie; Guni Sharon

Task Phasing: Automated Curriculum Learning from Demonstrations

DOI: 10.1609/icaps.v33i1.27235
Year: 2023
Journal: Proceedings of the International Conference on Automated Planning and Scheduling
Impact factor: 0
Authors: Bajaj, Vaibhav; Sharon, Guni; Stone, Peter
Corresponding author: Stone, Peter

Other publications by Guni Sharon

Curriculum Generation for Learning Guiding Functions in State-Space Search Algorithms

DOI: 10.1609/socs.v17i1.31546
Year: 2024
Journal: Comparative Biochemistry and Physiology A-molecular & Integrative Physiology
Impact factor: 2.3
Authors: Sumedh Pendurkar; Levi H.S. Lelis; Nathan R Sturtevant; Guni Sharon
Corresponding author: Guni Sharon

Technical Report: Hybrid Autonomous Intersection Management

DOI: 10.48550/arxiv.2204.07704
Year: 2022
Journal: ArXiv
Impact factor: 0
Authors: Aaron Parks; Guni Sharon
Corresponding author: Guni Sharon

Traffic Optimization For a Mixture of Self-interested and Compliant Agents

DOI:
Year: 2017
Journal: International Symposium on Artificial Intelligence and Mathematics
Impact factor: 0
Authors: Guni Sharon; Michael Albert; Tarun Rambha; S. Boyles; P. Stone
Corresponding author: P. Stone

An Assessment of Autonomous Vehicles: Traffic Impacts and Infrastructure Needs—Final Report

DOI:
Year: 2017
Journal:
Impact factor: 0
Authors: K. Kockelman; S. Boyles; P. Stone; Daniel J. Fagnant; Rahul Patel; M. Levin; Guni Sharon; M. Simoni; Michael Albert; Hagen Fritz; Rebecca Hutchinson; P. Bansal; Gleb B. Domnenko; P. Bujanovic; Bumsik Kim; Elham Pourrahmani; Sudesh Agrawal; Tianxin Li; Josiah P. Hanna; Aqshems Nichols; Jia Li
Corresponding author: Jia Li