
CAREER: Safe and Efficient Robot Learning from Demonstration in the Real World

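Basic Information

Award number: 2323384
Principal investigator: Scott Niekum
Amount: $524,600 (approx.)
Institution: University of Massachusetts Amherst
Institution country: United States
Award type: Continuing Grant
Fiscal year: 2023
Funding country: United States
Duration: 2023-01-01 to 2025-05-31
Status: Ongoing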

Source: https://www.nsf.gov/awardsearch/showAward?AWD_ID=2323384&HistoricalAwards=false
Keywords: CAREER Safe Efficient Robot Learning

Project Abstract

General-purpose robots are poised to enter homes and workplaces in unprecedented numbers in the coming years, but they face the significant challenge of customization: the ability to perform user-specified tasks in many different unstructured environments. In response to this need, robot learning from demonstration (LfD) has emerged as a paradigm that allows users to program robots quickly and naturally by simply showing them how to perform a task, rather than by writing code. This methodology aims to let non-expert users program robots, as well as communicate embodied knowledge that is difficult to translate into formal code. However, current state-of-the-art LfD algorithms are not yet ready for widespread deployment, as they are often unreliable, need too much data, and are designed to learn in a single session in a laboratory setting. This work addresses these issues to help enable future robots to perform important tasks ranging from in-home elderly care to reconfigurable manufacturing.

Specifically, this work identifies three significant technical improvements to current LfD algorithms that are needed before they can be deployed in the real world: safety guarantees, the ability to learn from very limited amounts of data, and the ability to improve continually in an ongoing, lifelong fashion. A formal theory of safe LfD is developed, along with practical algorithms that provide strong probabilistic lower bounds on agent performance. Algorithmic efficiency is addressed by re-examining common statistical assumptions (such as independent and identically distributed data) and by using multimodal side information, such as natural language and gaze. Finally, active learning strategies and models of human beliefs are used to enable interactive, continual learning.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
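The abstract mentions algorithms that provide "strong probabilistic lower bounds on agent performance" but does not say how such bounds are computed. As a hedged illustration of what a bound of this kind means, here is a minimal sketch using a generic textbook tool, Hoeffding's inequality, assuming i.i.d. episode returns that always lie in a known range [r_min, r_max]. This is not the project's method (the award page does not specify one), and the function names `hoeffding_lower_bound` and `passes_safety_check` are hypothetical.

```python
import math
from typing import Sequence


def hoeffding_lower_bound(
    returns: Sequence[float], r_min: float, r_max: float, delta: float
) -> float:
    """Return a value that the true expected return exceeds with
    probability at least 1 - delta (one-sided Hoeffding bound).

    Assumption: episode returns are i.i.d. and always lie in [r_min, r_max].
    """
    n = len(returns)
    if n == 0:
        raise ValueError("need at least one episode return")
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must lie in (0, 1)")
    if r_max <= r_min:
        raise ValueError("r_max must exceed r_min")
    empirical_mean = sum(returns) / n
    # Hoeffding: P(mean_hat - mu >= t) <= exp(-2 n t^2 / (r_max - r_min)^2).
    # Setting the right-hand side to delta and solving for t gives the slack.
    slack = (r_max - r_min) * math.sqrt(math.log(1.0 / delta) / (2.0 * n))
    return empirical_mean - slack


def passes_safety_check(
    returns: Sequence[float], baseline: float, r_min: float, r_max: float, delta: float
) -> bool:
    """Hypothetical deployment rule: accept a learned policy only if its
    high-confidence lower bound is at least the baseline's expected return."""
    return hoeffding_lower_bound(returns, r_min, r_max, delta) >= baseline


if __name__ == "__main__":
    # Illustrative, made-up evaluation returns normalized to [0, 1].
    returns = [0.8, 0.9, 0.7, 1.0] * 25  # 100 episodes, empirical mean 0.85
    lb = hoeffding_lower_bound(returns, r_min=0.0, r_max=1.0, delta=0.05)
    print(f"95% lower bound on expected return: {lb:.3f}")
    print(passes_safety_check(returns, baseline=0.7, r_min=0.0, r_max=1.0, delta=0.05))
```

With 100 episodes and delta = 0.05 the slack is about 0.12, so the bound sits near 0.73 even though the empirical mean is 0.85; the slack shrinks as 1/sqrt(n). Note that this bound leans directly on the i.i.d. assumption, which is exactly the kind of statistical assumption the abstract says the project re-examines, and that distribution-free bounds like this one are typically loose compared with bounds that exploit more structure.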

Project Outputs

Journal articles (7)

Monographs (0)

Research awards (0)

Conference papers (0)

Patents (0)

A Ranking Game for Imitation Learning

DOI:
Published: 2022-02
Venue: arXiv
Impact factor: 0
Authors: Harshit S. Sikchi; Akanksha Saran; Wonjoon Goo; S. Niekum
Corresponding authors: Harshit S. Sikchi; Akanksha Saran; Wonjoon Goo; S. Niekum

Learning Optimal Advantage from Preferences and Mistaking it for Reward

DOI: 10.48550/arxiv.2310.02456
Published: 2023-10
Venue: arXiv
Impact factor: 0
Authors: W. B. Knox; Stephane Hatgis-Kessell; Sigurdur O. Adalgeirsson; Serena Booth; Anca D. Dragan; Peter Stone; S. Niekum
Corresponding authors: W. B. Knox; Stephane Hatgis-Kessell; Sigurdur O. Adalgeirsson; Serena Booth; Anca D. Dragan; Peter Stone; S. Niekum

Score Models for Offline Goal-Conditioned Reinforcement Learning

DOI:
Published: 2024
Venue: International Conference on Learning Representations
Impact factor: 0
Authors: H. Sikchi; R. Chitnis; A. Touati; A. Geramifard; A. Zhang; S. Niekum
Corresponding author: S. Niekum

Understanding Acoustic Patterns of Human Teachers Demonstrating Manipulation Tasks to Robots

DOI:
Published: 2022
Venue: Proceedings of the International Conference on Intelligent Robots and Systems
Impact factor: 0
Authors: A. Saran; K. Desai; M. L. Chang; R. Lioutikov; A. Thomaz; S. Niekum
Corresponding author: S. Niekum

Models of human preference for learning reward functions