Collaborative Research: RI: Medium: Bootstrapping natural feedback for reinforcement learning

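Basic information

Award number:
2212310
Principal investigator:
Jacob Andreas
Amount:
$1.2 million
Host institution:
Massachusetts Institute of Technology
Host institution country:
United States
Award type:
Standard Grant
Fiscal year:
2022
Funding country:
United States
Start and end dates:
2022-09-01 to 2025-08-31
Project status:
Not yet concluded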

Source:
https://www.nsf.gov/awardsearch/showAward?AWD_ID=2212310&HistoricalAwards=false
Keywords:
Collaborative Research RI Medium Bootstrapping

Project abstract

Many modern applications of artificial intelligence---from industrial automation to content recommendation---depend on machine learning algorithms that train automated agents to interact with their environments. But the two main approaches to interactive learning, reinforcement learning and imitation, require so much supervision or training time that it is prohibitively expensive to apply them to most real-world problems. Human learning does not suffer from this shortcoming, in large part because humans learn not from rewards or demonstrations, but instead from extended interaction with skilled teachers who use signals like gesture and language. This project will lay a foundation for research on interactive learning with rich feedback, from the perspective of individual agents, human--agent teams, and multi-agent populations. It will yield new capabilities for interactive training of automated agents, expanding both the effectiveness and accessibility of such techniques. Support for natural, interactive feedback will also improve the customizability of such systems, making on-the-fly adaptation or retraining accessible to users without significant computing power, data annotation resources or even programming ability.

The project is organized into three broad research objectives. First, it will develop a formal framework for grounding feedback, using simple supervisory signals (provided during or after execution) to bootstrap learned interpretation of more complex feedback types. Second, it will develop algorithms for learning to solicit feedback. These algorithms will turn the one-way process of reinforcement learning into a two-way interaction, enabling agents to proactively query supervisors for information about the compositional and causal structure of the environment. Third, it will develop new mechanisms and techniques for providing feedback, via software tools that assist human supervisors in selecting or generating maximally informative feedback signals. Research under each of these objectives will be carried out in simulated environments, benchmarked using complex tasks spanning navigation, robot manipulation, and furniture assembly, and evaluated in terms of its benefits to sample efficiency, end-to-end development time, and usability.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Project outcomes

Journal articles (1)

Monographs (0)

Research awards (0)

Conference papers (0)

Patents (0)

Guiding Pretraining in Reinforcement Learning with Large Language Models

DOI:
10.48550/arxiv.2302.06692
Publication date:
2023-02
Journal:
Impact factor:
0
Authors:
Yuqing Du;Olivia Watkins;Zihan Wang;Cédric Colas;Trevor Darrell;P. Abbeel;Abhishek Gupta;Jacob Andreas
Corresponding authors:
Yuqing Du;Olivia Watkins;Zihan Wang;Cédric Colas;Trevor Darrell;P. Abbeel;Abhishek Gupta;Jacob Andreas

Other publications by Jacob Andreas

Good-Enough Compositional Data Augmentation

DOI:
10.18653/v1/2020.acl-main.676
Publication date:
2019-04
Journal:
ArXiv
Impact factor:
0
Authors:
Jacob Andreas
Corresponding author:
Jacob Andreas

Guided K-best Selection for Semantic Parsing Annotation

DOI:
10.18653/v1/2022.acl-demo.11
Publication date:
2022
Journal:
ArXiv
Impact factor:
0
Authors:
Anton Belyy;Huang Chieh;Jacob Andreas;Emmanouil Antonios Platanios;Sam Thomson;Richard Shin;Subhro Roy;Aleksandr Nisnevich;Charles C. Chen;Benjamin Van Durme
Corresponding author:
Benjamin Van Durme

From Word Models to World Models: Translating from Natural Language to the Probabilistic Language of Thought

DOI:
Publication date:
2023
Journal:
arXiv.org
Impact factor:
0
Authors:
L. Wong;Gabriel Grand;Alexander K. Lew;Noah D. Goodman;Vikash K. Mansinghka;Jacob Andreas;J. Tenenbaum
Corresponding author:
J. Tenenbaum