Collaborative Research: HCC: Medium: Aligning Robot Representations with Humans

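Basic information

Award number:
2310757
Principal investigator:
Anca Dragan
Amount:
$420,500
Awardee institution:
University of California-Berkeley
Institution country:
United States
Award type:
Standard Grant
Fiscal year:
2023
Funding country:
United States
Period of performance:
2023-08-15 to 2026-07-31
Project status:
Ongoing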

Source:
https://www.nsf.gov/awardsearch/showAward?AWD_ID=2310757&HistoricalAwards=false
Keywords:
Collaborative Research HCC Medium Aligning

Project abstract

This project seeks to make robots more robust and aligned with human preferences and values. Traditionally, robot behaviors and objectives were trained to include a set of hand-crafted features (i.e., variables represented in the data) that reflect task-relevant aspects of the environment. Using well-chosen features is very data-efficient, but it is unrealistic for human engineers to identify and write code ahead of time for all the features that could matter. Training modern high-capacity models from a lot of data is a great alternative, as long as we do not probe the learned models on novel (out-of-distribution) inputs. The reason these models fail to generalize to out-of-distribution inputs is that they will generally fail to learn the correct representation, comprising the features that matter, and instead pick up on spurious patterns in the data. The central goal of this project is to enable robots to arrive at the underlying correct representation for objectives (and, hence, behaviors). And since learning the objective function---what the human user wants---is fundamentally about humans, this work proposes that only the human can determine what actually matters vs. what is spurious. The research will introduce the problem of aligning robot representations to humans. The key observation behind the project is that traditional input used in learning, such as demonstrations or comparisons, which is designed to teach the robot the full task, is not ideal for aligning the robot’s representation. With representation alignment defined as a problem, there is the opportunity to design new types of human feedback that help the robot explicitly isolate the right representation. The project will develop new types of human feedback and algorithms for efficiently learning from them to arrive at an aligned representation. 
Preliminary work leveraged this observation to introduce feature traces---a novel type of human input through which users can teach the robot about specific features they care about. The project will pursue four objectives that together tackle the aspects of aligning robot representations with humans: (1) Teaching one feature at a time, beyond feature traces: It will investigate new input types for aligning robot representations with users, contribute active learning algorithms that help the human teacher provide the most informative input, and build transparency tools that enable robots to teach back to the user their current understanding of the representation. (2) Extracting features all at once from new, representation-specific human input: It will investigate new human input types that teach the full representation all at once by combining self-supervised representation learning methods with human-centric representation learning. (3) Using a correct representation in the right way: Given a new task, the robot needs to learn which features matter and in which contexts. (4) Extending earlier work to policy learning: It will extend new tools to the policy learning setting and use the lens of human-aligned representations to enable better policy generalization to new users and to improve goal mis-generalization in reinforcement learning.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Project outputs

Journal articles (0)

Monographs (0)

Research awards (0)

Conference papers (0)

Patents (0)

Other publications by Anca Dragan

Learning Temporal Distances: Contrastive Successor Features Can Provide a Metric Structure for Decision-Making

DOI:
Publication year:
2024
Journal:
Impact factor:
0
Authors:
Vivek Myers; Chongyi Zheng; Anca Dragan; Sergey Levine; Benjamin Eysenbach
Corresponding author:
Benjamin Eysenbach

When Your AIs Deceive You: Challenges with Partial Observability of Human Evaluators in Reward Learning

DOI:
10.48550/arxiv.2402.17747
Publication year:
2024
Journal:
ArXiv
Impact factor:
0
Authors:
Leon Lang; Davis Foote; Stuart J. Russell; Anca Dragan; Erik Jenner; Scott Emmons
Corresponding author:
Scott Emmons