
CRII: RI: RUI: Performance guarantees for online apprenticeship learning with unknown features

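Basic information

Award number: 1850149
Principal investigator: Kenneth Bogert
Amount: approximately $158,300
Host institution: University of North Carolina at Asheville
Host institution country: United States
Award type: Standard Grant
Fiscal year: 2019
Funding country: United States
Period: 2019-05-01 to 2022-04-30
Status: Completed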

Source: https://www.nsf.gov/awardsearch/showAward?AWD_ID=1850149&HistoricalAwards=false
Keywords: CRII; RI; RUI; Performance guarantees

Abstract

The surge of interest in robots that can be trained to work in industries such as manufacturing and healthcare increases the need for improved learning methods. In one such method, apprenticeship learning, a robot learns to perform a task by watching an expert. This project's goals are to decrease the time required to set up the robot for learning and to offer college students hands-on robotic research activities. Most related work describes techniques that require the robot's programmer to identify features of the task. This project will reduce the programmer's setup work by using automatically generated features. A new method ensures the accuracy of the robotic learner by determining the number of observations required of the expert.

Maximum Causal Entropy Inverse Reinforcement Learning learns feature weights from demonstration and, like other maximum entropy models, offers strong performance guarantees and analysis possibilities. Proven generalization bounds allow an estimate of the number of observed samples needed for a given expected level of error. However, these bounds require knowledge of the covering number or complexity of the feature functions and/or known limits on the feature weights. When features are automatically extracted from a robot's sensor stream, many spurious features are likely to be selected, which could greatly increase the estimated number of samples needed and render the technique impractical. This project is developing an iterative, online variant of the maximum causal entropy inverse reinforcement learning algorithm that runs during the demonstrations and selects high-valued features as a critical subset, which is then used to calculate the sample bounds. Once the required number of samples has been observed, an offline inverse reinforcement learning technique is run to ensure the feature weights are learned accurately.

The new algorithm will be evaluated on a robot tasked with sorting previously unknown objects. In this task, students will demonstrate the sorting of objects, and the robot will then be required to do the same. Afterwards, the robot will be reset and the experiment repeated with a new set of objects. Critically, the software on the robot should not be changed or updated between these tasks.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
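To make the sample-count argument in the abstract concrete, the sketch below shows how a generic bound grows with the number of features and shrinks when only a subset is kept. It is not the project's bound, which depends on covering numbers or weight limits. It assumes instead that each feature value lies in [0, 1], that demonstrations are independent samples, and that a standard Hoeffding inequality with a union bound is applied so that the empirical feature expectations are within `epsilon` of the true ones in L2 norm with probability at least `1 - delta`. The names `samples_needed` and `select_critical`, the weight threshold, and the example weights are all hypothetical.

```python
import math


def samples_needed(num_features: int, epsilon: float, delta: float) -> int:
    """Samples m such that k empirical feature means (features in [0, 1])
    are within epsilon of their expectations in L2 norm, w.p. >= 1 - delta.

    Hoeffding gives P(|mean_i - mu_i| >= t) <= 2 * exp(-2 * m * t**2).
    Requiring t = epsilon / sqrt(k) per feature and taking a union bound
    over k features yields m >= k * ln(2k / delta) / (2 * epsilon**2).
    """
    if num_features < 1:
        raise ValueError("num_features must be at least 1")
    if epsilon <= 0 or not (0 < delta < 1):
        raise ValueError("need epsilon > 0 and 0 < delta < 1")
    bound = num_features * math.log(2 * num_features / delta) / (2 * epsilon ** 2)
    return math.ceil(bound)


def select_critical(weights: list[float], threshold: float) -> list[int]:
    """Hypothetical selection rule: keep indices whose weight magnitude
    reaches the threshold. The project's actual criterion is not given."""
    return [i for i, w in enumerate(weights) if abs(w) >= threshold]


if __name__ == "__main__":
    # Made-up weights: two informative features among many near-zero ones.
    weights = [0.9, 0.01, -0.7, 0.0, 0.02] + [0.0] * 95
    critical = select_critical(weights, threshold=0.1)
    print("all features:     ", samples_needed(len(weights), 0.1, 0.05))
    print("critical features:", samples_needed(len(critical), 0.1, 0.05))
```

Under these assumptions the bound scales roughly linearly with the number of features, so discarding spurious features before computing it is what keeps the required number of demonstrations manageable.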