
EAGER: Formal Models of Trainer Feedback for I-Learning Theoretical Guarantees

Basic Information

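Award number: 1643411
Principal investigator: David Roberts
Amount: $70,000
Host institution: North Carolina State University
Host institution country: United States
Award type: Standard Grant
Fiscal year: 2016
Funding country: United States
Period: 2016-08-15 to 2017-12-31
Status: Completed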

Source: https://www.nsf.gov/awardsearch/showAward?AWD_ID=1643411&HistoricalAwards=false
Keywords: EAGER Formal Models Trainer Feedback

Project Abstract

As virtual agents and physical robots become more common, there is an increasing number of complex tasks they can usefully perform to assist humans. These tasks are typically formalized as sequential decision tasks, in which robots and agents perceive states, take actions, and receive a reward feedback signal. In practice, there is a critical need to learn directly from human users: most users will not be able to program an agent directly or fully specify a useful reward function. On the other hand, they can likely train an agent to perform tasks unanticipated by the original designer. Machine reinforcement learning (RL), a paradigm often used for solving sequential decision-making tasks, was originally developed with inspiration from animal learning research in the applied behavior analysis (ABA) community. Existing RL approaches operationalize a limited set of ABA principles effectively; however, other principles and properties from ABA research are not well captured by existing RL formalisms, and they are likely sources of inspiration for designing more effective RL techniques capable of learning from human teachers. The objective of this project is to leverage insights from animal training to reformulate the learning of sequential tasks, shifting from an agent learning alone in a fixed environment to an agent learning cooperatively with a competent, but not necessarily perfect, human teacher. Successful completion of this project will contribute a foundation of knowledge that will aid in the development of new technologies allowing end users to customize the functions of their gadgets. This project is part of a larger collaborative effort among North Carolina State University (NCSU), Brown University, and Washington State University (WSU). The NCSU effort will include theoretical contributions along with empirical analyses and data collection.
The emphasis of the NCSU portion of the project will be on developing theoretical models of human feedback. When humans provide rewards to learning machines, describing the properties of the algorithms those machines use requires knowing how the humans provide feedback: for example, when and how they make errors, and under what circumstances they provide reinforcement or punishment or use extinction (withholding reinforcement). Understanding the theoretical properties of I-Learning under different trainer paradigms will be the primary effort of NCSU project personnel. NCSU personnel will also work with collaborators at Brown to use these feedback models to describe the performance properties of I-Learning under different assumptions about trainer behavior. In addition, NCSU personnel will work with WSU collaborators to collect data from human trainers in virtual settings in order to validate the theoretical models and set their parameters.
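
To make the idea of a "model of human feedback" concrete, here is a minimal, hypothetical sketch (an illustrative assumption, not the model developed in this project). It assumes a trainer who judges each agent action as correct or incorrect, gives feedback of the wrong sign with probability `eps`, and withholds feedback entirely (extinction) with probability `mu_plus` after correct actions and `mu_minus` after incorrect ones. All names and parameter values are made up for illustration. Given such a model, the agent can compute a Bayesian posterior that an action was correct from a history of reward, punishment, and silence.

```python
# Hypothetical trainer-feedback model (illustrative assumption only, not the
# project's published model). Each feedback event is one of three signals.
REWARD, PUNISH, NONE = "reward", "punish", "none"


def feedback_likelihood(signal: str, correct: bool,
                        eps: float, mu_plus: float, mu_minus: float) -> float:
    """P(signal | action correctness) under the assumed trainer model.

    eps      -- probability the trainer errs (gives feedback of the wrong sign)
    mu_plus  -- probability the trainer withholds feedback after a correct action
    mu_minus -- probability the trainer withholds feedback after an incorrect action
    """
    withhold = mu_plus if correct else mu_minus
    if signal == NONE:
        return withhold
    # Explicit feedback is given with probability (1 - withhold); its sign
    # matches the action's correctness with probability (1 - eps).
    right_sign = REWARD if correct else PUNISH
    p_sign = (1.0 - eps) if signal == right_sign else eps
    return (1.0 - withhold) * p_sign


def posterior_correct(signals, prior=0.5, eps=0.1, mu_plus=0.6, mu_minus=0.1):
    """Posterior probability that an action is correct, given its feedback history."""
    p_c, p_i = prior, 1.0 - prior
    for s in signals:
        p_c *= feedback_likelihood(s, True, eps, mu_plus, mu_minus)
        p_i *= feedback_likelihood(s, False, eps, mu_plus, mu_minus)
    return p_c / (p_c + p_i)


if __name__ == "__main__":
    print(posterior_correct([REWARD]))      # explicit reward
    print(posterior_correct([PUNISH]))      # explicit punishment
    print(posterior_correct([NONE, NONE]))  # repeated silence
```

With these assumed parameters the trainer mostly stays silent after correct actions and mostly punishes mistakes, so two rounds of silence push the posterior well above 0.9. The point of the sketch is that, once trainer behavior is modeled explicitly, even the absence of feedback carries information that a fixed-reward RL formulation would ignore.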

Project Outputs

Journal articles (1)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)

A Need for Speed: Adapting Agent Action Speed to Improve Task Learning from Non-Expert Humans

DOI:
Published: 2016
Journal: Proceedings of the International Joint Conference on Autonomous Agents and Multiagent Systems
Impact factor: 0
Authors: Peng, Bei;MacGlashan, James;Loftin, Robert;Littman, Michael L.;Roberts, David L.;Taylor, Matthew E.
Corresponding author: Taylor, Matthew E.

Other Publications by David Roberts

Understanding Middle Neolithic food and farming in and around the Stonehenge World Heritage Site: An integrated approach

DOI:
Published: 2019
Journal: Journal of Archaeological Science: Reports
Impact factor: 0
Authors: Fay Worley;R. Madgwick;R. Pelling;P. Marshall;J. Evans;A. Lamb;Inés López;C. Bronk Ramsey;E. Dunbar;P. Reimer;J. Vallender;David Roberts
Corresponding author: David Roberts

Neither Deep nor Shallow: A Classroom Experiment Testing the Orthographic Depth of Tone Marking in Kabiye (Togo)

DOI:
Published: 2016
Journal: Language and Speech
Impact factor: 1.8
Authors: David Roberts;Stephen L. Walter;Keith L. Snider
Corresponding author: Keith L. Snider

Work-based skills development: a context-engaged approach

DOI: 10.1108/heswbl-12-2015-0058
Published: 2016
Journal: Higher Education, Skills and Work-based Learning
Impact factor: 0
Authors: A. Felce;S. Perks;David Roberts
Corresponding author: David Roberts

KEK Preprint 2001-26

DOI:
Published: 2001
Journal:
Impact factor: 0
Authors: B. H. Behrens;W. T. Ford;A. Gritsan;H. Krieg;J. Roy;J. Smith;M. Zhao;J. Alexander;R. Baker;C. Bebek;B. Berger;Karl Berkelman;K. Bloom;V. Boisvert;D. G. Cassel;David S. Crowcroft;M. Dickson;S. V. Dombrowski;P. S. Drell;K. Ecklund;R. Ehrlich;A. D. Foland;Peter Gaidarev;L. Gibbons;B. Gittelman;S. W. Gray;D. L. Hartill;B. K. Heltsley;P. I. Hopman;J. Kandaswamy;Philip Kim;D. L. Kreinick;T. Lee;Yehan Liu;N. B. Mistry;C. Ng;E. Nordberg;M. Ogg;J. R. Patterson;Dean E. Peterson;D. Riley;A. Soffer;B. Valant;C. Ward;Michael Athanas;P. Avery;C. D. Jones;M. Lohner;S. Patton;C. Prescott;J. Yelton;J. Zheng;G. Brandenburg;R. A. Briere;A. Ershov;Y. S. Gao;D. Kim;R. Wilson;H. Yamamoto;T. Browder;Yan Li;Jorge Luis Rodriguez;T. Bergfeld;B. I. Eisenstein;J. Ernst;G. E. Gladding;G. D. Gollin;R. M. Hans;E. Johnson;I. Karliner;M. A. Marsh;M. Palmer;M. Selen;J. J. Thaler;K. Edwards;A. Bellerive;R. Janicek;D. B. Macfarlane;P. M. Patel;A. J. Sadoff;R. Ammar;P. Baringer;A. Bean;D. Besson;D. Coppage;Cynthia L. Darling;Robin E. P. Davis;S. A. Kotov;I. Kravchenko;N. Kwak;L. Zhou;Stuart B. Anderson;Y. Kubota;S. J. Lee;Jim O’Neill;R. Poling;T. Riehle;A. J. Smith;M. S. Alam;S. B. Athar;Ling Zhao;A. Mahmood;S. Timm;F. Wappler;A. Anastassov;J. E. Duboscq;D. Fujino;K. Gan;T. L. Hart;K. Honscheid;H. Kagan;R. Kass;Jason Sang Hun Lee;M. Spencer;M. Sung;A. Undrus;Andreas Wolf;M. M. Zoeller;B. Nemati;S. J. Richichi;W. R. Ross;H. Severini;P. Skubic;M. Bishai;J. Fast;J. W. Hinson;N. Menon;D. H. Miller;E. I. Shibata;I. Shipsey;M. Yurko;Steven M Glenn;Y. Kwon;S. Roberts;E. H. Thorndike;C. Jessop;K. Lingel;H. Marsiske;M. Perl;V. Savinov;D. Ugolini;R. Wang;X.;T. E. Coan;V. Fadeyev;I. Korolkov;Y. Maravin;I. Narsky;V. Shelkov;J. Staeck;R. Stroynowski;I. Volobouev;J. Ye;Marina Artuso;F. Azfar;A. O. Efimov;M. Goldberg;Dong‐Qiang He;S. Kopp;G. Moneti;R. Mountain;S. Schuh;Tomasz Skwarnicki;S. Stone;G. Viehhauser;X. Xing;J. Bartelt;S. E. Csorna;V. Jain;K. W. McLean;S. Marka;R. Godang;K. Kinoshita;I. Lai;P. Pomianowski;S. Schrenk;G. Bonvicini;D. Cinabro;R. Greene;L. Perera;G. Zhou;M. Chadha;Simon Chan;G. Eigen;Js Miller;Cp O'Grady;M. Schmidtler;J. Urheim;A. Weinstein;F. Würthwein;D. W. Bliss;G. Masek;H. Paar;S. Prell;Varun Sharma;D. Asner;J. Gronberg;T. Hill;D. J. Lange;R. J. Morrison;H. Nelson;T. Nelson;David Roberts;A. Ryd
Corresponding author: A. Ryd