EAGER: Training A Mobile Robot from Human Feedback via Income Learning

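Basic Information

  • Award number:
    1643413
  • Principal investigator:
  • Amount:
    $70,000
  • Host institution:
  • Host institution country:
    United States
  • Project type:
    Standard Grant
  • Fiscal year:
    2016
  • Funding country:
    United States
  • Project period:
    2016-08-01 to 2018-07-31
  • Project status:
    Completed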

Project Abstract

As cyberphysical systems become more widespread, there is an increasing number of complex tasks that they can usefully perform to assist human users. Tasks are typically formalized in the sequential decision framework, where the learner perceives states, takes actions, and receives a reward feedback signal. In practice, there is a critical need to learn directly from human users if such machines are to accomplish tasks beyond those pre-specified by the original developers. This project will develop new algorithms that can learn more effectively from humans. We will evaluate these algorithms on both virtual agents and robot platforms. We will investigate whether and how non-expert humans can construct sequences of tasks of increasing difficulty, similar to how expert animal trainers shape tasks. Insights from these user studies will be leveraged to further improve our algorithms' ability to learn from human trainers. If successful, this project will make critical progress towards enabling non-technical users to teach virtual and physical agents to perform complex tasks in a natural setting, one familiar to many from previous experience in training household pets.

This project is a part of a larger effort between Washington State University (WSU), North Carolina State University, and Brown University. The Brown effort will focus on deriving a well-motivated learning algorithm (tentatively called "I-learning") and understanding its theoretical properties. Of particular interest is the behavior of these algorithms in settings that are well studied in the reinforcement-learning community, such as Markov decision processes, k-armed bandits, and learning with function approximation. Algorithms will be implemented and tested on virtual and physical platforms (robots), and broader impacts on education and control will be pursued.

Project Outcomes

Journal articles (13)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Showing versus doing: Teaching by demonstration
  • DOI:
  • Publication date:
    2016
  • Journal:
  • Impact factor:
    0
  • Authors:
    Ho, M. K.;Littman, M. L.;MacGlashan, J.;Cushman, F.;Austerweil, J. L.
  • Corresponding author:
    Austerweil, J. L.
Learning behaviors via human-delivered discrete feedback: modeling implicit feedback strategies to speed up learning
  • DOI:
    10.1007/s10458-015-9283-7
  • Publication date:
    2015
  • Journal:
  • Impact factor:
    1.9
  • Authors:
    R. Loftin;Bei Peng;J. MacGlashan;M. Littman;Matthew E. Taylor;Jeff Huang;D. Roberts
  • Corresponding author:
    R. Loftin;Bei Peng;J. MacGlashan;M. Littman;Matthew E. Taylor;Jeff Huang;D. Roberts
Curriculum Design for Machine Learners in Sequential Decision Tasks
Teaching by Intervention: Working Backwards, Undoing Mistakes, or Correcting Mistakes?
Interactive Learning from Policy-Dependent Human Feedback
  • DOI:
  • Publication date:
    2017-01
  • Journal:
  • Impact factor:
    0
  • Authors:
    J. MacGlashan;Mark K. Ho;R. Loftin;Bei Peng;Guan Wang;David L. Roberts;Matthew E. Taylor;M. Littman
  • Corresponding author:
    J. MacGlashan;Mark K. Ho;R. Loftin;Bei Peng;Guan Wang;David L. Roberts;Matthew E. Taylor;M. Littman

Other publications by Michael Littman

Model-based reasoning
  • DOI:
    10.1016/j.compedu.2012.11.014
  • Publication date:
    2013
  • Journal:
  • Impact factor:
    0
  • Authors:
    Michael Jackson;Janusz Wojtusiak;Dayne Freitag;Eugene Subbotsky;Hans M. Nordahl;Jens C. Thimm;John Burgoyne;Roberto Poli;Thomas R. Guskey;Michael Davison;J. Magnotti;Adam M. Goodman;Jeffrey S. Katz;L. Verschaffel;W. Dooren;B. Smedt;Sean A. Fulop;Melva R. Grant;Leonid I. Perlovsky;B. De Smedt;P. Ghesquière;Dariusz Plewczynski;Leily Ziglari;P. Birjandi;Scott Rick;Roberto Weber;N. Seel;Maike Luhmann;Michael Eid;A. Antonietti;Barbara Colombo;Hamish Coates;Ali Radloff;P. Pirnay;Dirk Ifenthaler;Edward Swing;Craig A Anderson;David Tzuriel;Norman M. Weinberger;David C. Riccio;Patrick K. Cullen;J. Tallet;Megan L. Hoffman;David A. Washburn;Iván Izquierdo;Jorge H. Medina;M. Cammarota;A. Podolskiy;Joke Torbeyns;J. Kranzler;P. A. Kirschner;F. Kirschner;Kenn Apel;Julie A. Wolter;J. Masterson;JungMi Lee;Stefan N Groesser;Sabine Al;Philip Barker;Paul Schaik;I. Cutica;Monica Bucciarelli;K. Pata;Anna Strasser;A. Guillot;N. Hoyek;Christian Collet;Maria Opfermann;Roger Azevedo;Detlev Leutner;Thomas C. Toppino;Alice Y. Kolb;David A. Kolb;P. Brazdil;Ricardo Vilalta;Carlos Soares;C. Giraud;Jeffrey W. Bloom;Tyler Volk;Marwan A. Dwairy;Richard A. Swanson;Johanna Pöysä;K. Luwel;Theo Hug;Angélique Martin;Nicolas Guéguen;Craig Hassed;Fabio Alivernini;Michael Herczeg;M. Mastropieri;T. Scruggs;Angelika Rieder;S. Castillo;Gerardo Ayala;R. Low;R. Babuška;Barbara C. Buckley;Henry Markovits;Sungho Kim;In;Michael J. Spector;A. Towse;Charlie N. Lewis;Brian Francis;David N. Rapp;Pratim Sengupta;Sidney D’Mello;Serge Brand;J. Patry;Cees Klaassen;Sieglinde Weyringer;Alfred Weinberger;Marilla D. Svinicki;Jane S. Vogler;Andrew J. Martin;John M. Keller;ChanMin Kim;Gabriele Wulf;Lynne E. Parker;Michael Wunder;Michael Littman;Lisa J. Lehmberg;C. Victor Fung;Hannele Niemi;Steven Reiss;Piet Desmet;F. Cornillie;Helmut M. Niegemann;Steffi Heidig;Dominic W. Massaro;Charles Fadel;Cheryl Lemke;R. Grabner;Michael D. Basil;Daniel R. Little;Stephan Lewandowsky;Parmjit Singh;Zheng Liu;Marcelo H. Ang;W. Seah;Jack Heller;C. Randles;Kenneth S. Aigen
  • Corresponding author:
    Kenneth S. Aigen
Computably Continuous Reinforcement-Learning Objectives are PAC-learnable
  • DOI:
  • Publication date:
    2023
  • Journal:
  • Impact factor:
    0
  • Authors:
    Cambridge Yang;Michael Littman;Michael Carbin
  • Corresponding author:
    Michael Carbin

Other grants by Michael Littman

Collaborative Research: American Innovations in an Age of Discovery: Teaching Science and Engineering through 3D-printed Historical Reconstructions
  • Award number:
    1508319
  • Fiscal year:
    2015
  • Funding amount:
    $70,000
  • Project type:
    Continuing Grant
RI: Medium: Collaborative Research: Teaching Computers to Follow Verbal Instructions
  • Award number:
    1414931
  • Fiscal year:
    2013
  • Funding amount:
    $70,000
  • Project type:
    Standard Grant
RI: Small: Understanding Value-based Multiagent Learning and Its Applications
  • Award number:
    1414935
  • Fiscal year:
    2013
  • Funding amount:
    $70,000
  • Project type:
    Standard Grant
RI: Small: Collaborative Research: Speeding Up Learning through Modeling the Pragmatics of Training
  • Award number:
    1319618
  • Fiscal year:
    2013
  • Funding amount:
    $70,000
  • Project type:
    Continuing Grant
RI: Medium: Collaborative Research: Teaching Computers to Follow Verbal Instructions
  • Award number:
    1065195
  • Fiscal year:
    2011
  • Funding amount:
    $70,000
  • Project type:
    Standard Grant
RI: Small: Understanding Value-based Multiagent Learning and Its Applications
  • Award number:
    1018152
  • Fiscal year:
    2010
  • Funding amount:
    $70,000
  • Project type:
    Standard Grant
Collaborative Research: Pilot Research on Language-Based Strategies for Creative Problem Solving
  • Award number:
    0757490
  • Fiscal year:
    2008
  • Funding amount:
    $70,000
  • Project type:
    Standard Grant
RI: Collaborative Research: Feature Discovery and Benchmarks for Exportable Reinforcement Learning
  • Award number:
    0713148
  • Fiscal year:
    2007
  • Funding amount:
    $70,000
  • Project type:
    Standard Grant
HSD-DRU: The Role of Communication in the Dynamics of Effective Decision Making
  • Award number:
    0624191
  • Fiscal year:
    2007
  • Funding amount:
    $70,000
  • Project type:
    Standard Grant
Evaluating Next Generation Probabilistic Planners
  • Award number:
    0329153
  • Fiscal year:
    2003
  • Funding amount:
    $70,000
  • Project type:
    Continuing Grant

Similar Overseas Grants

Mixed-Methods Evaluation of Mobile Health Adaptive Learning Training for Pediatric Healthcare Workers in Tanzania
  • Award number:
    10863717
  • Fiscal year:
    2023
  • Funding amount:
    $70,000
  • Project type:
How can digital learning technologies be exploited for training in the Conceptual Component: A systematic intervention study of mobile gaming for Dive
  • Award number:
    2750877
  • Fiscal year:
    2022
  • Funding amount:
    $70,000
  • Project type:
    Studentship
Mixed-Methods Evaluation of Mobile Health Adaptive Learning Training for Pediatric Healthcare Workers in Tanzania
  • Award number:
    10538413
  • Fiscal year:
    2022
  • Funding amount:
    $70,000
  • Project type:
Language-Concordant Mobile Health Training and Support for Behavioral Management of Urinary Incontinence for Women with Limited English Proficiency
  • Award number:
    10772574
  • Fiscal year:
    2022
  • Funding amount:
    $70,000
  • Project type:
Remotely Monitored, Mobile health-supported High Intensity Interval Training after COVID-19 Critical Illness (REMM HIIT-Covid19)
  • Award number:
    10490892
  • Fiscal year:
    2021
  • Funding amount:
    $70,000
  • Project type:
Remotely Monitored, Mobile health-supported High Intensity Interval Training after COVID-19 Critical Illness (REMM HIIT-Covid19)
  • Award number:
    10341851
  • Fiscal year:
    2021
  • Funding amount:
    $70,000
  • Project type:
Anatomy-augmented visual training on mobile devices for rapid dissemination of best practices during pandemics
  • Award number:
    10154515
  • Fiscal year:
    2021
  • Funding amount:
    $70,000
  • Project type:
Mobile Virtual Simulation Training in Essential Newborn Care for Healthcare Workers in Low and Middle Income Countries
  • Award number:
    10489852
  • Fiscal year:
    2021
  • Funding amount:
    $70,000
  • Project type:
Mobile Virtual Simulation Training in Essential Newborn Care for Healthcare Workers in Low and Middle Income Countries
  • Award number:
    10268053
  • Fiscal year:
    2021
  • Funding amount:
    $70,000
  • Project type:
Remotely Monitored, Mobile health-supported High Intensity Interval Training after COVID-19 Critical Illness (REMM HIIT-Covid19)
  • Award number:
    10688052
  • Fiscal year:
    2021
  • Funding amount:
    $70,000
  • Project type: