Leveraging Human and Agent Guidance for Improved Reinforcement Learning

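Basic information

  • Grant number:
    RGPIN-2021-02538
  • Principal investigator:
  • Amount:
    $35,000
  • Host institution:
  • Host institution country:
    Canada
  • Program:
    Discovery Grants Program - Individual
  • Fiscal year:
    2022
  • Funding country:
    Canada
  • Period:
    2022-01-01 to 2023-12-31
  • Status:
    Completed

Project abstract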

Reinforcement learning (RL) is a type of machine learning that lets virtual or physical agents learn through experience, often finding novel solutions to difficult problems and exceeding human performance. RL has had many exciting successes, from video game playing to data center optimization. Unfortunately, there are still relatively few real-world, deployed RL success stories. One reason is that learning a policy can be very slow, and there is an emphasis on agents learning from scratch. In contrast, this research will better allow RL agents to learn from others. It will enable more deployments of RL in real-world scenarios by using existing knowledge from humans and agents to jumpstart initial behavior and reach high-performing policies more quickly. An RL agent student can receive help from a human or agent teacher through multiple kinds of guidance, such as demonstrations, action advice, or direct reward feedback. When successful, this research will enable RL to be deployed in more real-world scenarios by focusing costly exploration and jumpstarting initial behavior to quickly reach high-quality policies. The goal of this student/teacher framework is to improve the student's learning (relative to learning without guidance) without harming the agent's final performance. An additional goal can be to have the student outperform the teacher. The research is divided into three specific aims. Aim 1 focuses on how agents can best use human guidance, when different types of guidance are more or less useful, and how humans want to provide guidance. Aim 2 considers when a student should ask for guidance, or when a teacher should proactively provide guidance. Aim 3 considers the more general case in which a student can learn from multiple teachers and multiple students can learn from a single teacher.
A key criticism of RL is that it can be slow to learn and that initial performance can be poor. By leveraging other agents, programs, human experts, and human non-experts as teachers, this research will help create opportunities across industries where RL successfully learns in physical and virtual settings to impact people. Not only will this research program help create Canadian jobs by using RL to improve processes in existing companies, it may also help RL create new opportunities for businesses and startups that do not currently exist. Graduate students involved in this research will develop critical research, machine learning, and human-AI interaction skills. Other research groups will benefit from the software developed, as it will enable standardization and make human subject studies in RL more accessible.
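The student/teacher idea in the abstract (action advice, and a student deciding when to ask for it) can be illustrated with a small sketch. This is not the project's method or software: the chain environment, the always-right `teacher_advice` policy, the advice `budget`, and the `ask_threshold` rule (ask when the two Q-values in a state are nearly tied) are all hypothetical choices made only for illustration.

```python
import random

N_STATES = 6          # chain of states 0..5; state 5 is the goal (assumed toy task)
ACTIONS = (0, 1)      # 0 = move left, 1 = move right


def step(state, action):
    """Deterministic chain: reward 1.0 only on reaching the goal state."""
    nxt = max(0, state - 1) if action == 0 else min(N_STATES - 1, state + 1)
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done


def teacher_advice(state):
    """Hypothetical teacher: always advises moving right."""
    return 1


def train(episodes=200, budget=20, ask_threshold=0.05,
          alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning student that may ask the teacher for an action."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    advice_used = 0
    for _ in range(episodes):
        state, done, steps = 0, False, 0
        while not done and steps < 100:
            # Ask only while budget remains and the student is "uncertain",
            # measured here by a small gap between its two Q-values.
            gap = abs(q[state][0] - q[state][1])
            if advice_used < budget and gap < ask_threshold:
                action = teacher_advice(state)
                advice_used += 1
            elif rng.random() < epsilon:
                action = rng.choice(ACTIONS)      # explore
            else:
                action = max(ACTIONS, key=lambda a: q[state][a])  # exploit
            nxt, reward, done = step(state, action)
            # Standard Q-learning update, applied to advised actions as well.
            target = reward + (0.0 if done else gamma * max(q[nxt]))
            q[state][action] += alpha * (target - q[state][action])
            state = nxt
            steps += 1
    return q, advice_used


if __name__ == "__main__":
    q_table, used = train()
    print("advice used:", used)
    print("greedy actions:", [max(ACTIONS, key=lambda a: q_table[s][a])
                              for s in range(N_STATES - 1)])
```

Calling `train(budget=0)` gives an unguided baseline under the same settings, so the two runs can be compared on how quickly the student first reaches the goal.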

Project outputs

Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)

Other publications by Taylor, Matthew

Parkinsonism and Positive Dopamine Transporter Imaging in a Patient with a Novel KMT2B Variant.
  • DOI:
    10.1002/mdc3.13140
  • Published:
    2021-02-01
  • Journal:
  • Impact factor:
    4
  • Authors:
    Feuerstein, Jeanne S;Taylor, Matthew;Berman, Brian D
  • Corresponding author:
    Berman, Brian D
NICE, in Confidence: An Assessment of Redaction to Obscure Confidential Information in Single Technology Appraisals by the National Institute for Health and Care Excellence
  • DOI:
    10.1007/s40273-019-00818-0
  • Published:
    2019-11-01
  • Journal:
  • Impact factor:
    4.4
  • Authors:
    Bullement, Ash;Taylor, Matthew;Hatswell, Anthony James
  • Corresponding author:
    Hatswell, Anthony James
STEM Graduation Outcomes of the Rice University Emerging Scholars STEM Intervention and Summer Bridge Program
  • DOI:
    10.18260/1-2--35204
  • Published:
    2020
  • Journal:
  • Impact factor:
    0
  • Authors:
    Bradford, Brittany;Beier, Margaret;McSpedon, Megan;Wolf, Michael;Taylor, Matthew
  • Corresponding author:
    Taylor, Matthew
Budget impact analysis of everolimus for the treatment of hormone receptor positive, human epidermal growth factor receptor-2 negative (HER2-) advanced breast cancer in Kazakhstan
  • DOI:
    10.3111/13696998.2014.969432
  • Published:
    2015-03-01
  • Journal:
  • Impact factor:
    2.4
  • Authors:
    Lewis, Lily;Taylor, Matthew;Zufarovich, Abdrakhmanov Ramil
  • Corresponding author:
    Zufarovich, Abdrakhmanov Ramil
An Atypical 15q11.2 Microdeletion Not Involving SNORD116 Resulting in Prader-Willi Syndrome.
  • DOI:
    10.1155/2023/4225092
  • Published:
    2023
  • Journal:
  • Impact factor:
    0
  • Authors:
    Crenshaw, Molly M;Graw, Sharon L;Slavov, Dobromir;Boyle, Theresa A;Pique, Daniel G;Taylor, Matthew;Baker, Peter 2nd
  • Corresponding author:
    Baker, Peter 2nd

Other grants by Taylor, Matthew

Leveraging Human and Agent Guidance for Improved Reinforcement Learning
  • Grant number:
    RGPAS-2021-00029
  • Fiscal year:
    2022
  • Funding amount:
    $35,000
  • Program:
    Discovery Grants Program - Accelerator Supplements
Leveraging Human and Agent Guidance for Improved Reinforcement Learning
  • Grant number:
    RGPAS-2021-00029
  • Fiscal year:
    2021
  • Funding amount:
    $35,000
  • Program:
    Discovery Grants Program - Accelerator Supplements
Diversity in multi-agent systems for successful real-world deployments
  • Grant number:
    561116-2020
  • Fiscal year:
    2021
  • Funding amount:
    $35,000
  • Program:
    Alliance Grants
Leveraging Human and Agent Guidance for Improved Reinforcement Learning
  • Grant number:
    RGPIN-2021-02538
  • Fiscal year:
    2021
  • Funding amount:
    $35,000
  • Program:
    Discovery Grants Program - Individual
Human-AI interactions in real-world complex uncertain environments using a comprehensive reinforcement learning framework
  • Grant number:
    554164-2020
  • Fiscal year:
    2021
  • Funding amount:
    $35,000
  • Program:
    Alliance Grants
Human-AI interactions in real-world complex uncertain environments using a comprehensive reinforcement learning framework
  • Grant number:
    554164-2020
  • Fiscal year:
    2020
  • Funding amount:
    $35,000
  • Program:
    Alliance Grants
Diversity in multi-agent systems for successful real-world deployments
  • Grant number:
    561116-2020
  • Fiscal year:
    2020
  • Funding amount:
    $35,000
  • Program:
    Alliance Grants
Isolation and identification of neuroprotective phytochemicals from tropical flora of southern Belize: An ethnobotanical study of plants traditionally used by Q'eqchi' Maya healers to treat dementia
  • Grant number:
    426963-2012
  • Fiscal year:
    2012
  • Funding amount:
    $35,000
  • Program:
    Alexander Graham Bell Canada Graduate Scholarships - Master's
Influence of neuronal cholesterol biosynthesis on hedgehog signaling
  • Grant number:
    434215-2012
  • Fiscal year:
    2012
  • Funding amount:
    $35,000
  • Program:
    University Undergraduate Student Research Awards
Detection of very faint transients in supernova surveys
  • Grant number:
    414554-2011
  • Fiscal year:
    2011
  • Funding amount:
    $35,000
  • Program:
    University Undergraduate Student Research Awards

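Similar National Natural Science Foundation of China (NSFC) grants

Screening and efficacy evaluation of hypoglycemic small-molecule compounds targeting human ZAG protein
  • Grant number:
  • Approval year:
    2025
  • Funding amount:
    CNY 0
  • Program:
    Provincial/municipal project
Molecular mechanisms of the HBV S-human ESPL1 fusion gene in the pathogenesis of chronic hepatitis B
  • Grant number:
    81960115
  • Approval year:
    2019
  • Funding amount:
    CNY 340,000
  • Program:
    Regional Science Fund Project
"Human-in-Loop" control of lower-limb rehabilitation robots based on an adaptive surface electromyography model
  • Grant number:
    61005070
  • Approval year:
    2010
  • Funding amount:
    CNY 200,000
  • Program:
    Young Scientists Fund Project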

Similar overseas grants

Learning Coordination for Multi-Autonomous Multi-Human (MAMH) Agent Systems with Guaranteed Safety
  • Grant number:
    2332210
  • Fiscal year:
    2024
  • Funding amount:
    $35,000
  • Program:
    Standard Grant
A closed-loop human–agent learning framework to enhance decision making
  • Grant number:
    DE220100265
  • Fiscal year:
    2022
  • Funding amount:
    $35,000
  • Program:
    Discovery Early Career Researcher Award
Alleviating Human Pain and Anxiety by Using a Cyber-Physical Agent System
  • Grant number:
    22K19784
  • Fiscal year:
    2022
  • Funding amount:
    $35,000
  • Program:
    Grant-in-Aid for Challenging Research (Exploratory)
Collaborative Research: Frameworks: Simulating Autonomous Agents and the Human-Autonomous Agent Interaction
  • Grant number:
    2209791
  • Fiscal year:
    2022
  • Funding amount:
    $35,000
  • Program:
    Standard Grant
Collaborative Research: Frameworks: Simulating Autonomous Agents and the Human-Autonomous Agent Interaction
  • Grant number:
    2209795
  • Fiscal year:
    2022
  • Funding amount:
    $35,000
  • Program:
    Standard Grant
Improving Human-Agent Interaction Using EEG, Virtual Reality and the Interactive Brain Hypothesis
  • Grant number:
    2743901
  • Fiscal year:
    2022
  • Funding amount:
    $35,000
  • Program:
    Studentship
Canada-UK AI 2019: The self as agent-environment nexus - crossing disciplinary boundaries to help human selves and anticipate artificial selves
  • Grant number:
    548624-2019
  • Fiscal year:
    2022
  • Funding amount:
    $35,000
  • Program:
    Alliance Grants
Leveraging Human and Agent Guidance for Improved Reinforcement Learning
  • Grant number:
    RGPAS-2021-00029
  • Fiscal year:
    2022
  • Funding amount:
    $35,000
  • Program:
    Discovery Grants Program - Accelerator Supplements
Collaborative Research: Frameworks: Simulating Autonomous Agents and the Human-Autonomous Agent Interaction
  • Grant number:
    2209794
  • Fiscal year:
    2022
  • Funding amount:
    $35,000
  • Program:
    Standard Grant
FW-HTF-P: Human-Agent Teaming for the Future of Work in Aircraft Manufacturing
  • Grant number:
    2129113
  • Fiscal year:
    2022
  • Funding amount:
    $35,000
  • Program:
    Standard Grant