
Leveraging Human and Agent Guidance for Improved Reinforcement Learning

Basic Information

Grant number:
RGPIN-2021-02538
Principal investigator:
Taylor, Matthew
Amount:
$35,000
Host institution:
University of Alberta
Host institution country:
Canada
Program:
Discovery Grants Program - Individual
Fiscal year:
2022
Funding country:
Canada
Duration:
2022-01-01 to 2023-12-31
Project status:
Completed

Source:
https://www.nserc-crsng.gc.ca/ase-oro/Details-Detailles_eng.asp?id=750525
Keywords:
Leveraging Human Agent Guidance Improved

Project Abstract

Reinforcement learning (RL) is a type of machine learning that lets virtual or physical agents learn through experience, often finding novel solutions to difficult problems and exceeding human performance. RL has had many exciting successes, from video game playing to data center optimization. Unfortunately, there are still relatively few real-world, deployed RL success stories. One reason is that learning a policy can be very slow and there is an emphasis on agents learning from scratch. In contrast, this research will better allow RL agents to learn from others.

This research will enable more deployments of RL in real-world scenarios by using existing knowledge from humans and agents to jumpstart initial behavior and reach high-performing policies more quickly. An RL agent student can receive help from a human or agent teacher with multiple kinds of guidance, such as demonstration, action advice, or direct reward feedback. When successful, this research will enable RL to be successfully deployed in more real-world scenarios by focusing costly exploration and jumpstarting initial behavior to quickly reach high-quality policies. The goal of this student/teacher framework is to improve the student's learning (relative to learning without guidance) without harming the agent's final performance. An additional goal can be to have the student outperform the teacher.

The research is divided into three specific aims. Aim 1 focuses on how agents can best use human guidance, when different types of guidance are more or less useful, and how humans want to provide guidance. Aim 2 considers when a student should ask for guidance, or when a teacher should proactively provide guidance. Aim 3 considers the more general case when a student can learn from multiple teachers and when multiple students can learn from a single teacher.

A key criticism of RL is that it can be slow to learn and that initial performance can be poor. By leveraging other agents, programs, human experts, and human non-experts as teachers, this research will help create opportunities across industries where RL successfully learns in physical and virtual settings to impact people. Not only will this research program help create Canadian jobs by using RL to improve processes in existing companies, it may help RL create new opportunities for businesses and startups that do not currently exist. Graduate students involved in this research will develop critical research, machine learning, and human-AI interaction skills. Other research groups will benefit from developed software, as it will enable standardization and make human subject studies in RL more accessible.

Project Outputs

Journal articles (0)

Monographs (0)

Research awards (0)

Conference papers (0)

Patents (0)

Other publications by Taylor, Matthew

Parkinsonism and Positive Dopamine Transporter Imaging in a Patient with a Novel KMT2B Variant.

DOI:
10.1002/mdc3.13140
Published:
2021-02-01
Journal:
Movement Disorders Clinical Practice
Impact factor:
4
Authors:
Feuerstein, Jeanne S; Taylor, Matthew; Berman, Brian D
Corresponding author:
Berman, Brian D

NICE, in Confidence: An Assessment of Redaction to Obscure Confidential Information in Single Technology Appraisals by the National Institute for Health and Care Excellence

DOI:
10.1007/s40273-019-00818-0
Published:
2019-11-01
Journal:
PharmacoEconomics
Impact factor:
4.4
Authors:
Bullement, Ash; Taylor, Matthew; Hatswell, Anthony James
Corresponding author:
Hatswell, Anthony James

STEM Graduation Outcomes of the Rice University Emerging Scholars STEM Intervention and Summer Bridge Program

DOI:
10.18260/1-2--35204
Published:
2020
Journal:
American Society for Engineering Education
Impact factor:
0
Authors:
Bradford, Brittany; Beier, Margaret; McSpedon, Megan; Wolf, Michael; Taylor, Matthew
Corresponding author:
Taylor, Matthew

Budget impact analysis of everolimus for the treatment of hormone receptor positive, human epidermal growth factor receptor-2 negative (HER2-) advanced breast cancer in Kazakhstan

DOI:
10.3111/13696998.2014.969432
Published:
2015-03-01
Journal:
Journal of Medical Economics
Impact factor:
2.4
Authors:
Lewis, Lily; Taylor, Matthew; Zufarovich, Abdrakhmanov Ramil
Corresponding author:
Zufarovich, Abdrakhmanov Ramil

An Atypical 15q11.2 Microdeletion Not Involving SNORD116 Resulting in Prader-Willi Syndrome.

DOI:
10.1155/2023/4225092
Published:
2023
Journal:
Case Reports in Genetics
Impact factor:
0
Authors:
Crenshaw, Molly M; Graw, Sharon L; Slavov, Dobromir; Boyle, Theresa A; Pique, Daniel G; Taylor, Matthew; Baker, Peter 2nd
Corresponding author:
Baker, Peter 2nd