An Adaptive Feedback System for Agent and Human Learning

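Basic Information

Grant number:
RGPIN-2019-07014
Principal investigator:
Cutumisu, Maria
Amount:
$20,400
Host institution:
University of Alberta
Host institution country:
Canada
Program:
Discovery Grants Program - Individual
Fiscal year:
2021
Funding country:
Canada
Duration:
2021-01-01 to 2022-12-31
Project status:
Completed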

Source:
https://www.nserc-crsng.gc.ca/ase-oro/Details-Detailles_eng.asp?id=739260
Keywords:
Adaptive Feedback System Agent Human

Project Abstract

Studies of technological innovation show that most new ideas need constructive feedback to become successful. Currently, there are no user-adaptive feedback systems that can be embedded in any learning and ideation environment. My research aims to discover fundamental principles for user-adaptive feedback systems that provide constructive feedback and to design and implement a system to validate these principles. My ultimate research goal is to create a general-purpose feedback system that can be embedded into a structured (with known rules) or unstructured (with rules to be discovered) environment and can learn to extract the rules of that environment.

The proposed research program builds on my NSERC-funded doctoral program devising 1) a reinforcement learning (RL) algorithm, ALeRT, which enabled agents in games to learn adaptively, and 2) a model of collaborative agent behaviours. It also builds on my postdoctoral research, in which I blended artificial intelligence with education to develop an intelligent rule-based feedback system. I discovered the principle that learners who seek negative feedback perform better on tasks and standardized tests, learning more than those seeking favourable feedback. My short-term goal is to identify such principles by creating a user-adaptive feedback system that generates constructive feedback for structured domain-knowledge content. My medium-term goal is to create a system that generates feedback for unstructured domain-knowledge content. My long-term goal is to enable multiple agents to collaboratively solve problems. Two PhD students, one MSc student, and one undergraduate student per year will be trained throughout this program.

This research includes modeling and experimental streams in a well-rounded methodology to create user-adaptive feedback agents. First, I will devise an RL algorithm enabling agents to increase their learning rate in a structured environment. I will discover a utility function to evaluate an agent's actions following feedback and the agent's performance. I will extend my ALeRT algorithm so that agents can learn to prioritize feedback. Second, I will infer the rules of an unstructured environment. Based on deep RL techniques, the value function of each action will then be used to populate the action space and the rules of the system. Third, I will extend the multiagent collaborative behaviour model I developed to enable agents to exchange feedback in a scalable way to achieve a common goal. I will validate my approach by embedding the feedback system in a different environment (e.g., the UofA massive open online course, Problem Solving, Programming, and Video Games).

The project will deepen our understanding and guide research on user-adaptive systems in which agents learn to improve their decision-making, which is crucial to developing autonomous learners. It will also contribute significantly to training highly qualified personnel for successful and innovative academic or industry careers in Canada.
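
The abstract refers to RL agents whose learning rate adapts and to using the value of each action to recover the rules of an environment. As a rough illustration only, the sketch below uses generic tabular Q-learning on a toy five-state chain. It is not the ALeRT algorithm and not the proposed system. The environment, the function names `train_q_table` and `greedy_policy`, the hyperparameters, and the decaying step-size schedule (1/visits with a floor of 0.1) are all assumptions made for this example.

```python
import random


def train_q_table(n_states=5, episodes=500, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning on a hypothetical chain: start at state 0, goal at the right end."""
    rng = random.Random(seed)
    moves = (-1, +1)  # action 0 = step left, action 1 = step right
    goal = n_states - 1
    q = [[0.0, 0.0] for _ in range(n_states)]    # action-value estimates
    visits = [[0, 0] for _ in range(n_states)]   # per state-action visit counts

    for _ in range(episodes):
        state = 0
        for _ in range(100):  # cap on episode length
            # Epsilon-greedy action selection (ties go to "right").
            if rng.random() < epsilon:
                action = rng.randrange(2)
            else:
                action = 0 if q[state][0] > q[state][1] else 1

            next_state = min(max(state + moves[action], 0), goal)
            done = next_state == goal
            reward = 1.0 if done else 0.0  # the only feedback signal in this toy

            # Assumed step-size schedule: decays with visits, floored at 0.1.
            visits[state][action] += 1
            alpha = max(0.1, 1.0 / visits[state][action])

            # Standard Q-learning update toward the bootstrapped target.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])

            state = next_state
            if done:
                break
    return q


def greedy_policy(q):
    """Read a simple 'rule' per non-goal state off the learned action values."""
    return ["right" if right > left else "left" for left, right in q[:-1]]


if __name__ == "__main__":
    table = train_q_table()
    print(greedy_policy(table))
```

Here the learned action values are read back as one preferred action per state, which is the simplest sense in which a value function can populate a set of rules. A deep RL variant, as mentioned in the abstract, would replace the table with a function approximator.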
