
An Adaptive Feedback System for Agent and Human Learning

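Basic Information

Grant number:
RGPIN-2019-07014
Principal investigator:
Cutumisu, Maria
Amount:
$20,400
Host institution:
University of Alberta
Host institution country:
Canada
Program:
Discovery Grants Program - Individual
Fiscal year:
2022
Funding country:
Canada
Start and end dates:
2022-01-01 to 2023-12-31
Project status:
Completed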

Source:
https://www.nserc-crsng.gc.ca/ase-oro/Details-Detailles_eng.asp?id=750497
Keywords:
Adaptive Feedback System Agent Human

Project Abstract

Studies of technological innovation show that most new ideas need constructive feedback to become successful. Currently, there are no user-adaptive feedback systems that can be embedded in any learning and ideation environment. My research aims to discover fundamental principles for user-adaptive feedback systems that provide constructive feedback, and to design and implement a system to validate these principles. My ultimate research goal is to create a general-purpose feedback system that can be embedded into a structured (with known rules) or unstructured (with rules to be discovered) environment and can learn to extract the rules of that environment.

The proposed research program builds on my NSERC-funded doctoral program, in which I devised 1) a reinforcement learning (RL) algorithm, ALeRT, which enabled agents in games to learn adaptively, and 2) a model of collaborative agent behaviours. It also builds on my postdoctoral research, in which I blended artificial intelligence with education to develop an intelligent rule-based feedback system. I discovered the principle that learners who seek negative feedback perform better on tasks and standardized tests, and learn more, than those who seek favourable feedback.

My short-term goal is to identify such principles by creating a user-adaptive feedback system that generates constructive feedback for structured domain-knowledge content. My medium-term goal is to create a system that generates feedback for unstructured domain-knowledge content. My long-term goal is to enable multiple agents to solve problems collaboratively. Two PhD students, one MSc student, and one undergraduate student per year will be trained throughout this program.

This research combines modeling and experimental streams in a well-rounded methodology to create user-adaptive feedback agents. First, I will devise an RL algorithm enabling agents to increase their learning rate in a structured environment. I will discover a utility function to evaluate an agent's actions following feedback and the agent's performance. I will extend my ALeRT algorithm so that agents can learn to prioritize feedback. Second, I will infer the rules of an unstructured environment. Based on deep RL techniques, the value function of each action will then be used to populate the action space and the rules of the system. Third, I will extend the multiagent collaborative behaviour model I developed to enable agents to exchange feedback in a scalable way to achieve a common goal. I will validate my approach by embedding the feedback system in a different environment (e.g., the UofA massive open online course, Problem Solving, Programming, and Video Games).

The project will deepen our understanding of, and guide research on, user-adaptive systems in which agents learn to improve their decision-making, which is crucial in developing autonomous learners. It will also contribute significantly to training highly qualified personnel for successful and innovative academic or industry careers in Canada.
