Toward Machine Competence: Combining Demonstration-based and Experience-based Machine Learning

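Basic information

Grant number:
RGPIN-2018-04674
Principal investigator:
Schuurmans, Dale
Amount:
approx. $53,900
Host institution:
University of Alberta
Host institution country:
Canada
Program:
Discovery Grants Program - Individual
Fiscal year:
2020
Funding country:
Canada
Period:
2020-01-01 to 2021-12-31
Status:
Completed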

Source:
https://www.nserc-crsng.gc.ca/ase-oro/Details-Detailles_eng.asp?id=712609
Keywords:
Toward Machine Competence Combining Demonstration

Project abstract

The relentless expansion of digital data, combined with unprecedented computing power, has created new opportunities to advance computer interpretation (e.g., natural language processing and computer perception) and computer decision making through the analysis of massive data collections and extensive interaction with existing systems or people. Although advances in machine learning have driven recent progress in artificial intelligence, current machine learning methods remain limited in a fundamental way: they either mimic explicit human demonstration or rely on self-discovery from reinforcement, and each approach is inadequate on its own. Humans achieve competence through a combination of instruction, imitation and experience, yet machine learning methods are typically siloed within one of these perspectives. This research program will address the challenge of developing algorithms that can acquire competence by integrating experience-based and demonstration-based learning. To base an effective integration on sound foundations, this research will also address core questions that arise in each of the supporting subareas, in particular learning from demonstration (e.g., supervision) and learning from experience (e.g., reinforcement).

The main foci are (1) unifying value-based and policy-based reinforcement learning, (2) relating forward and inverse reinforcement learning, (3) extending on-policy and off-policy reinforcement learning methods to exploit demonstrations and evaluation oracles for structured output prediction, and (4) exploiting equilibrium concepts from game theory. In particular, for (1) I have recently developed a new unification of value-based and policy-based reinforcement learning, based on the observation that action values and policy probabilities are duals when entropy regularization is present. This unification also suggests effective new methods for combining forward and inverse reinforcement learning, and for combining on-policy and off-policy data to accelerate learning, which form the basis for (2). A key aspect of these investigations will be to make more effective use of demonstrations, which are inherently off-policy, and to apply the resulting techniques to structured output prediction problems that arise naturally in natural language processing, combinatorial optimization, and program synthesis, fulfilling (3). Finally, for (4) I will exploit novel connections between deep learning and game theory that I have recently developed, which allow for improved stability and sparsity in deep learning and reinforcement learning methods.
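The duality between action values and policy probabilities mentioned for focus (1) can be illustrated for a single state with the standard log-sum-exp identity for entropy-regularized choice. The sketch below is not the project's algorithm. It assumes a finite action set and a temperature parameter `tau`, and all function names are hypothetical. It shows that the entropy-regularized optimal policy is a softmax of the action values, and that the action values can be recovered from that policy and the soft state value.

```python
import math


def soft_value(q_values, tau):
    """Soft state value: tau * log(sum_a exp(Q(a) / tau))."""
    m = max(q_values)  # subtract the max for numerical stability
    return m + tau * math.log(sum(math.exp((q - m) / tau) for q in q_values))


def soft_policy(q_values, tau):
    """Maximizer of E_pi[Q] + tau * H(pi): pi(a) = exp((Q(a) - V) / tau)."""
    v = soft_value(q_values, tau)
    return [math.exp((q - v) / tau) for q in q_values]


def regularized_objective(policy, q_values, tau):
    """Expected action value plus tau times the Shannon entropy of the policy."""
    expected_q = sum(p * q for p, q in zip(policy, q_values))
    entropy = -sum(p * math.log(p) for p in policy if p > 0.0)
    return expected_q + tau * entropy


def recover_q(policy, value, tau):
    """Invert the softmax relation: Q(a) = V + tau * log pi(a)."""
    return [value + tau * math.log(p) for p in policy]


if __name__ == "__main__":
    q = [1.0, 2.0, 0.5]   # illustrative action values, not from the source
    tau = 0.7             # illustrative temperature
    pi = soft_policy(q, tau)
    v = soft_value(q, tau)
    print("policy:", pi)
    print("soft value:", v)
    print("objective at policy:", regularized_objective(pi, q, tau))
    print("recovered Q:", recover_q(pi, v, tau))
```

In this single-state setting the regularized objective evaluated at the softmax policy equals the soft value, so either quantity determines the other once `tau` is fixed. The multi-step reinforcement learning case adds discounting and state transitions, which this sketch leaves out.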
