Toward Machine Competence: Combining Demonstration-based and Experience-based Machine Learning

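Basic Information

Grant number:
RGPIN-2018-04674
Principal investigator:
Schuurmans, Dale
Amount:
$53.9 thousand
Host institution:
University of Alberta
Host institution country:
Canada
Program:
Discovery Grants Program - Individual
Fiscal year:
2019
Funding country:
Canada
Start and end dates:
2019-01-01 to 2020-12-31
Project status:
Completed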

Source:
https://www.nserc-crsng.gc.ca/ase-oro/Details-Detailles_eng.asp?id=691733
Keywords:
Toward Machine Competence Combining Demonstration

Project Abstract

The relentless expansion of digital data, combined with unprecedented computing power, has created new opportunities to advance computer interpretation (e.g., natural language processing and computer perception) and computer decision making through the analysis of massive data collections and extensive interaction with existing systems or people. Even though advances in machine learning have led to the recent progress in artificial intelligence, current machine learning methods remain limited in a fundamental way: they either mimic explicit human demonstration or rely on self-discovery from reinforcement, and each of these is inadequate on its own. Humans achieve competence through a combination of instruction, imitation and experience, yet machine learning methods are typically siloed between these perspectives.

This research program will address the challenge of developing algorithms that can acquire competence through the integration of experience-based and demonstration-based learning. To base an effective integration on sound foundations, this research will also address core questions that arise in each of the supporting subareas; in particular, learning from demonstration (e.g., supervision) and learning from experience (e.g., reinforcement). The main foci are (1) unifying value-based and policy-based reinforcement learning, (2) relating forward and inverse reinforcement learning, (3) extending on-policy and off-policy reinforcement learning methods to exploit demonstrations and evaluation oracles for structured output prediction, and (4) exploiting equilibrium concepts from game theory.

In particular, for (1) I have recently developed a new unification of value-based and policy-based reinforcement learning, based on the observation that action values and policy probabilities are duals when entropy regularization is present. This unification also suggests effective new methods for combining forward and inverse reinforcement learning, and for combining on-policy and off-policy data to accelerate learning, which form the basis for (2). A key aspect of these investigations will be to make more effective use of demonstrations, which are inherently off-policy, and to apply the resulting techniques to structured output prediction problems that arise naturally in natural language processing, combinatorial optimization, and program synthesis, fulfilling (3). Finally, for (4) I will exploit novel connections between deep learning and game theory that I have recently developed, which allow for improved stability and sparsity in deep learning and reinforcement learning methods.
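The duality mentioned under (1) can be illustrated numerically. The sketch below is not taken from the grant record. It assumes the standard entropy-regularized (softmax) formulation for a single state, where the state value is V = τ log Σ_a exp(Q(a)/τ) and the optimal policy is π(a) = exp((Q(a) − V)/τ). The action values `q` and the temperature `tau` are made-up illustrative numbers, and the function names are hypothetical. Under these assumptions, each action value can be recovered from the value and the policy log-probability as Q(a) = V + τ log π(a), which is the sense in which the two representations carry the same information.

```python
import math


def soft_value(q_values, tau):
    """Entropy-regularized state value: tau * log(sum_a exp(Q(a) / tau))."""
    m = max(q_values)  # subtract the max for numerical stability
    return m + tau * math.log(sum(math.exp((q - m) / tau) for q in q_values))


def soft_policy(q_values, tau):
    """Softmax policy pi(a) = exp((Q(a) - V) / tau)."""
    v = soft_value(q_values, tau)
    return [math.exp((q - v) / tau) for q in q_values]


def entropy(probs):
    """Shannon entropy in nats."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


# Assumed, purely illustrative action values for one state and a temperature.
q = [1.0, 2.0, 0.5]
tau = 0.5

v = soft_value(q, tau)
pi = soft_policy(q, tau)

# Duality: action values are recoverable from the value and the policy.
q_recovered = [v + tau * math.log(p) for p in pi]

# Consistency: V equals the expected action value plus tau times the entropy.
v_check = sum(p * qa for p, qa in zip(pi, q)) + tau * entropy(pi)

print("policy:", pi)
print("recovered Q:", q_recovered)
print("V:", v, "V from policy:", v_check)
```

This covers only the one-state case. In a full sequential setting the same identity is applied along transitions with rewards and discounting, which this sketch does not attempt.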
