
Exploration and Learning in Heuristic Search


Basic Information

Grant number:
RGPIN-2020-04048
Principal investigator:
Müller, Martin
Amount:
$25,500
Host institution:
University of Alberta
Host institution country:
Canada
Program:
Discovery Grants Program - Individual
Fiscal year:
2020
Funding country:
Canada
Start and end dates:
2020-01-01 to 2021-12-31
Project status:
Completed

Source:
https://www.nserc-crsng.gc.ca/ase-oro/Details-Detailles_eng.asp?id=716407
Keywords:
Exploration Learning Heuristic Search

Project Abstract

Research in Computing Science progresses towards the goal of solving ever more complex and difficult real-world problems. Intelligent automated decision-making requires modelling an application domain and processing a potentially huge space of possible future alternatives. With my research group and my colleagues, I study efficient search algorithms for solving hard decision-making problems. The current proposal focuses on two topics: efficient exploration in large search spaces, and the use of machine learning methods. These topics have emerged as major common themes that drive much of my group's recent work in a diverse set of application areas.

My research area has changed radically over the last five years. Systems that combine deep reinforcement learning with Monte Carlo Tree Search have achieved super-human performance in complex games such as Go, chess and shogi. DeepMind's AlphaZero system has learned to play such games from scratch, without any human input regarding playing strategy. A beautiful aspect of these systems is the way in which they combine learning and search. They create a virtuous cycle where machine learning improves the search process, and the search in turn improves the learning.

Despite the impressive successes of these algorithms, a number of problems of both practical and fundamental nature currently limit their more widespread use. A major practical problem is the massive resources required to train the large, deep neural networks which encode the learned knowledge. More fundamental questions include: how to control the search process? And how to generalize such approaches when we don't have a perfect and efficient model of a problem?

In future work with my students and colleagues, I want to study the following topics in depth:

1. Continue the study of exploration in heuristic search
2. Extend our methods to problems beyond games, which are less well specified
3. Study learning and search in cases where we know the "true result" due to their special mathematical structure

To study these research questions, I plan to continue working on concrete applications which pose significant challenges. I want to continue building complete high-performance systems, and to test them on standard benchmarks as well as in competitions. A deeper understanding of these methods will likely lead to significantly improved decision-making systems, which can search and learn better and faster, and can be used for less well-defined problems.
