Exploration and Learning in Heuristic Search

Basic Information

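Grant number:
RGPIN-2020-04048
Principal investigator:
Müller, Martin
Amount:
$25.5K
Host institution:
University of Alberta
Host institution country:
Canada
Program:
Discovery Grants Program - Individual
Fiscal year:
2022
Funding country:
Canada
Project period:
2022-01-01 to 2023-12-31
Project status:
Completed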

Source:
https://www.nserc-crsng.gc.ca/ase-oro/Details-Detailles_eng.asp?id=750512
Keywords:
Exploration Learning Heuristic Search

Project Abstract

Research in Computing Science progresses towards the goal of solving ever more complex and difficult real-world problems. Intelligent automated decision-making requires modelling an application domain and processing a potentially huge space of possible future alternatives. With my research group and my colleagues, I study efficient search algorithms for solving hard decision-making problems. The current proposal focuses on two topics: efficient exploration in large search spaces, and the use of machine learning methods. These topics have emerged as common themes that drive much of my group's recent work in a diverse set of application areas.

My research area has changed radically over the last five years. Systems that combine deep reinforcement learning with Monte Carlo Tree Search have achieved super-human performance in complex games such as Go, chess and shogi. DeepMind's AlphaZero system has learned to play such games from scratch, without any human input regarding playing strategy. A beautiful aspect of these systems is the way in which they combine learning and search. They create a virtuous cycle where machine learning improves the search process, and the search also improves the learning.

Despite the impressive successes of these algorithms, a number of problems of both practical and fundamental nature currently limit their more widespread use. A major practical problem is the massive resources required to train the large and deep neural networks which encode the learned knowledge. More fundamental questions include how to control the search process, and how to generalize such approaches when we do not have a perfect and efficient model of a problem.

In future work with my students and colleagues, I want to study the following topics in depth:
1. Continue the study of exploration in heuristic search
2. Extend our methods to problems beyond games, which are less well specified
3. Study learning and search in cases where we know the "true result" due to their special mathematical structure

To study these research questions, I plan to continue working on concrete applications which pose significant challenges. I want to continue building complete high-performance systems, and test them on standard benchmarks as well as in competitions. A deeper understanding of these methods will likely lead to significantly improved decision-making systems, which can search and learn better and faster, and can be used for less well-defined problems.
