Understanding and Combining Sequential Decision Making Methods

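Basic information

Grant number:
RGPIN-2021-03099
Principal investigator:
Valenzano, Richard
Amount:
$17,500
Host institution:
Ryerson University
Host institution country:
Canada
Program:
Discovery Grants Program - Individual
Fiscal year:
2022
Funding country:
Canada
Start and end dates:
2022-01-01 to 2023-12-31
Project status:
Completed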

Source:
https://www.nserc-crsng.gc.ca/ase-oro/Details-Detailles_eng.asp?id=749936
Keywords:
Understanding Combining Sequential Decision Making

Project abstract

If a robot is to successfully navigate from one location to another, it must identify an appropriate sequence of movements to get there. Similarly, a chess-playing program must select a sequence of "good" moves in response to its opponent if the program is to win. These are examples of sequential decision making tasks, which require that an autonomous agent make a sequence of effective decisions about how to act in order to achieve some desired objective. Sequential decision making is a fundamental task in the field of Artificial Intelligence, and it is becoming even more important as autonomous systems become a larger part of daily life. It has been examined by several different research communities, each considering different settings and using different solution methods. This includes reinforcement learning, automated planning, and game-playing systems.

The long-term goal of my research agenda is to develop the underlying principles and techniques needed for effective sequential decision making. I will pursue two main lines of inquiry. The first is to improve our understanding of different sequential decision making methods, specifically reinforcement learning, automated planning, and game-playing approaches, with the objective of making it easier to develop systems to solve new or more complex sequential decision making tasks. This includes an examination of how different design decisions affect system performance, and an investigation into what properties of a problem lead to one technique being more effective than another.

The second main line of research is in understanding how different sequential decision making methods can be used to improve and complement each other. AlphaGo is an excellent example of this idea, as this system used reinforcement learning to generate an evaluation function, and then used this function to guide a planning method during play. This general paradigm is a very powerful one, and better understanding when it is most appropriate or how to best combine other such methods should advance our ability to make systems with strong sequential decision making capabilities.

Sequential decision making methods have been used in a variety of applications including DNA sequence alignment, underground sewer placement for new subdivisions, model-based diagnosis of faulty systems, and robotics. However, it is still quite complex to design a system that uses sequential decision making approaches for some new given task, due to the significant number of design decisions that need to be made. Having a better understanding of when different methods are most effective will help speed up this process, while identifying new ways of combining the various methods for sequential decision making will increase the size and complexity of problems that can be solved.
