
Reinforcement learning approach to the optimal stopping problem

Basic information

Grant number:
RGPIN-2021-02760
Principal investigator:
Lee, ChiGuhn
Amount:
$26,200 (approx.)
Host institution:
University of Toronto
Host institution country:
Canada
Program:
Discovery Grants Program - Individual
Fiscal year:
2022
Funding country:
Canada
Start and end dates:
2022-01-01 to 2023-12-31
Project status:
Completed

Source:
https://www.nserc-crsng.gc.ca/ase-oro/Details-Detailles_eng.asp?id=759665
Keywords:
Reinforcement learning approach optimal stopping

Project abstract

We propose to address one of the most studied optimization problems, in which a decision maker tries to choose a time to take a particular action so as to maximize the reward from a stochastic process. This problem is known as the optimal stopping problem. A solution should map a given decision-making situation to the action leading to the best outcome. Because such a mapping must be available for all possible situations, the solution is a function, often called a policy. Uncertainty may come from a variety of sources, including the system state and its evolution, sojourn times before a state change, and white noise in the observable signal. The simple structure of action selection (stop vs. continue) allows analytical solutions in many instances of the optimal stopping problem. However, when the dimensionality of the state space increases, as is often the case in realistic situations, optimality is usually lost and a heuristic search algorithm has to be designed case by case. Therefore, the main objective of the study is to develop reinforcement learning algorithms for the optimal stopping problem that are efficient with a high-dimensional state space, as well as a decomposition framework so that an embedded optimal stopping problem can be solved as part of a larger problem. Three specific objectives have been identified: (1) solving the optimal stopping problem as a supervised learning problem, (2) developing a reinforcement learning algorithm that is customized to the optimal stopping problem, and (3) decomposing a general sequential decision problem so that an optimal stopping problem can be solved as a sub-problem. The impact of the proposed research is likely to be significant, and the proposed approaches are innovative. The optimal stopping problem is ubiquitous, both as an independent problem and as an embedded problem, in a wide range of domains from supply chain to finance to equipment fault detection.
Therefore, an efficient learning-based method will provide solutions to practitioners in a scalable manner. Because it is a learning method, practitioners would not need to fully specify the parameters of the problem. The way we tackle the problem is also innovative. Reinforcement learning has been seen as challenging because optimization and estimation problems are intermingled. Therefore, our approach of treating the problem as supervised learning is innovative and likely to be impactful.
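
As an illustration of the stop-versus-continue structure and of a policy as a mapping from situations to actions, the following is a minimal sketch on a toy problem that does not come from the proposal: a fair die may be rolled up to a fixed number of times, and the decision maker may stop after any roll and keep the face shown. The function name `solve_die_stopping` and its parameters are hypothetical. The sketch uses classical backward induction, which is tractable only because the state space is tiny; it is not the reinforcement learning or supervised learning method the project proposes.

```python
from fractions import Fraction


def solve_die_stopping(num_rolls, faces=(1, 2, 3, 4, 5, 6)):
    """Backward induction for a toy optimal stopping problem (illustrative only).

    Roll a fair die up to `num_rolls` times; after each roll either stop and
    keep the face shown, or continue. Returns the optimal expected reward and,
    for each roll, the set of faces at which stopping is optimal.
    """
    p = Fraction(1, len(faces))   # each face is equally likely
    continuation = Fraction(0)    # nothing is gained by continuing past the last roll
    policy = []                   # built from the last roll back to the first

    for _ in range(num_rolls):
        # Stop whenever the immediate reward is at least the value of continuing.
        stop_set = {x for x in faces if x >= continuation}
        # Value before this roll: expected maximum of stopping and continuing.
        value = sum(p * max(Fraction(x), continuation) for x in faces)
        policy.append(stop_set)
        continuation = value

    policy.reverse()              # policy[0] now refers to the first roll
    return continuation, policy


if __name__ == "__main__":
    value, policy = solve_die_stopping(3)
    print("expected reward:", value)
    for t, stop_set in enumerate(policy, start=1):
        print(f"roll {t}: stop on faces {sorted(stop_set)}")
```

With three rolls available, the resulting policy stops on the first roll only for a 5 or 6, on the second roll for a 4 or higher, and always on the last roll. Enumerating every state in this way is what stops being feasible as the dimensionality of the state space grows, which is the regime the abstract targets.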
