Reinforcement learning approach to the optimal stopping problem
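Basic information
- Grant number: RGPIN-2021-02760
- Principal investigator:
- Amount: $26,200
- Host institution:
- Host institution country: Canada
- Program: Discovery Grants Program - Individual
- Fiscal year: 2022
- Funding country: Canada
- Duration: 2022-01-01 to 2023-12-31
- Project status: Completed
- Source:
- Keywords:
Project abstract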
We propose to address one of the most studied optimization problems, in which a decision maker chooses a time to take a particular action so as to maximize the reward from a stochastic process. This problem is known as the optimal stopping problem. A solution should map a given decision-making situation to the action leading to the best outcome. Because such a mapping must be available for all possible situations, the solution is a function, often called a policy. Uncertainty may come from a variety of sources, including the system state and its evolution, sojourn times before a state change, and white noise in the observable signal. The simple structure of the action selection (stop versus continue) allows analytical solutions in many instances of the optimal stopping problem. However, when the dimensionality of the state space increases, as is often the case in realistic situations, optimality is usually lost and a heuristic search algorithm has to be designed case by case. The main objective of the study is therefore to develop reinforcement learning algorithms for the optimal stopping problem that are efficient in high-dimensional state spaces, together with a decomposition framework so that an embedded optimal stopping problem can be solved as part of a larger problem. Three specific objectives have been identified: (1) solving the optimal stopping problem as a supervised learning problem, (2) developing a reinforcement learning algorithm customized to the optimal stopping problem, and (3) decomposing a general sequential decision problem so that an optimal stopping problem can be solved as a sub-problem. The impact of the proposed work is likely to be significant, and the proposed approaches are innovative. The optimal stopping problem is ubiquitous, both as an independent problem and as an embedded problem, in a wide range of domains from supply chains to finance to equipment fault detection.
An efficient learning-based method will therefore provide practitioners with solutions in a scalable manner. Because it is a learning method, practitioners would not need to fully specify the parameters of the problem. The way we tackle the problem is also innovative: reinforcement learning has been seen as challenging because optimization and estimation problems are intermingled, so treating the problem as supervised learning is a new and likely impactful approach.
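To make the stop-versus-continue structure concrete, the following is a minimal sketch on a toy problem that does not appear in the proposal: a fair die may be rolled up to a fixed number of times, and after each roll the decision maker either stops and keeps the face shown or continues. The sketch uses classical backward induction, not the reinforcement learning or supervised learning methods the project proposes, and the function name `die_stopping_policy` and its parameters are hypothetical. It illustrates that the solution is a policy, that is, a rule giving the stopping set for every situation.

```python
from fractions import Fraction


def die_stopping_policy(n_rolls, faces=6):
    """Backward induction for a toy optimal stopping problem (illustrative only).

    Returns the expected reward under the optimal policy and, for each number
    of rolls still available after the current one, the faces on which to stop.
    """
    outcomes = [Fraction(k) for k in range(1, faces + 1)]
    continuation = Fraction(0)  # value of continuing when no rolls remain
    policy = {}
    for remaining in range(n_rolls):
        # Stop whenever the face shown is at least the value of continuing.
        policy[remaining] = {int(x) for x in outcomes if x >= continuation}
        # Value of facing a fresh roll with `remaining` further rolls in reserve.
        continuation = sum(max(x, continuation) for x in outcomes) / faces
    return continuation, policy


value, policy = die_stopping_policy(3)
print(value)      # expected reward with three rolls available
print(policy[2])  # faces to stop on at the first roll (two rolls in reserve)
```

With a one-dimensional state such as this, the policy can be tabulated exactly; the abstract's point is that such exact tabulation stops being practical as the state space grows.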
Project outcomes
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other grants held by Lee, ChiGuhn
Reinforcement learning approach to the optimal stopping problem
- Grant number: RGPIN-2021-02760
- Fiscal year: 2021
- Funding amount: $26,200
- Program: Discovery Grants Program - Individual
Transfer learning for continual learning in non-stationary environments
- Grant number: 553522-2020
- Fiscal year: 2021
- Funding amount: $26,200
- Program: Alliance Grants
Machine Learning-enhanced approaches to optimization of supply chain management at Nestlé Canada
- Grant number: 538626-2019
- Fiscal year: 2020
- Funding amount: $26,200
- Program: Collaborative Research and Development Grants
Transfer learning for continual learning in non-stationary environments
- Grant number: 553522-2020
- Fiscal year: 2020
- Funding amount: $26,200
- Program: Alliance Grants
Machine Learning-enhanced approaches to optimization of supply chain management at Nestlé Canada
- Grant number: 538626-2019
- Fiscal year: 2019
- Funding amount: $26,200
- Program: Collaborative Research and Development Grants
Assistive sequential decision making framework
- Grant number: RGPIN-2019-05460
- Fiscal year: 2019
- Funding amount: $26,200
- Program: Discovery Grants Program - Individual
Data-driven condition-based maintenance models
- Grant number: 499283-2016
- Fiscal year: 2018
- Funding amount: $26,200
- Program: Collaborative Research and Development Grants
Optimal Economic Change Detection with Imperfect Information
- Grant number: RGPIN-2014-04145
- Fiscal year: 2018
- Funding amount: $26,200
- Program: Discovery Grants Program - Individual
Optimal Economic Change Detection with Imperfect Information
- Grant number: RGPIN-2014-04145
- Fiscal year: 2017
- Funding amount: $26,200
- Program: Discovery Grants Program - Individual
Dynamic Optimization with Learning Approach to Dynamic Pricing with Financial Milestones
- Grant number: 507238-2016
- Fiscal year: 2016
- Funding amount: $26,200
- Program: Engage Grants Program
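Similar grants from the National Natural Science Foundation of China (NSFC)
Scalable Learning and Optimization: High-dimensional Models and Online Decision-Making Strategies for Big Data Analysis
- Grant number:
- Approval year: 2024
- Funding amount:
- Program: Collaborative Innovation Research Team
Understanding structural evolution of galaxies with machine learning
- Grant number: n/a
- Approval year: 2022
- Funding amount: CNY 100,000
- Program: Provincial and municipal project
Constrained dynamic multi-objective Q-learning evolutionary allocation of human-machine hybrid crowd-sensing tasks for coal mine safety
- Grant number:
- Approval year: 2022
- Funding amount: CNY 300,000
- Program: Young Scientists Fund
Short-horizon online Q-learning cooperative control mechanism for intelligent munition formations, accounting for leader failure
- Grant number: 62003314
- Approval year: 2020
- Funding amount: CNY 240,000
- Program: Young Scientists Fund
Research on e-learning resource recommendation methods integrating contextual tensor factorization
- Grant number: 61902016
- Approval year: 2019
- Funding amount: CNY 240,000
- Program: Young Scientists Fund
Effects of the development of children's musical ability on language, social cognition, and brain development
- Grant number: 31971003
- Approval year: 2019
- Funding amount: CNY 580,000
- Program: General Program
Research on Spiking-Transfer learning methods with temporal transfer capability
- Grant number: 61806040
- Approval year: 2018
- Funding amount: CNY 200,000
- Program: Young Scientists Fund
Research on deep-learning-based dynamic identification technology for glacier monitoring in the Sanjiangyuan region
- Grant number: 51769027
- Approval year: 2017
- Funding amount: CNY 380,000
- Program: Regional Science Fund
Research on key technologies for learner interest mining based on joint behavior-emotion-topic modeling in multi-scenario online learning
- Grant number: 61702207
- Approval year: 2017
- Funding amount: CNY 210,000
- Program: Young Scientists Fund
Deep mining techniques for heterogeneous medical imaging data and precise prediction of major central nervous system diseases
- Grant number: 61672236
- Approval year: 2016
- Funding amount: CNY 640,000
- Program: General Program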
Similar overseas grants
A reinforcement learning approach for de novo metabolite structure prediction from mass spectral data
- Grant number: 559158-2021
- Fiscal year: 2022
- Funding amount: $26,200
- Program: Postgraduate Scholarships - Doctoral
A reinforcement learning approach for de novo metabolite structure prediction from mass spectral data
- Grant number: 559158-2021
- Fiscal year: 2021
- Funding amount: $26,200
- Program: Postgraduate Scholarships - Doctoral
A Conditioned Reinforcement Approach to Improving Self-Control
- Grant number: 10097938
- Fiscal year: 2021
- Funding amount: $26,200
- Program:
Reinforcement learning approach to the optimal stopping problem
- Grant number: RGPIN-2021-02760
- Fiscal year: 2021
- Funding amount: $26,200
- Program: Discovery Grants Program - Individual
Supporting Tailored Adaptive Change and Reinforcement for Medication Adherence Program (STAR-MAP): Randomized trial of a novel approach to improve adherence in older hypertensive women and men
- Grant number: 10209662
- Fiscal year: 2021
- Funding amount: $26,200
- Program:
Dopamine circuit regulation of morphine reinforcement across the opioid exposure cycle
- Grant number: 10740931
- Fiscal year: 2021
- Funding amount: $26,200
- Program:
A Conditioned Reinforcement Approach to Improving Self-Control
- Grant number: 10375347
- Fiscal year: 2021
- Funding amount: $26,200
- Program:
Supporting Tailored Adaptive Change and Reinforcement for Medication Adherence Program (STAR-MAP): Randomized trial of a novel approach to improve adherence in older hypertensive women and men
- Grant number: 10396114
- Fiscal year: 2021
- Funding amount: $26,200
- Program:
Dopamine circuit regulation of morphine reinforcement across the opioid exposure cycle
- Grant number: 10282160
- Fiscal year: 2021
- Funding amount: $26,200
- Program:
Supporting Tailored Adaptive Change and Reinforcement for Medication Adherence Program (STAR-MAP): Randomized trial of a novel approach to improve adherence in older hypertensive women and men
- Grant number: 10620650
- Fiscal year: 2021
- Funding amount: $26,200
- Program: