
Novel Algorithms to Approximate the Future Consequence of Sequential Decisions


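Basic information

Grant number:
RGPIN-2017-04877
Principal investigator:
SabouriBaghAbbas, Alireza
Amount:
approx. $29,100
Host institution:
University of Calgary
Host institution country:
Canada
Program:
Discovery Grants Program - Individual
Fiscal year:
2022
Funding country:
Canada
Project period:
2022-01-01 to 2023-12-31
Status:
Completed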

Source:
https://www.nserc-crsng.gc.ca/ase-oro/Details-Detailles_eng.asp?id=755185
Keywords:
Novel Algorithms Approximate Future Consequence

Project abstract

Many complex problems arising in business, health care, and transportation can be modelled as sequential decision-making problems under uncertainty, meaning that a decision maker has to make decisions periodically while random events unfold over time. For instance, an airline dynamically changes the fares for different flights over a network of cities without knowing the actual future demand, trying to maximize its revenue while managing the risk of unsold seats. These problems can be conveniently modelled as dynamic programs, a method that finds the best decision by maximizing the sum of the immediate reward and the expected future reward. Unfortunately, for many practical problems, the number of future scenarios that must be considered in order to calculate the expected future reward function is exponentially large, making exact calculation of this function intractable. To overcome this issue, approximate dynamic programming (ADP) methods have been developed to find an approximately optimal solution. A cornerstone of many ADP algorithms is defining a set of basis functions (or an approximation architecture) for approximating the future consequence of present decisions (the expected future reward function). Currently, the choice of basis functions requires prior expert knowledge about the problem, and is usually considered more of an art than a science. My research program aims to develop, study, and apply novel algorithms that automate the generation of basis functions by efficiently selecting a subset of functions from a large pool of potential basis functions, and updating this set as more information becomes known about the problem.
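For concreteness, the dynamic programming recursion and the basis-function approximation described above are commonly written as follows. The notation is an assumption introduced here for illustration and does not come from the abstract: $s$ is the state, $a$ a decision from the feasible set $\mathcal{A}(s)$, $r(s,a)$ the immediate reward, $s'$ the random next state, $\phi_k$ the basis functions, and $w_{t,k}$ their weights.

```latex
% Dynamic program: immediate reward plus expected future reward
V_t(s) = \max_{a \in \mathcal{A}(s)}
  \Bigl\{ r(s,a) + \mathbb{E}\bigl[ V_{t+1}(s') \mid s, a \bigr] \Bigr\}

% ADP with a linear architecture: K basis functions approximate V_t
V_t(s) \approx \sum_{k=1}^{K} w_{t,k}\, \phi_k(s)
```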
The benefit of such algorithms is twofold: first, they reduce the burden of devising a well-informed set of basis functions, which requires significant prior knowledge about the problem; second, since many potential candidates are considered for basis functions, the quality of the approximation is expected to improve. My short-term objectives include evaluating the performance of the proposed algorithms in a variety of application areas, such as perishable inventory management, patient scheduling, and revenue management.

ADP is a general method that is commonly used for solving many different problems in a variety of applications. As the quality of the policies generated by these algorithms depends on the quality of the basis functions chosen, it would be of great interest, both theoretically and practically, if the process of generating and selecting basis functions could be automated. Therefore, even a small improvement achieved by the findings of my research would have significant practical implications in multiple application areas.
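As a rough illustration of the idea of selecting basis functions from a pool, the sketch below solves a toy single-flight pricing problem exactly by backward dynamic programming and then greedily picks a few candidate functions that best fit the resulting value function. Everything in it is an assumption made for illustration: the toy model, the prices and purchase probabilities, the candidate pool, and the greedy forward-selection rule. It is not the method proposed in this project.

```python
import numpy as np

def exact_values(capacity=20, periods=30, prices=(100.0, 150.0, 200.0),
                 buy_prob=(0.8, 0.5, 0.2)):
    """Backward DP for a toy pricing problem (all numbers are hypothetical).

    V[t, x] = max over prices p of
              q(p) * (p + V[t+1, x-1]) + (1 - q(p)) * V[t+1, x]
    where x is the number of unsold seats and q(p) the purchase probability.
    """
    V = np.zeros((periods + 1, capacity + 1))  # unsold seats are worth 0 at the end
    for t in range(periods - 1, -1, -1):
        for x in range(1, capacity + 1):
            V[t, x] = max(q * (p + V[t + 1, x - 1]) + (1 - q) * V[t + 1, x]
                          for p, q in zip(prices, buy_prob))
    return V

def candidate_pool(x, capacity):
    """A hypothetical pool of candidate basis functions evaluated at states x."""
    s = x / capacity
    pool = {"const": np.ones_like(s), "s": s, "s^2": s**2, "s^3": s**3,
            "sqrt(s)": np.sqrt(s), "log(1+x)": np.log1p(x)}
    for k in (0.25, 0.5, 0.75):
        pool[f"max(s-{k},0)"] = np.maximum(s - k, 0.0)
    return pool

def greedy_select(pool, target, n_basis):
    """Forward selection: repeatedly add the candidate that most reduces
    the least-squares error of the fit to `target`."""
    chosen, weights, best = [], None, None
    for _ in range(n_basis):
        best = None
        for name in pool:
            if name in chosen:
                continue
            Phi = np.column_stack([pool[n] for n in chosen + [name]])
            w, *_ = np.linalg.lstsq(Phi, target, rcond=None)
            err = float(np.sum((Phi @ w - target) ** 2))
            if best is None or err < best[0]:
                best = (err, name, w)
        chosen.append(best[1])
        weights = best[2]
    return chosen, weights, best[0]

capacity = 20
V = exact_values(capacity=capacity)
x = np.arange(capacity + 1, dtype=float)
pool = candidate_pool(x, capacity)
chosen, weights, sq_error = greedy_select(pool, V[0], n_basis=3)
print("selected basis functions:", chosen)
print("squared fitting error:", sq_error)
```

In this toy setting the exact value function is available, so the fit can be checked directly. In the large problems the abstract has in mind it is not, and the selection would have to rely on sampled or approximate information instead.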
