权益分类	功能权益	普通用户	{{item.name}}会员
{{category.name}}	{{benefitItem.name}}

Novel Algorithms to Approximate the Future Consequence of Sequential Decisions

近似连续决策的未来后果的新算法

基本信息

批准号：
RGPIN-2017-04877
负责人：
SabouriBaghAbbas, Alireza
金额：
$ 1.46万
依托单位：
University of Calgary
依托单位国家：
加拿大
项目类别：
Discovery Grants Program - Individual
财政年份：
2020
资助国家：
加拿大
起止时间：
2020-01-01 至 2021-12-31
项目状态：
已结题

来源：
https://www.nserc-crsng.gc.ca/ase-oro/Details-Detailles_eng.asp?id=711274
关键词：
Novel Algorithms Approximate Future Consequence

项目摘要

Many complex problems arising in business, health care, and transportation can be modelled as sequential decision making problems under uncertainty, meaning that a decision maker has to make decisions periodically while some random events unfold over time. For instance, an airline dynamically changes the fare for different flights over a network of cities without knowing the actual future demand, trying to maximize its revenue while managing the risk of unsold seats. These problems can be conveniently modelled in the form of dynamic programs, a method that finds the best decision by maximizing the sum of immediate reward and the expected future reward. Unfortunately, for many practical problems, the number of future scenarios that one should consider in order to calculate the expected future reward function is exponentially large, making exact calculation of this function intractable. In order to overcome this issue, approximate dynamic programming (ADP) methods have been developed to find an approximate optimal solution. A cornerstone of many ADP algorithms is defining a set of basis functions (or an approximation architecture) for approximating the future consequence of present decisions (the expected future reward function). Currently, the choice of basis functions requires prior expert knowledge about the problem, and is usually considered as more of an art than a science. My research program aims to develop, study, and apply novel algorithms that automate generation of basis functions by efficiently selecting a subset of functions from a large pool of potential basis functions, and updating this set as more information becomes known about the problem. The benefit of such algorithms is twofold: first, it reduces the burden to come up with a well-informed set of basis functions that requires significant prior knowledge about the problem; second, since many potential candidates are considered for basis functions, it is expected that the quality of the approximation is improved. My short-term objective includes evaluating the performance of the proposed algorithms in a variety of application areas, such as perishable inventory management, patient scheduling, and revenue management. ADP is a general method that is commonly used for solving many different problems in a variety of applications. As the quality of the policies generated by these algorithms is dependent on the quality of the basis functions chosen, it would be of great interest, both theoretically and practically, if the process of generating and selecting basis functions can be automated. Therefore, even a small improvement achieved by the findings of my research would have significant practical implications in multiple application areas.