
Stochasticity in Approximate Dynamic Programming

Basic information

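Grant number:
RGPIN-2020-04301
Principal investigator:
Patrick, Jonathan
Amount:
$22,600
Host institution:
University of Ottawa
Host institution country:
Canada
Program:
Discovery Grants Program - Individual
Fiscal year:
2020
Funding country:
Canada
Start and end dates:
2020-01-01 to 2021-12-31
Project status:
Completed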

Source:
https://www.nserc-crsng.gc.ca/ase-oro/Details-Detailles_eng.asp?id=716961
Keywords:
Stochasticity Approximate Dynamic Programming

Project abstract

My research focus is on patient scheduling in settings with multiple types of patients and/or multiple resources consumed. The challenge consists in juggling resources to meet different wait time targets while fitting different resource consumption into available capacity. In partnership with local organizations, we have begun to examine two problems: scheduling new and follow-up visits to an outpatient clinic, with wait time targets for new patients and time windows for follow-ups, and scheduling home health care visits, where patients receive multiple visits on a regular schedule. While these applications form the test scenarios for our research, my primary concern is to address some of the limitations inherent in past work. Our primary methodology in solving scheduling problems has been Approximate Dynamic Programming (ADP), as scheduling problems are sequential decision problems where any realistic setting runs into the curse of dimensionality and thus requires an approximation method. Within ADP we have focused on the linear programming (LP) approach, which has two main limitations: the approximation is restricted to a linear architecture, and transforming the Markov decision process (MDP) into a linear program removes the variability. In some settings, ignoring variability is fine, as the infinite horizon means that there is time to reach the expectations. We have demonstrated that assuming deterministic service times works reasonably well when the only concern is overtime at the end of the day. Part of this application extends this work to scenarios where one is also concerned with idle time and waiting time during the day. In the two settings described at the outset, what is often variable is the number of visits per patient. Relying on the expected number of visits may in fact lead to unwanted periods of congestion.
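The point about relying on expected visit counts can be illustrated with a small calculation. The sketch below is not taken from the project; the visit distribution, patient count, and capacity are hypothetical numbers chosen only to show that capacity sized to the mean demand can still be exceeded on average once variability is accounted for.

```python
from itertools import product


def expected_overflow(visit_counts, probs, n_patients, capacity):
    """Exact expected demand above capacity, assuming each patient's
    visit count is drawn independently from the same distribution."""
    outcomes = list(zip(visit_counts, probs))
    total = 0.0
    # Enumerate every joint outcome across patients (fine for tiny examples).
    for combo in product(outcomes, repeat=n_patients):
        demand = sum(v for v, _ in combo)
        prob = 1.0
        for _, q in combo:
            prob *= q
        total += prob * max(0, demand - capacity)
    return total


# Hypothetical data: each patient needs 1 or 3 visits with equal probability,
# so the mean is 2 visits per patient.
visit_counts, probs = [1, 3], [0.5, 0.5]
n_patients, capacity = 4, 8  # capacity sized exactly to the mean demand

mean_demand = n_patients * sum(v * q for v, q in zip(visit_counts, probs))
print("overflow under the expected-value plan:", max(0.0, mean_demand - capacity))
print("expected overflow with variability:",
      expected_overflow(visit_counts, probs, n_patients, capacity))
```

With these assumed numbers the expected-value plan predicts no overflow, while the exact expected overflow is 0.75 visits, because demand above capacity is not offset by demand below it.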
Thus, one long-term objective of this proposal is to explore the application of robust optimization to help determine the optimal approximation in ADP. Since the LP approach to ADP transforms the MDP into a linear program, it would appear reasonable that the application of robust optimization is possible. Some work has already been done in this area, suggesting that this may well be a promising direction. A second long-term objective is the application of neural networks to the development of value function approximations that are capable of approximating a complicated value function but that nonetheless remain linear. We have done some work using non-linear approximation architectures that demonstrate improved performance over linear approximations, but at the cost of significantly greater computational challenges. There is a possibility, through machine learning, of capturing some of the non-linearity while still being able to use the LP approach, thus keeping the computational efficiencies of the LP approach while improving its ability to capture additional complexity.
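For readers unfamiliar with the LP approach to ADP mentioned in the abstract, its usual textbook form for a discounted, cost-minimizing MDP is sketched below. This is the generic formulation, not necessarily the exact model used in this project, and all symbols are assumptions introduced here: states $s \in S$, actions $a \in A(s)$, cost $c(s,a)$, transition probabilities $p(s' \mid s,a)$, discount factor $\gamma$, state-relevance weights $\alpha(s)$, basis functions $\phi_k$, and weights $w_k$.

```latex
\begin{align*}
\max_{w}\quad & \sum_{s \in S} \alpha(s) \sum_{k=1}^{K} w_k \phi_k(s) \\
\text{s.t.}\quad & \sum_{k=1}^{K} w_k \phi_k(s)
  \le c(s,a) + \gamma \sum_{s' \in S} p(s' \mid s,a) \sum_{k=1}^{K} w_k \phi_k(s')
  \qquad \forall\, s \in S,\ a \in A(s)
\end{align*}
```

Both limitations named in the abstract are visible in this form: the value function is restricted to a linear combination of the basis functions $\phi_k$, and the transition probabilities enter the constraints only through an expectation over next states.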
