
Constrained Optimization of Markov Decision Processes

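Basic information

Award number:
0928490
Principal investigator:
Eugene Feinberg
Amount:
$245,000
Host institution:
SUNY at Stony Brook
Host institution country:
United States
Award type:
Standard Grant
Fiscal year:
2009
Funding country:
United States
Project period:
2009-09-01 to 2013-08-31
Project status:
Completed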

Source:
https://www.nsf.gov/awardsearch/showAward?AWD_ID=0928490&HistoricalAwards=false
Keywords:
Constrained Optimization Markov Decision Processes

Abstract

The main research objectives of this project are to develop new algorithms and analytical tools for optimization and analysis of broad classes of controlled stochastic systems, known under the name of Markov Decision Processes, when the system performance is characterized by multiple criteria. The objective is to optimize one of the criteria under constraints on other criteria. This project will study models with the criteria of the expected total costs and the expected costs per unit time. It will be focused on finding optimal nonrandomized policies for problems with large state spaces, on developing algorithms for multi-chain problems, and on solving adaptive control problems with unknown parameters. In addition to general Markov Decision Processes, three groups of particular problems will be studied: the so-called multi-armed bandit problems that model many important managerial and operations research problems, call admission, and inventory control. If successful, the results of this project will provide efficient algorithms for computing optimal policies for important classes of stochastic systems. They will also provide descriptions of the structure of optimal policies for several groups of mathematical models of production and service systems. Currently, efficient algorithms and structural results are primarily known for problems with single objective functions. However, real-life applications usually deal with multiple criteria. This project will develop mathematical and engineering tools to control large stochastic systems for which traditional computational methods are intractable. In particular, multi-armed bandit problems to be investigated in this project have important applications to scheduling, pharmaceutical research, project management, and economics.
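
The constrained problem described in the abstract can be written schematically as follows. This is only a sketch in standard textbook notation, not the project's own formulation: the symbols \pi (policy), \mu (initial state distribution), c_k (one-step cost for criterion k), d_k (constraint level), and K (number of constraints) are assumptions introduced here for illustration, and the infinite sum is assumed to be well defined.

```latex
% Hypothetical notation: \pi is a policy, \mu an initial state distribution,
% c_k(x,a) the one-step cost for criterion k, d_k the constraint level.
\begin{align*}
V_k(\pi) &= \mathbb{E}^{\pi}_{\mu}\Bigl[\sum_{t=0}^{\infty} c_k(x_t, a_t)\Bigr]
  && \text{(expected total cost)} \\
W_k(\pi) &= \limsup_{T\to\infty} \frac{1}{T}\,
  \mathbb{E}^{\pi}_{\mu}\Bigl[\sum_{t=0}^{T-1} c_k(x_t, a_t)\Bigr]
  && \text{(expected cost per unit time)} \\
\text{minimize } & J_0(\pi) \quad \text{subject to } J_k(\pi) \le d_k,\; k = 1,\dots,K,
\end{align*}
```

Here J_k stands for either V_k or W_k, depending on which of the two criteria the model uses. With K = 0 this reduces to the single-objective case for which, according to the abstract, efficient algorithms and structural results are primarily known.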
