
CAREER: Value Function Approximation for Control of Complex Systems


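Basic information

Award number:
9985229
Principal investigator:
Benjamin Van Roy
Amount:
$200 thousand
Institution:
Stanford University
Institution country:
United States
Award type:
Standard Grant
Fiscal year:
2000
Funding country:
United States
Period:
2000-04-01 to 2005-09-30
Status:
Completed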

Source:
https://www.nsf.gov/awardsearch/showAward?AWD_ID=9985229&HistoricalAwards=false
Keywords:
CAREER Value Function Approximation Control

Abstract

9985229 Van Roy

This proposed research is devoted to the development of streamlined and reliable computational methods for value function approximation. A successful outcome would be approximation algorithms that are widely accessible and effective in the control of complex systems.

Proposed approximation methods build on work in the area of neuro-dynamic programming, which is sometimes called "Approximate Dynamic Programming" or "Reinforcement Learning." Algorithms that will be developed are based on approximate value iteration, temporal-difference learning, and linear programming. A method for "feature selection" involving the use of value functions associated with simplified problems will also be explored.

To promote a pragmatic view of methods under development, and to provide a testbed for evaluation of ideas, two applications have been chosen to play integral roles in the project: dynamic risk management and the control of multiclass queuing networks.

The educational component of this project includes a new graduate-level course on neuro-dynamic programming, together with a realignment of current courses to incorporate a greater emphasis on computation, to foster an appreciation for the use of approximations when systems become more complex, and to promote a unified view of stochastic control problems across many disciplines.
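
For readers unfamiliar with the terms in the abstract, the following is a minimal sketch of temporal-difference learning with a linear value function approximation. It is not taken from the project. The toy problem (a five-state random walk with a reward of 1 on exiting to the right), the two-component feature map, and all names such as `td0_linear` and `features` are assumptions chosen only for illustration.

```python
import random

N_STATES = 5  # non-terminal states 1..5; states 0 and 6 are terminal


def features(state):
    """Hypothetical feature map: a bias term plus the normalized state index."""
    return [1.0, state / (N_STATES + 1)]


def value(weights, state):
    """Linear approximation V(s) = w . phi(s); terminal states have value 0."""
    if state == 0 or state == N_STATES + 1:
        return 0.0
    return sum(w * f for w, f in zip(weights, features(state)))


def td0_linear(episodes=20000, alpha=0.01, gamma=1.0, seed=0):
    """Run TD(0) on the toy random walk and return the learned weights."""
    rng = random.Random(seed)
    weights = [0.0, 0.0]
    for _ in range(episodes):
        state = (N_STATES + 1) // 2  # every episode starts in the middle
        while 0 < state < N_STATES + 1:
            next_state = state + rng.choice((-1, 1))
            # Reward 1 only when the walk exits on the right-hand side.
            reward = 1.0 if next_state == N_STATES + 1 else 0.0
            td_error = (reward + gamma * value(weights, next_state)
                        - value(weights, state))
            # Semi-gradient update: move the weights along phi(s).
            phi = features(state)
            weights = [w + alpha * td_error * f for w, f in zip(weights, phi)]
            state = next_state
    return weights


if __name__ == "__main__":
    w = td0_linear()
    for s in range(1, N_STATES + 1):
        print(s, round(value(w, s), 3))
```

In this toy chain the exact value of state s is s/6, which the two assumed features can represent, so the learned estimates should land near those values up to sampling noise. In larger problems the choice of features determines how good the approximation can be, which is the kind of question the abstract's "feature selection" item refers to.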
