Research on Adaptive Estimation and Control of Dynamical Systems

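Basic Information

Award number:
9703812
Principal investigator:
Michael Katehakis
Amount:
$100,000
Institution:
Rutgers University New Brunswick
Institution country:
United States
Award type:
Continuing Grant
Fiscal year:
1997
Funding country:
United States
Period:
1997-08-01 to 2000-07-31
Status:
Completed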

Source:
https://www.nsf.gov/awardsearch/showAward?AWD_ID=9703812&HistoricalAwards=false
Keywords:
Research Adaptive Estimation Control Dynamical

Abstract

DMS 9703812: Research on Adaptive Estimation and Control of Dynamical Systems. Michael N. Katehakis and Herbert Robbins, Rutgers University.

This research involves work on adaptive control of dynamic systems. The basic dynamic model is known as the "Markov decision process with incomplete information" (MDP) problem, where the transition law and/or the expected one-period rewards may depend on unknown parameters. The most notable results in this area are based on ideas utilizing either a separation principle and the related certainty-equivalence rule, or uniformly efficient rules for the model of sequential allocation known as the multi-armed bandit (MAB) problem. Limitations of the certainty-equivalence rule are: i) there is no claim on the rate of convergence, and ii) there are cases for which, with positive probability, this rule can prematurely converge to a wrong parameter value so that it eventually uses only a non-optimal policy. The typical approach in the latter (MAB-based) studies has been to fit the larger MDP model into the smaller MAB one by considering each deterministic policy as a reward-generating population (bandit). A consequence of this is that the resulting statistically efficient procedures involve sampling from all deterministic policies and do not otherwise utilize the optimization aspect of the problem. Thus, they become limited in scope by data collection complexity, because in practice the state spaces of MDP models tend to be very large and the set of deterministic policies is immense. In recent work the investigators have obtained adaptive procedures with data collection requirements that are proportional to the number of state-action pairs of the MDP, under a minimal irreducibility condition. A major direction of the proposed research involves the development of solutions for important, more general problems such as i) multi-chain MDPs, ii) the case in which there are side constraints, and iii) discounted streams of rewards.
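To give a sense of scale for the data-collection argument above, here is a minimal Python sketch that compares the number of deterministic policies with the number of state-action pairs. It assumes, purely for illustration, a finite MDP in which every state offers the same number of actions and in which "deterministic policy" means a stationary map from states to actions. The function names are hypothetical and do not come from the investigators' work.

```python
def deterministic_policy_count(n_states: int, n_actions: int) -> int:
    """Number of stationary deterministic policies: one action chosen per state."""
    return n_actions ** n_states


def state_action_pair_count(n_states: int, n_actions: int) -> int:
    """Number of (state, action) pairs in the same MDP."""
    return n_states * n_actions


if __name__ == "__main__":
    # Illustrative sizes only; they are not taken from the award abstract.
    for n_states, n_actions in [(5, 2), (10, 3), (50, 4)]:
        policies = deterministic_policy_count(n_states, n_actions)
        pairs = state_action_pair_count(n_states, n_actions)
        print(f"S={n_states:3d} A={n_actions}: "
              f"{policies} policies vs {pairs} state-action pairs")
```

Under these assumptions the policy count grows exponentially in the number of states while the pair count grows only linearly, which is why treating each policy as a separate bandit arm becomes impractical for large state spaces.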
A second important goal is the development of new adaptive statistical methods that possess practically useful implementation and optimality properties for the related problems of detection of total error and change points.

The main idea of adaptive control is to compute strategies (policies, or control rules) for the operation of a system that estimate the unknown parameters of the system and, in doing so, converge to a strategy that is optimal for the true values of the unknown parameters. Applications arise in many areas of modern engineering, finance, and operations research, such as reliability, maintenance, quality control, scheduling, inventory, and production planning. Consequently, this type of problem has been widely studied in the literature. However, effective procedures that take into account and optimize the speed of convergence have been obtained only recently and for specific models, often with prohibitive data collection complexity. A primary objective of the proposed research is the development of relatively simple adaptive control procedures with reasonable computational and memory requirements for on-line implementation, for a wide class of problems, utilizing ideas from recent work of the investigators. Another important goal is the development of new methods for specific models useful in such areas as software reliability (error detection) and quality control (change points). This research relates to the following strategic areas of national concern: high performance computing, communications, and manufacturing.
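The "estimate while controlling" idea described above can be illustrated on the simplest special case, a multi-armed bandit. The sketch below uses UCB1, a well-known index rule from the bandit literature; it is a stand-in chosen for brevity, not the investigators' procedure, and it does not address the MDP setting. The `pull` callback, the arm means, and the horizon are all made-up assumptions, and rewards are assumed to lie in [0, 1].

```python
import math
import random


def ucb1(pull, n_arms, horizon):
    """Run the UCB1 index rule and return how often each arm was played.

    pull(arm) is a hypothetical callback returning a reward in [0, 1].
    """
    counts = [0] * n_arms    # number of plays per arm
    totals = [0.0] * n_arms  # summed reward per arm
    for t in range(1, horizon + 1):
        if t <= n_arms:
            arm = t - 1  # play every arm once to initialise the estimates
        else:
            # Index = current mean estimate + an exploration bonus that
            # shrinks as the arm is sampled more often.
            arm = max(
                range(n_arms),
                key=lambda i: totals[i] / counts[i]
                + math.sqrt(2.0 * math.log(t) / counts[i]),
            )
        reward = pull(arm)
        counts[arm] += 1
        totals[arm] += reward
    return counts


if __name__ == "__main__":
    rng = random.Random(0)
    true_means = [0.3, 0.5, 0.7]  # unknown to the rule; illustrative values
    counts = ucb1(lambda a: float(rng.random() < true_means[a]), 3, 5000)
    print(counts)
```

The rule never stops sampling an arm entirely, which is how it avoids the premature lock-in that a purely greedy (certainty-equivalence) choice can suffer, while still concentrating most plays on the arm whose estimated mean is highest.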
