Intelligent Control: A Dynamic Game Approach

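Basic information

Award number:
9727805
Principal investigator:
John Baras
Amount:
$100,000
Institution:
University of Maryland, College Park
Institution country:
United States
Award type:
Standard Grant
Fiscal year:
1998
Funding country:
United States
Project period:
1998-09-01 to 2001-12-31
Project status:
Completed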

Source:
https://www.nsf.gov/awardsearch/showAward?AWD_ID=9727805&HistoricalAwards=false
Keywords:
Intelligent Control Dynamic Game Approach

Project abstract

9727805 Baras

The proposed work aims to extend results in reinforcement learning theory to dynamic game problems relevant to output feedback robust nonlinear control. There are two primary motivations for this:

(a) To develop schemes to overcome the prohibitive computational cost encountered while designing and implementing robust nonlinear controllers.
(b) To employ the dynamic game framework as a stepping stone leading to the development of analytical machinery suitable for posing and solving intelligent control problems.

The former is concerned primarily with off-line schemes for approximating the key equations, and with the development of techniques to efficiently compute and represent the control policy. The latter is concerned with on-line schemes, where one needs to integrate identification, control, and the ability to improve performance in a finite amount of time with finite computational resources. The latter has less available information on the system model and environment; thus learning is an essential component of the methodology.

With these objectives in mind, special emphasis needs to be placed on obtaining algorithms that exhibit good finite-time performance and do so with a finite amount of computational resources. Furthermore, in order to efficiently integrate the components of the resulting architectures, one also needs to develop finite-time performance bounds for these algorithms. The approach calls for first studying the problem in the context of finite state automata, and then extending the results to discrete-time dynamical system models.

The proposed work intends to study:

(a) Extensions of reinforcement learning to obtain finite-time performance bounds.
(b) Development of schemes to directly identify the information most relevant for control (the information state), and to do so with specified accuracy in a finite amount of time. This calls for the development of measures of risk to trade off exploration and control for on-line implementation.
(c) Model structures in (b) that lead to a reduction in complexity and lend themselves to efficient learning.
(d) Extension of the current analytical framework for studying reinforcement learning to account for the unpredictability associated with intelligence.
(e) Exploiting the relationship between risk-sensitive control and dynamic games to harness the structure offered by probability theory.
(f) Development of architectures and software that efficiently implement the algorithms obtained.

Results obtained from this research project, coupled with the development of appropriate complexity metrics, would result in a framework for posing and analyzing a wide variety of intelligent control problems. Such an approach would lead to controllers that are inherently robust, yet capable of adapting their behaviour to perceived changes in the system/environment. The results would be applicable across a range of problems, from the computation and implementation of robust nonlinear control at one end to truly autonomous control for large, complex systems at the other. Specific application domains include chemical process control, semiconductor manufacturing, and control of large communication networks.
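
For readers unfamiliar with the dynamic game view of robust control on a finite state automaton, the following is a minimal sketch and is not taken from the award record. It runs minimax value iteration for a controller that minimizes cost against a worst-case disturbance. It assumes a known model and full state feedback, so it illustrates only the game formulation, not the learning or output feedback parts of the proposal. The toy system, cost function, and all names (`minimax_value_iteration`, `step`, `cost`) are hypothetical.

```python
def minimax_value_iteration(states, controls, disturbances, step, cost,
                            discount=0.9, tol=1e-9, max_iters=10_000):
    """Iterate V(x) = min_u max_w [cost(x, u, w) + discount * V(step(x, u, w))]."""

    def worst_case(V, x, u):
        # Cost-to-go of control u at state x under the most harmful disturbance.
        return max(cost(x, u, w) + discount * V[step(x, u, w)]
                   for w in disturbances)

    V = {x: 0.0 for x in states}
    for _ in range(max_iters):
        V_new = {x: min(worst_case(V, x, u) for u in controls) for x in states}
        delta = max(abs(V_new[x] - V[x]) for x in states)
        V = V_new
        if delta < tol:
            break

    # Greedy (pure-strategy) control policy with respect to the final values.
    policy = {x: min(controls, key=lambda u: worst_case(V, x, u))
              for x in states}
    return V, policy


# Hypothetical toy automaton: positions 0..6 on a line, target position 3.
STATES = list(range(7))
CONTROLS = [-2, -1, 0, 1, 2]      # controller moves up to two cells
DISTURBANCES = [-1, 0, 1]         # adversary pushes up to one cell


def step(x, u, w):
    # Next state, clamped to the ends of the line.
    return min(max(x + u + w, STATES[0]), STATES[-1])


def cost(x, u, w):
    # Distance from the target plus a small control effort penalty.
    return abs(x - 3) + 0.1 * abs(u)


V, policy = minimax_value_iteration(STATES, CONTROLS, DISTURBANCES, step, cost)

for x in STATES:
    print(f"state {x}: worst-case cost {V[x]:.3f}, control {policy[x]:+d}")
```

The sketch restricts both players to pure strategies, which is the usual choice when the disturbance is treated as worst-case rather than as a strategic opponent. Because the discount factor is below one, the update is a contraction and the iteration converges from any starting values.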
