Online Learning-based Real-time Control of Unknown Autonomous Systems

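Basic Information

Award number:
1810447
Principal investigator:
Rahul Jain
Amount:
$330,000 (approx.)
Host institution:
University of Southern California
Host institution country:
United States
Award type:
Standard Grant
Fiscal year:
2018
Funding country:
United States
Start and end dates:
2018-08-15 to 2022-07-31
Project status:
Completed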

Source:
https://www.nsf.gov/awardsearch/showAward?AWD_ID=1810447&HistoricalAwards=false
Keywords:
Online Learning based Real time

Project Abstract

Many emerging autonomous systems, e.g., robots in unstructured environments, are too complex to be accurately modeled. There are unknown model parameters, partial state observations, or a drift in system characteristics. This makes the problem of system identification and control quite challenging. Real-time adaptation is needed for optimal and resilient operation. It is well known that the classical adaptive control approach of system identification and 'certainty equivalent' control in the feedback loop does not work. In this project, we introduce a new paradigm of 'Learning-to-Control' unknown Autonomous Systems based on the newly developing approach of Thompson/Posterior sampling-based online learning. We will focus on discrete state space models of Markov decision processes (MDPs). We will first develop posterior sampling-inspired algorithms for online learning-based control with real-time adaptation for MDP models with partial observation of the system state. We note that such approaches may be interpreted as providing just the right amount of randomization for optimally trading off exploration and exploitation, which is needed for online learning of the optimal policy at the fastest rate. We will then extend this to the setting where the system parameter may be varying or drifting with time. We will then develop such algorithms for more relevant but also more complicated system models: stochastic hybrid systems, which have both discrete and continuous states. The developed algorithms will be extensively validated in simulation experiments in the classical control and robotics environments in OpenAI Gym.
The intellectual merit of the research lies in its contribution to the 'Science of Autonomous Systems' by development of foundations of online learning-based real-time control and adaptation for autonomous systems by addressing fundamental questions about separation of parameter estimation, state estimation and control for various stochastic system models, particularly when model parameters must be learnt from data. The broader impacts will include impact on the smart grid, autonomous robotics, and medical CPS devices via dissemination of research results, training of a female PhD student and a K-12 STEM outreach effort. This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
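
The abstract names Thompson/posterior sampling for MDPs but gives no algorithmic detail. For orientation, the general idea can be sketched as follows. This is an illustrative sketch only, not the project's algorithm: it assumes a fully observed tabular MDP, known rewards, a fixed (non-drifting) transition model, and a Dirichlet prior over transitions, whereas the project targets partial observation, drifting parameters, and hybrid systems. All names (`psrl`, `TRUE_P`, `R`, and so on) are hypothetical.

```python
import random

def sample_dirichlet(alphas, rng):
    """Draw one probability vector from Dirichlet(alphas) via normalized Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(draws)
    return [d / total for d in draws]

def q_value(P, R, V, s, a, gamma):
    """One-step lookahead value of action a in state s under model P and values V."""
    return R[s][a] + gamma * sum(p * v for p, v in zip(P[s][a], V))

def value_iteration(P, R, gamma=0.95, iters=200):
    """Return a greedy policy for transitions P[s][a][s'] and rewards R[s][a]."""
    n_s, n_a = len(R), len(R[0])
    V = [0.0] * n_s
    for _ in range(iters):
        V = [max(q_value(P, R, V, s, a, gamma) for a in range(n_a))
             for s in range(n_s)]
    return [max(range(n_a), key=lambda a: q_value(P, R, V, s, a, gamma))
            for s in range(n_s)]

def psrl(true_P, R, episodes=200, horizon=20, seed=0):
    """Posterior sampling loop: sample a model, plan in it, act, update counts."""
    rng = random.Random(seed)
    n_s, n_a = len(R), len(R[0])
    # Dirichlet(1, ..., 1) prior over next-state distributions (an assumption).
    counts = [[[1.0] * n_s for _ in range(n_a)] for _ in range(n_s)]
    policy = [0] * n_s
    state = 0
    for _ in range(episodes):
        # The randomization comes only from this posterior draw.
        sampled_P = [[sample_dirichlet(counts[s][a], rng) for a in range(n_a)]
                     for s in range(n_s)]
        policy = value_iteration(sampled_P, R)
        for _ in range(horizon):
            action = policy[state]
            # The environment responds according to the true, unknown model.
            nxt = rng.choices(range(n_s), weights=true_P[state][action])[0]
            counts[state][action][nxt] += 1.0
            state = nxt
    return policy, counts

# Toy two-state, two-action MDP (made up for illustration):
# state 1 pays reward 1, and action 1 tends to lead to state 1.
TRUE_P = [[[0.9, 0.1], [0.1, 0.9]],
          [[0.9, 0.1], [0.1, 0.9]]]
R = [[0.0, 0.0],
     [1.0, 1.0]]
```

Calling `psrl(TRUE_P, R)` returns the policy used in the last episode together with the posterior counts. Exploration here is driven entirely by the spread of the posterior: poorly understood actions are occasionally sampled as good and tried, and they stop being tried once the counts make that implausible.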