
CPS: Small: Distributed Learning for Control of Cyber-Physical Systems

Basic Information

Award Number:
1932011
Principal Investigator:
Michael Zavlanos
Amount:
$407,500
Institution:
Duke University
Institution Country:
United States
Award Type:
Standard Grant
Fiscal Year:
2019
Funding Country:
United States
Duration:
2019-10-01 to 2023-09-30
Status:
Completed

Source:
https://www.nsf.gov/awardsearch/showAward?AWD_ID=1932011&HistoricalAwards=false
Keywords:
CPS Small Distributed Learning Control

Project Abstract

In state-of-the-art Cyber-Physical Systems (CPS), supervised or unsupervised learning is typically used to analyze data. In many such systems, however, rules cannot be determined in advance, and these data-mining techniques are not directly applicable: the data are dynamic, their large volume makes labelling impractical, and they are added to the system piece by piece rather than all at once in advance. Control of CPS, on the other hand, is usually model-based: a desired control policy is computed from a high-fidelity system model that is derived at design time and possibly updated at runtime. This approach is not suitable for highly dynamic CPS, which may be systems of systems whose spatial and temporal configurations change rapidly. With such a large number of configuration levels, it is almost impossible to derive suitable control policies using standard model-driven techniques. It is therefore critical to enable the design of data-based controllers with strong performance guarantees, in a way that allows natural runtime control adaptation. Reinforcement Learning (RL) provides such a framework. In RL, agents interact with the environment in a feedback loop and learn an optimal policy by taking appropriate sequences of actions that optimize long-term payoff. As such, RL can be much more efficient than supervised and unsupervised learning for analyzing streaming data and, especially, for controlling a system. The goal of this project is to develop a distributed off-policy RL framework for the control of CPS. Distributed RL methods avoid the fragility, communication overhead, and privacy concerns of collecting all information at a central processing unit. Moreover, off-policy learning methods significantly improve sampling efficiency and ensure safer operation.
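
The feedback loop and the off-policy idea described above can be illustrated with tabular Q-learning, a standard off-policy RL method. This is a minimal sketch under stated assumptions, not the project's algorithm: the five-state chain environment, its reward, and the values of `GAMMA` and `ALPHA` are all hypothetical. The agent collects data with a uniformly random behavior policy but learns the action values of the greedy target policy, which is what "off-policy" means here.

```python
import random

N_STATES = 5          # hypothetical toy chain: states 0..4, state 4 is terminal
ACTIONS = (0, 1)      # 0 = move left, 1 = move right
GAMMA, ALPHA = 0.9, 0.1  # assumed discount factor and step size


def step(state, action):
    """Toy environment: reward 1.0 only when the terminal state is reached."""
    nxt = max(0, state - 1) if action == 0 else state + 1
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done


def q_learning(episodes=2000, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        s, done = 0, False
        while not done:  # agent-environment feedback loop
            a = rng.choice(ACTIONS)  # behavior policy: uniform random
            s2, r, done = step(s, a)
            # Off-policy target: bootstrap from the greedy (target) policy,
            # regardless of what the behavior policy does next.
            target = r if done else r + GAMMA * max(q[s2])
            q[s][a] += ALPHA * (target - q[s][a])
            s = s2
    return q


def greedy_policy(q):
    """Greedy action in each non-terminal state."""
    return [max(ACTIONS, key=lambda a: q[s][a]) for s in range(N_STATES - 1)]
```

Calling `greedy_policy(q_learning())` should give the "move right" action in every non-terminal state even though all data came from a random policy; learning a good policy from exploratory data in this way is the kind of reuse the abstract associates with off-policy learning.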
The distributed RL framework developed under this project will have a profound impact on the control of CPS in areas as diverse as transportation, manufacturing, health care, smart cities, and urban planning, which rely on multiple sensors for data collection and control. The project also includes an educational agenda focused on K-12, undergraduate, and graduate education. Its outreach component focuses on improving pre-college students' awareness of the potential and attractiveness of a career in research and engineering.

The technical aims of this project are divided into four thrusts. The first thrust develops distributed off-policy RL methods using linear function approximation of the action-value function. Distributed RL algorithms using linear function approximation have been proposed only for policy evaluation; this thrust develops new RL algorithms that also improve the policy until an optimal policy is found, which is necessary for control. Since defining appropriate feature vectors for RL problems is generally difficult, and since linear mappings may not be able to capture nonlinear interactions between these features, the second thrust develops distributed off-policy RL methods using nonlinear function approximation, specifically neural networks. The third thrust develops distributed off-policy Actor-Critic methods. When the action space is large or continuous, Actor-Critic methods are much more effective, since they parameterize the target policy using either linear or nonlinear function approximation and learn the optimal parameter so that the resulting policy maps every state to the optimal action. Finally, the fourth thrust develops distributed RL methods for the asynchronous, heterogeneous, and non-stationary data that are common in modern CPS, where sensors neither observe identically distributed data nor sample data at the same time. Moreover, the distributions from which data are sampled can change over time. The project focuses on the development of algorithms and supporting theoretical results. The developed algorithms are evaluated in simulation on resource allocation problems in CPS, specifically the control of distributed shared-vehicle dispatch systems.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
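
The first thrust (distributed off-policy RL with linear function approximation of the action-value function) can be sketched with a consensus-plus-local-update scheme of the kind used in the networked RL literature. This is a hedged illustration, not the method developed in the project. Assumptions, all hypothetical: every agent observes the same state-action trajectory produced by a shared random behavior policy; each agent receives only a private reward, scaled by its entry in `REWARD_SCALE` (whose mean defines the team reward); the agents communicate over a four-node ring through a doubly stochastic mixing matrix; and the feature map `features` is one-hot. Agents exchange only parameter vectors with their neighbors, never rewards.

```python
import random

N_STATES, N_ACTIONS = 5, 2            # hypothetical chain; 0 = left, 1 = right
GAMMA, ALPHA = 0.9, 0.05              # assumed discount factor and step size
REWARD_SCALE = [0.5, 0.8, 1.2, 1.5]   # private reward multipliers (mean 1.0)
N_AGENTS = len(REWARD_SCALE)
DIM = N_STATES * N_ACTIONS


def features(s, a):
    """One-hot feature vector phi(s, a); any fixed feature map could be used."""
    phi = [0.0] * DIM
    phi[s * N_ACTIONS + a] = 1.0
    return phi


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def q_value(theta, s, a):
    """Linear action-value estimate Q(s, a) = theta . phi(s, a)."""
    return dot(theta, features(s, a))


def ring_weights(n):
    """Doubly stochastic mixing: keep 1/2, send 1/4 to each ring neighbour."""
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        w[i][i] = 0.5
        w[i][(i - 1) % n] += 0.25
        w[i][(i + 1) % n] += 0.25
    return w


def step(s, a):
    nxt = max(0, s - 1) if a == 0 else s + 1
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done


def distributed_q_learning(episodes=2000, seed=0):
    rng = random.Random(seed)
    w = ring_weights(N_AGENTS)
    thetas = [[0.0] * DIM for _ in range(N_AGENTS)]
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            a = rng.randrange(N_ACTIONS)  # shared uniform behavior policy
            s2, r, done = step(s, a)
            # 1) Consensus: each agent averages parameters with its neighbours.
            mixed = [[sum(w[i][j] * thetas[j][k] for j in range(N_AGENTS))
                      for k in range(DIM)] for i in range(N_AGENTS)]
            # 2) Local off-policy TD update using only the agent's private reward.
            phi = features(s, a)
            for i in range(N_AGENTS):
                r_i = REWARD_SCALE[i] * r
                boot = 0.0 if done else GAMMA * max(
                    q_value(mixed[i], s2, b) for b in range(N_ACTIONS))
                delta = r_i + boot - dot(mixed[i], phi)
                thetas[i] = [t + ALPHA * delta * p for t, p in zip(mixed[i], phi)]
            s = s2
    return thetas


def average(thetas):
    return [sum(col) / len(thetas) for col in zip(*thetas)]


def greedy_policy(theta):
    return [max(range(N_ACTIONS), key=lambda a: q_value(theta, s, a))
            for s in range(N_STATES - 1)]
```

Because the mixing matrix is doubly stochastic, the network average of the parameters approximately follows the Q-learning update for the average (team) reward, while the consensus step keeps each agent's parameters close to that average. With one-hot features the scheme reduces to tabular learning; a richer feature map would take the place of `features` in a realistic CPS setting.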

Project Outputs

Journal articles (18)

Monographs (0)

Research awards (0)

Conference papers (0)

Patents (0)

Augmented Lagrangian optimization under fixed-point arithmetic

DOI:
10.1016/j.automatica.2020.109218
Published:
2020
Venue:
Automatica
Impact factor:
6.4
Authors:
Zhang, Yan; Zavlanos, Michael M.
Corresponding author:
Zavlanos, Michael M.

Risk-Averse No-Regret Learning in Online Convex Games

DOI:
10.48550/arxiv.2203.08957
Published:
2022-03
Venue:
arXiv
Impact factor:
N/A
Authors:
Zifan Wang; Yi Shen; M. Zavlanos
Corresponding author:
Zifan Wang; Yi Shen; M. Zavlanos

Deep Learning for Robotic Mass Transport Cloaking

DOI:
10.1109/tro.2020.2980176
Published:
2018-12
Venue:
IEEE Transactions on Robotics
Impact factor:
7.8
Authors:
Reza Khodayi-mehr; M. Zavlanos
Corresponding author:
Reza Khodayi-mehr; M. Zavlanos

Transfer Reinforcement Learning under Unobserved Contextual Information

DOI:
10.1109/iccps48487.2020.00015
Published:
2020
Venue:
2020 ACM/IEEE 11th International Conference on Cyber-Physical Systems (ICCPS)
Impact factor:
N/A
Authors:
Zhang, Yan; Zavlanos, Michael M.
Corresponding author:
Zavlanos, Michael M.

Policy Evaluation in Distributional LQR

DOI:
Published:
2023
Venue:
5th Annual Learning for Dynamics and Control Conference
Impact factor:
N/A
Authors:
Wang, Zifan; Gao, Yulong; Wang, Siyi; Zavlanos, Michael M.; Abate, Alessandro; Johansson, Karl Henrik
Corresponding author:
Johansson, Karl Henrik