CAREER: Temporal Causal Reinforcement Learning and Control for Autonomous and Swarm Cyber-Physical Systems

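Basic Information

Award number:
2339774
Principal investigator:
Zhe Xu
Amount:
$549,800 (approx.)
Host institution:
Arizona State University
Host institution country:
United States
Award type:
Continuing Grant
Fiscal year:
2024
Funding country:
United States
Project period:
2024-03-01 to 2029-02-28
Project status:
Ongoing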

Source:
https://www.nsf.gov/awardsearch/showAward?AWD_ID=2339774&HistoricalAwards=false
Keywords:
CAREER Temporal Causal Reinforcement Learning

Project Abstract

Understanding the root cause of behavior is imperative for informed decision-making and preventing ineffective or biased policies. Currently, most AI-based learning and control modules embedded in cyber-physical systems (CPS) rely on statistical correlation rather than causality for decision-making. This not only results in incorrect decisions but also hinders the interpretability of learning, limiting transferability and scalability. This CAREER proposal aims to bridge the gap between causal inference and the growing capabilities of reinforcement learning (RL) in CPS. The proposed methods are transformative to a wide range of CPS applications, enabling more efficient and effective decision-making processes in autonomous and swarm CPS such as self-driving cars, drones, industrial robots, and swarm robots.

This NSF CAREER proposal proposes a set of temporal causal RL and control approaches for CPS by leveraging the reasoning capabilities of temporal logics and causal diagrams in single-agent, multi-agent, and swarm system settings. The tools we develop will be implemented on multiple CPS testbeds and integrated with the proposed education plan. The proposed algorithms have the following unique and innovative features. Firstly, we will develop computationally efficient tools that can discover temporal causal knowledge from both observational and interventional data of a CPS in performing RL to improve the sampling efficiency and transferability. Secondly, we will develop multi-agent RL approaches for CPS in cooperative, non-cooperative, and incomplete information stochastic game environments where temporal causal knowledge is discovered in a distributed way for expediting RL. Lastly, we will develop scalable RL-based control methods for swarm systems utilizing temporal causal reasoning over agent-level features and swarm-level features such as densities and generalized moments.

The education plan will impact the next generation of CPS and AI engineers and researchers through AI-assisted adaptive and interactive teaching, temporal-logic-based educational games, online interactive educational website design for temporal causal RL, and workshops and webinars with industrial partners.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
