NSF-AoF: RI: Small: Safe Reinforcement Learning in Non-Stationary Environments With Fast Adaptation and Disturbance Prediction

Basic information

Award number:
2133656
Principal investigator:
Naira Hovakimyan
Amount:
$500,000
Host institution:
University of Illinois at Urbana-Champaign
Host institution country:
United States
Award type:
Standard Grant
Fiscal year:
2021
Funding country:
United States
Period:
2021-09-01 to 2024-08-31
Status:
Completed

Source:
https://www.nsf.gov/awardsearch/showAward?AWD_ID=2133656&HistoricalAwards=false
Keywords:
NSF AoF RI Small Safe

Abstract

Reinforcement learning (RL) has shown impressive performance in the control of complex robotic systems for tasks such as locomotion, manipulation, and playing sports (e.g., table tennis). RL enables a robot to autonomously discover an optimal behavior through trial-and-error interactions with its environment. However, environmental perturbations can easily cause a behavior policy trained in one environment to fail in a perturbed one. Such failure is unacceptable for safety-critical robotic systems such as self-driving cars, drones, flying taxis, and construction machines. Existing robust methods try to consider all scenarios during the training phase and seek a fixed policy, which leads to conservative behaviors. Existing adaptive methods try to update their behavior policies in the perturbed environment, but do so only after the robot has "felt a difference" through its interaction with the environment. In contrast, a human can use perception to predict conditions in the new environment and adjust his/her behavior accordingly, even before interacting with it. In light of these limitations, this project envisions a new framework for safe and efficient RL in the presence of environmental changes, leveraging fast adaptation and perception-based prediction. The framework will enable robotic and autonomous systems to robustly and safely operate, learn, and adapt in the real world. This project relies on the following thrusts: i) hybrid RL for safe and efficient policy updates; ii) robust adaptive control with safety guarantees; iii) vision-based disturbance prediction.

More specifically, the project will develop robust adaptive control algorithms that ensure that the executed trajectory of a robot remains safe in the presence of disturbances induced by environmental changes. It will also spur the development of hybrid model-free/model-based RL algorithms that are capable of efficiently and safely updating the behavior policies with the help of the control algorithms. The project will advance novel methodologies for predicting the key parameters of the disturbances (e.g., the weight of a package) directly from image observations, leading to new scalable methods for efficiently learning the mathematical model of the disturbances with quantified error bounds. All the ingredients will be integrated into a single framework that enables robots to safely, robustly, and efficiently operate and adapt in real-world environments. Aerial and ground vehicles will be used for experimental validation. This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Project outputs

Journal papers (9)

Monographs (0)

Research awards (0)

Conference papers (0)

Patents (0)

Convex Synthesis of Control Barrier Functions Under Input Constraints

DOI:
10.1109/lcsys.2023.3293765
Publication date:
2023
Journal:
IEEE Control Systems Letters
Impact factor:
3
Authors:
Pan Zhao;R. Ghabcheloo;Yikun Cheng;Hossein Abdi;N. Hovakimyan
Corresponding authors:
Pan Zhao;R. Ghabcheloo;Yikun Cheng;Hossein Abdi;N. Hovakimyan

DiffTune+: Hyperparameter-Free Auto-Tuning using Auto-Differentiation

DOI:
10.48550/arxiv.2212.03194
Publication date:
2022-12
Journal:
Impact factor:
0
Authors:
Sheng Cheng;Lin Song;Minkyung Kim;Shenlong Wang;N. Hovakimyan
Corresponding authors:
Sheng Cheng;Lin Song;Minkyung Kim;Shenlong Wang;N. Hovakimyan

Tube-Certified Trajectory Tracking for Nonlinear Systems With Robust Control Contraction Metrics

DOI:
10.1109/lra.2022.3153712
Publication date:
2021-09
Journal:
IEEE Robotics and Automation Letters
Impact factor:
5.2
Authors:
Pan Zhao;Arun Lakshmanan;K. Ackerman;Aditya Gahlawat;M. Pavone;N. Hovakimyan
Corresponding authors:
Pan Zhao;Arun Lakshmanan;K. Ackerman;Aditya Gahlawat;M. Pavone;N. Hovakimyan

Safe and Efficient Reinforcement Learning using Disturbance-Observer-Based Control Barrier Functions