Online Learning-based Real-time Control of Unknown Autonomous Systems
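Basic information
- Award number: 1810447
- Principal investigator:
- Amount: $330,000
- Host institution:
- Host institution country: United States
- Project type: Standard Grant
- Fiscal year: 2018
- Funding country: United States
- Duration: 2018-08-15 to 2022-07-31
- Project status: Completed
- Source:
- Keywords:
Project abstract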
Many emerging autonomous systems, e.g., robots in unstructured environments, are too complex to be accurately modeled. There are unknown model parameters, partial state observations, or a drift in system characteristics. This makes the problem of system identification and control quite challenging. Real-time adaptation is needed for optimal and resilient operation. It is well-known that the classical adaptive control approach of system identification and 'certainty equivalent' control in the feedback-loop doesn't work. In this project, we introduce a new paradigm of 'Learning-to-Control' unknown Autonomous Systems based on the newly developing approach of Thompson/Posterior sampling-based online learning. We will focus on discrete state space models of Markov decision processes (MDPs). We will first develop posterior sampling-inspired algorithms for online learning-based control with real-time adaptation for MDP models with partial observation of the system state. We note that such approaches may be interpreted to provide just the right amount of randomization for optimally trading off exploration and exploitation that is needed for online learning of the optimal policy at the fastest rate. We will then extend this to the setting where the system parameter may be varying or drifting with time. We will then develop such algorithms for more relevant but also more complicated system models: stochastic hybrid systems, which have both discrete and continuous states. The developed algorithms will be extensively validated in simulation experiments in the classical control and robotics environments in OpenAI Gym.
The intellectual merit of the research lies in its contribution to the 'Science of Autonomous Systems' by development of foundations of online learning-based real-time control and adaptation for autonomous systems by addressing fundamental questions about separation of parameter estimation, state estimation and control for various stochastic system models, particularly when model parameters must be learnt from data. The broader impacts will include impact on the smart grid, autonomous robotics, and medical CPS devices via dissemination of research results, training of a female PhD student and a K-12 STEM outreach effort. This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
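The exploration/exploitation trade-off that the abstract attributes to Thompson/posterior sampling can be illustrated in a much simpler setting than the MDPs targeted by the project. The following is a minimal sketch only, assuming a Bernoulli multi-armed bandit with independent Beta(1, 1) priors; the function name `thompson_sampling`, the arm means, and the round count are illustrative assumptions and are not taken from the award or its publications.

```python
import random


def thompson_sampling(true_means, rounds, seed=0):
    """Beta-Bernoulli Thompson sampling; returns how often each arm was pulled.

    Illustrative assumption: a stateless bandit, not the partially observed
    MDP setting described in the abstract.
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    successes = [0] * n_arms  # number of reward-1 outcomes seen per arm
    failures = [0] * n_arms   # number of reward-0 outcomes seen per arm
    pulls = [0] * n_arms

    for _ in range(rounds):
        # Draw one plausible mean per arm from its Beta(1 + s, 1 + f) posterior.
        samples = [
            rng.betavariate(1 + successes[a], 1 + failures[a])
            for a in range(n_arms)
        ]
        # Act greedily with respect to the sampled model. The randomness of
        # the posterior draw is the only source of exploration.
        arm = max(range(n_arms), key=lambda a: samples[a])

        # Observe a Bernoulli reward from the (unknown to the learner) true mean.
        reward = 1 if rng.random() < true_means[arm] else 0
        successes[arm] += reward
        failures[arm] += 1 - reward
        pulls[arm] += 1

    return pulls


if __name__ == "__main__":
    print(thompson_sampling([0.2, 0.5, 0.8], rounds=2000))
```

As the posteriors concentrate, the sampled means for poor arms rarely exceed that of the best arm, so exploration fades without any explicit schedule. Extending the same idea to MDPs would require a posterior over transition models and a planning step per sample, which this sketch does not attempt.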
Project outcomes
Journal articles (11)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
“Model-free Reinforcement Learning in Infinite-horizon Average-reward Markov Decision Processes”, ICML (Int'l Conf. on Machine Learning) 2020.
- DOI:
- Publication date: 2020
- Journal:
- Impact factor: 0
- Authors: Chen-Yu Wei, Mehdi Jafarnia-Jahromi
- Corresponding authors: Chen-Yu Wei, Mehdi Jafarnia-Jahromi
A Sample-Efficient Algorithm for Episodic Finite-Horizon MDP with Constraints
- DOI: 10.1609/aaai.v35i9.16979
- Publication date: 2020-09
- Journal:
- Impact factor: 0
- Authors: K. C. Kalagarla; Rahul Jain; P. Nuzzo
- Corresponding authors: K. C. Kalagarla; Rahul Jain; P. Nuzzo
Scheduling Flexible Nonpreemptive Loads in Smart-Grid Networks
- DOI: 10.1109/tcns.2022.3141017
- Publication date: 2022
- Journal:
- Impact factor: 4.2
- Authors: Dahlin, Nathan; Jain, Rahul
- Corresponding author: Jain, Rahul
Non-indexability of the stochastic appointment scheduling problem
- DOI: 10.1016/j.automatica.2020.109016
- Publication date: 2017-08
- Journal:
- Impact factor: 0
- Authors: Mehdi Jafarnia-Jahromi; Rahul Jain
- Corresponding authors: Mehdi Jafarnia-Jahromi; Rahul Jain
An Empirical Relative Value Learning Algorithm for Non-parametric MDPs with Continuous State Space
- DOI: 10.23919/ecc.2019.8795982
- Publication date: 2019
- Journal:
- Impact factor: 0
- Authors: Sharma, Hiteshi; Jain, Rahul; Gupta, Abhishek
- Corresponding author: Gupta, Abhishek
Other publications by Rahul Jain
Direct Product Theorems for Communication Complexity via Subdistribution Bounds
- DOI:
- Publication date: 2007
- Journal:
- Impact factor: 0
- Authors: Rahul Jain; H. Klauck; A. Nayak
- Corresponding author: A. Nayak
A new information-theoretic property about quantum states with an application to privacy in quantum communication ∗
- DOI:
- Publication date: 2009
- Journal:
- Impact factor: 0
- Authors: Rahul Jain; J. Radhakrishnan; P. Sen
- Corresponding author: P. Sen
Direct Product Theorems for Classical Communication Complexity via Subdistribution Bounds
- DOI:
- Publication date: 2007
- Journal:
- Impact factor: 0
- Authors: Rahul Jain
- Corresponding author: Rahul Jain
The Partition Bound for Classical Communication Complexity and Query Complexity
- DOI:
- Publication date: 2009
- Journal:
- Impact factor: 0
- Authors: Rahul Jain; H. Klauck
- Corresponding author: H. Klauck
Peptide‐Heterocycle Conjugates as Antifungals Against Cryptococcosis
- DOI: 10.1002/ajoc.202200196
- Publication date: 2022
- Journal:
- Impact factor: 2.7
- Authors: K. Sharma; K. Sharma; Anurag Kudwal; Shabana I. Khan; Rahul Jain
- Corresponding author: Rahul Jain
Other grants held by Rahul Jain
EAGER: Real-Time: Formal Reinforcement Learning Methods for the Design of Safety-critical Autonomous Systems
- Award number: 1839842
- Fiscal year: 2019
- Amount: $330,000
- Project type: Standard Grant
AF: Small: A New Approach to Analysis and Design of Algorithms for Stochastic Control and Optimization
- Award number: 1817212
- Fiscal year: 2018
- Amount: $330,000
- Project type: Standard Grant
Collaborative Research: Smarter Markets for a Smarter Grid: Pricing Randomness, Flexibility and Risk
- Award number: 1611574
- Fiscal year: 2016
- Amount: $330,000
- Project type: Standard Grant
CAREER: Network Economics: Theory and Architectures for Incentive-engineered Networks
- Award number: 0954116
- Fiscal year: 2010
- Amount: $330,000
- Project type: Continuing Grant
NetSE: Small: Cooperation and Incentives in Communication and Social Networks
- Award number: 0917410
- Fiscal year: 2009
- Amount: $330,000
- Project type: Continuing Grant
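Similar grants from the National Natural Science Foundation of China (NSFC)
Scalable Learning and Optimization: High-dimensional Models and Online Decision-Making Strategies for Big Data Analysis
- Award number:
- Approval year: 2024
- Amount:
- Project type: Collaborative Innovation Research Team
Understanding structural evolution of galaxies with machine learning
- Award number: n/a
- Approval year: 2022
- Amount: CNY 100,000
- Project type: Provincial/municipal-level project
Constrained Dynamic Multi-objective Q-learning Evolutionary Allocation of Human-Machine Hybrid Crowdsensing Tasks for Coal Mine Safety
- Award number:
- Approval year: 2022
- Amount: CNY 300,000
- Project type: Young Scientists Fund
Short-term Online Q-learning Cooperative Control Mechanism for Intelligent Munition Formations Considering Leader Failure
- Award number: 62003314
- Approval year: 2020
- Amount: CNY 240,000
- Project type: Young Scientists Fund
Research on E-learning Resource Recommendation Methods Integrating Contextual Tensor Factorization
- Award number: 61902016
- Approval year: 2019
- Amount: CNY 240,000
- Project type: Young Scientists Fund
Research on Spiking-Transfer Learning Methods with Temporal Transfer Capability
- Award number: 61806040
- Approval year: 2018
- Amount: CNY 200,000
- Project type: Young Scientists Fund
Research on Deep-learning-based Dynamic Identification Technology for Glacier Monitoring in the Sanjiangyuan (Three-River Source) Region
- Award number: 51769027
- Approval year: 2017
- Amount: CNY 380,000
- Project type: Regional Science Fund
Research on Spiking-Deep Learning Methods with Temporal Processing Capability
- Award number: 61573081
- Approval year: 2015
- Amount: CNY 640,000
- Project type: General Program
Automatic Generation and Optimization of Large-scale Personalized E-learning Process Models Based on Directed Hypergraphs
- Award number: 61572533
- Approval year: 2015
- Amount: CNY 660,000
- Project type: General Program
Research on Learner Emotion Compensation Methods in E-Learning
- Award number: 61402392
- Approval year: 2014
- Amount: CNY 260,000
- Project type: Young Scientists Fund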
Similar overseas grants
CPS: Medium: Collaborative Research: Srch3D: Efficient 3D Model Search via Online Manufacturing-specific Object Recognition and Automated Deep Learning-Based Design Classification
- Award number: 2240733
- Fiscal year: 2022
- Amount: $330,000
- Project type: Standard Grant
EdPlace's LEARNEY-AI: Need-driven AI-based adaptive online LEArning jouRNEY
- Award number: 10034668
- Fiscal year: 2022
- Amount: $330,000
- Project type: Collaborative R&D
ISports Wall – Improving the wellbeing of primary school pupils through an online platform for integrated, curriculum based exercise and learning - (CBEaL)
- Award number: 10012668
- Fiscal year: 2021
- Amount: $330,000
- Project type: Responsive Strategy and Planning
Collaborative Online Optimization for Efficient Model-Based Learning
- Award number: 2136206
- Fiscal year: 2021
- Amount: $330,000
- Project type: Standard Grant
cryoEDU: An online curriculum and software platform for hands-on learning in single-particle cryoEM and cryoET
- Award number: 10663238
- Fiscal year: 2021
- Amount: $330,000
- Project type:
Realization of a learning support system for online discussions based on dialogue agents
- Award number: 21K12154
- Fiscal year: 2021
- Amount: $330,000
- Project type: Grant-in-Aid for Scientific Research (C)
cryoEDU: An online curriculum and software platform for hands-on learning in single-particle cryoEM and cryoET
- Award number: 10436923
- Fiscal year: 2021
- Amount: $330,000
- Project type:
cryoEDU: An online curriculum and software platform for hands-on learning in single-particle cryoEM and cryoET
- Award number: 10222983
- Fiscal year: 2021
- Amount: $330,000
- Project type:
RAPID: Learning to Teach During COVID-19: Leveraging Simulated Classrooms as Practice-Based Spaces for Preservice Elementary Teachers within Online Teacher Education Courses
- Award number: 2032179
- Fiscal year: 2020
- Amount: $330,000
- Project type: Standard Grant
A Multiplatform Multimodal Machine Learning Based Study of Misinformation Online
- Award number: 2440362
- Fiscal year: 2020
- Amount: $330,000
- Project type: Studentship