CAREER: Reinforcement Learning for Recursive Markov Decision Processes and Beyond
Basic Information
- Award Number: 2146563
- Principal Investigator:
- Amount: $596,600
- Host Institution:
- Institution Country: United States
- Project Type: Continuing Grant
- Fiscal Year: 2022
- Funding Country: United States
- Duration: 2022-05-01 to 2027-04-30
- Status: Active
- Source:
- Keywords:
Project Abstract
This award is funded in whole or in part under the American Rescue Plan Act of 2021 (Public Law 117-2). Reinforcement Learning (RL) is a sampling-based approach to the optimization of Markov decision processes (MDPs), in which agents rely on rewards to discover optimal solutions. When combined with powerful approximation schemes (e.g., deep neural networks), RL has proven effective on highly complex tasks traditionally considered beyond the reach of Artificial Intelligence. However, its sensitivity to approximation parameters makes RL difficult to use (significant Machine Learning expertise is demanded of the programmer) and difficult to trust (manual approximations can invalidate guarantees). The vision of this project is to democratize RL by developing principled methodologies and powerful tools that improve the usability and trustworthiness of RL-based programming at scale. These research objectives are complemented by efforts to integrate the foundations of RL-based computability into CS education and to explore the role of RL-based programming in CS education. Approximation in RL is needed because RL algorithms with guaranteed convergence work on finite MDPs, yet scale poorly. Approximation affects both usability and trustworthiness. This proposal identifies two goals addressing both concerns: 1) to discover convergent RL beyond finite MDPs, and 2) to develop abstraction-based approaches to RL with rigorous optimization guarantees. The success of the proposed approaches will be evaluated by their ability to handle systems at scale. The algorithms and datasets will be disseminated as open-source software.
The proposed research makes fundamental contributions to three disciplines: formal methods, machine learning, and control theory; at the same time, it takes fundamental, concrete steps towards broadening participation in computing by making RL-based programming easier and more inclusive. This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
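The abstract notes that RL algorithms with guaranteed convergence operate on finite MDPs. A minimal sketch of what that baseline looks like is tabular Q-learning on a toy two-state, two-action MDP; the states, actions, rewards, and hyperparameters below are illustrative inventions, not taken from the award:

```python
import random

# Toy deterministic finite MDP (hypothetical): (state, action) -> (next_state, reward).
MDP = {
    (0, 0): (0, 0.0),  # stay in state 0, no reward
    (0, 1): (1, 1.0),  # move to state 1, reward 1
    (1, 0): (0, 0.0),  # return to state 0, no reward
    (1, 1): (1, 2.0),  # stay in state 1, reward 2
}

def q_learning(steps=5000, alpha=0.1, gamma=0.9, eps=0.1, seed=0):
    """Tabular Q-learning with an epsilon-greedy policy.

    On a finite MDP such as this one, the Q-table converges to the optimal
    action values under standard step-size and exploration conditions.
    """
    rng = random.Random(seed)
    Q = {(s, a): 0.0 for s in (0, 1) for a in (0, 1)}
    s = 0
    for _ in range(steps):
        # Epsilon-greedy action selection.
        if rng.random() < eps:
            a = rng.choice([0, 1])
        else:
            a = max((0, 1), key=lambda act: Q[(s, act)])
        s2, r = MDP[(s, a)]
        # One-step temporal-difference update.
        Q[(s, a)] += alpha * (r + gamma * max(Q[(s2, 0)], Q[(s2, 1)]) - Q[(s, a)])
        s = s2
    return Q

Q = q_learning()
# The greedy policy should prefer action 1 (the only rewarding action) in both states.
policy = {s: max((0, 1), key=lambda act: Q[(s, act)]) for s in (0, 1)}
print(policy)
```

The award targets exactly what this sketch cannot do: the Q-table enumerates every state-action pair, so the approach breaks down beyond finite (e.g., recursive or infinite-state) MDPs, which motivates the project's abstraction-based and convergence-preserving extensions.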
Project Outcomes
Journal Articles: 8
Monographs: 0
Research Awards: 0
Conference Papers: 0
Patents: 0
The Octatope Abstract Domain for Verification of Neural Networks.
- DOI:
- Published: 2023
- Journal:
- Impact Factor: 0
- Authors: Bak, S.;Dohmen, T.;Subramani, K.;Trivedi, A.;Velasquez, A.;Wojciechowski, P.
- Corresponding Author: Wojciechowski, P.
LTL-Based Non-Markovian Inverse Reinforcement Learning
- DOI: 10.5555/3545946.3599102
- Published: 2021-10
- Journal:
- Impact Factor: 0
- Authors: Mohammad Afzal;Sankalp Gambhir;Ashutosh Gupta;S. Krishna;Ashutosh Trivedi;Alvaro Velasquez
- Corresponding Author: Mohammad Afzal;Sankalp Gambhir;Ashutosh Gupta;S. Krishna;Ashutosh Trivedi;Alvaro Velasquez
Mungojerrie: Linear-Time Objectives in Model-Free Reinforcement Learning.
- DOI:
- Published: 2023
- Journal:
- Impact Factor: 0
- Authors: Hahn, Ernst Moritz;Perez, Mateo;Schewe, Sven;Somenzi, Fabio;Trivedi, Ashutosh;Wojtczak, Dominik
- Corresponding Author: Wojtczak, Dominik
Recursive Reinforcement Learning
- DOI:
- Published: 2022
- Journal:
- Impact Factor: 0
- Authors: Hahn, Ernst Moritz;Perez, Mateo;Schewe, Sven;Somenzi, Fabio;Trivedi, Ashutosh;Wojtczak, Dominik
- Corresponding Author: Wojtczak, Dominik
Optimal Repair for Omega-Regular Properties
- DOI:
- Published: 2022
- Journal:
- Impact Factor: 0
- Authors: Dave, V.;Krishna, S.;Murali, V.;Trivedi, A.
- Corresponding Author: Trivedi, A.
Other Publications by Ashutosh Trivedi
Delhi
- DOI: 10.1177/0019556119790333
- Published: 1979
- Journal:
- Impact Factor: 0
- Authors: Ashutosh Trivedi;Sadanand Ojha
- Corresponding Author: Sadanand Ojha
Weighted timed games: Positive results with negative costs
- DOI:
- Published: 2014
- Journal:
- Impact Factor: 0
- Authors: Benjamin Monmege;Thomas Brihaye;G. Geeraerts;Krishna Shankara Narayanan;L. Manasa;Ashutosh Trivedi
- Corresponding Author: Ashutosh Trivedi
Formal verification of hyperproperties for control systems
- DOI: 10.1145/3457335.3461715
- Published: 2021
- Journal:
- Impact Factor: 0
- Authors: Mahathi Anand;Vishnu Murali;Ashutosh Trivedi;Majid Zamani
- Corresponding Author: Majid Zamani
Co-Buchi Barrier Certificates for Discrete-time Dynamical Systems
- DOI: 10.48550/arxiv.2311.07695
- Published: 2023
- Journal:
- Impact Factor: 0
- Authors: Vishnu Murali;Ashutosh Trivedi;Majid Zamani
- Corresponding Author: Majid Zamani
Stochastic Timed Games Revisited
- DOI:
- Published: 2016
- Journal:
- Impact Factor: 0
- Authors: S. Akshay;P. Bouyer;S. Krishna;L. Manasa;Ashutosh Trivedi
- Corresponding Author: Ashutosh Trivedi
Other Grants by Ashutosh Trivedi
Collaborative Research: DASS: Assessing Accountability of Tax Preparation Software Systems
- Award Number: 2317207
- Fiscal Year: 2023
- Amount: $596,600
- Project Type: Standard Grant
SHF: Small: Omega-Regular Objectives for Model-Free Reinforcement Learning
- Award Number: 2009022
- Fiscal Year: 2020
- Amount: $596,600
- Project Type: Standard Grant
Similar NSFC Grants
Testing for Reinforcement and Its Genetic Basis in Sonneratia Hybrid Zones
- Award Number: 30800060
- Year Approved: 2008
- Amount: ¥230,000
- Project Type: Young Scientists Fund
Similar International Grants
CAREER: Stochasticity and Resilience in Reinforcement Learning: From Single to Multiple Agents
- Award Number: 2339794
- Fiscal Year: 2024
- Amount: $596,600
- Project Type: Continuing Grant
CAREER: Towards Real-world Reinforcement Learning
- Award Number: 2339395
- Fiscal Year: 2024
- Amount: $596,600
- Project Type: Continuing Grant
CAREER: Robust Reinforcement Learning Under Model Uncertainty: Algorithms and Fundamental Limits
- Award Number: 2337375
- Fiscal Year: 2024
- Amount: $596,600
- Project Type: Continuing Grant
CAREER: Temporal Causal Reinforcement Learning and Control for Autonomous and Swarm Cyber-Physical Systems
- Award Number: 2339774
- Fiscal Year: 2024
- Amount: $596,600
- Project Type: Continuing Grant
CAREER: Structure Exploiting Multi-Agent Reinforcement Learning for Large Scale Networked Systems: Locality and Beyond
- Award Number: 2339112
- Fiscal Year: 2024
- Amount: $596,600
- Project Type: Continuing Grant
CAREER: Intelligent Battery Management with Safe, Efficient, Fast-Adaption Reinforcement Learning and Physics-Inspired Machine Learning: From Cells to Packs
- Award Number: 2340194
- Fiscal Year: 2024
- Amount: $596,600
- Project Type: Continuing Grant
CAREER: Dual Reinforcement Learning: A Unifying Framework with Guarantees
- Award Number: 2340651
- Fiscal Year: 2024
- Amount: $596,600
- Project Type: Continuing Grant
CAREER: Reinforcement Learning-Based Control of Heterogeneous Multi-Agent Systems in Structured Environments: Algorithms and Complexity
- Award Number: 2237830
- Fiscal Year: 2023
- Amount: $596,600
- Project Type: Continuing Grant
CAREER: Foundations of Reinforcement Learning under Partial Observability
- Award Number: 2239297
- Fiscal Year: 2023
- Amount: $596,600
- Project Type: Continuing Grant
CAREER: OneSense: One-Rule-for-All Combinatorial Boolean Synthesis via Reinforcement Learning
- Award Number: 2349670
- Fiscal Year: 2023
- Amount: $596,600
- Project Type: Continuing Grant