CAREER: Dual Reinforcement Learning: A Unifying Framework with Guarantees
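Basic Information
- Award number: 2340651
- Principal investigator:
- Amount: $599,800
- Host institution:
- Host institution country: United States
- Project type: Continuing Grant
- Fiscal year: 2024
- Funding country: United States
- Project period: 2024-09-01 to 2029-08-31
- Project status: Ongoing
- Source:
- Keywords:
Project Abstract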
Reinforcement learning (RL) holds the promise to automate and improve many real-world processes that require sequential decision-making to optimize a long-term objective, such as self-driving cars, industrial automation, recommendation systems, and, more recently, natural language processing. Deep reinforcement learning has made exciting progress in the past few years, with RL agents demonstrating remarkable performance across a wide range of problem domains. However, this progress has required access to a fast simulator and tens or hundreds of millions of data points that are collected, trained on, and then thrown away. Off-policy methods are an alternative approach that is far more data-efficient, because they are not restricted to training on on-policy data and can even be trained on existing offline data. This suggests that to truly unlock the potential of reinforcement learning, we must develop principled off-policy algorithms. This project focuses on advancing RL through a framework that aims to provide a unified, principled objective that applies to both standard and offline RL settings and will allow us to efficiently solve large-scale, real-world sequential decision-making problems.
In this project, the PI will examine the dual formulation of this objective, which gives rise to a principled off-policy objective that sidesteps issues present in the more commonly used primal formulation. This objective will lead to algorithms particularly suitable for the large state-action spaces, long horizons, and sparse rewards encountered in real-world problems. The PI will explore connections between the proposed framework and both existing and new imitation learning and reinforcement learning methods. The PI will show that imitation learning and reinforcement learning methods are unified under this objective and will present theoretical guarantees for this class of methods. Finally, the PI will extend the dual framework to leverage pre-training and fine-tuning for improved sample efficiency. This includes exploring methods for incorporating out-of-domain datasets and multiple modalities in self-supervised pre-training, which is especially relevant for applications in household robotics.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
Project Outcomes
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other publications by Amy Zhang
Intervention Design for Effective Sim2Real Transfer
- DOI:
- Published: 2020
- Journal:
- Impact factor: 0
- Authors: Melissa Mozifian; Amy Zhang; Joelle Pineau; D. Meger
- Corresponding author: D. Meger
Learning Action-based Representations Using Invariance
- DOI: 10.48550/arxiv.2403.16369
- Published: 2024
- Journal:
- Impact factor: 0
- Authors: Max Rudolph; Caleb Chuck; Kevin Black; Misha Lvovsky; S. Niekum; Amy Zhang
- Corresponding author: Amy Zhang
Discovery of Insulin Receptor Partial Agonists MK-5160 and MK-1092 as Novel Basal Insulins with Potential to Improve Therapeutic Index.
- DOI: 10.1021/acs.jmedchem.1c02073
- Published: 2022
- Journal:
- Impact factor: 7.3
- Authors: Dmitri A Pissarnitski; A. Kekeç; Lin Yan; Yuping Zhu; D. Feng; Pei Huo; Christina B. Madsen; C. Moyes; R. Nargund; Theresa Kelly; Xiaoping Zhang; E. Carballo; Judith N. Gorski; Peter T. Zafian; Mo Qatanani; N. Kaarsholm; Fanyu Meng; X. Jia; Keun; Weixun Wang; Sherrie Xu; Michael J. Hohn; M. Iammarino; M. Mccoy; Grace A Okoh; Yingkai Liang; S. Hollingsworth; M. Erion; D. Kelley; R. Garbaccio; Amy Zhang; J. Mu; Songnian Lin
- Corresponding author: Songnian Lin
The scope of tobacco cessation randomized controlled trials in low- to middle-income countries: protocol for a scoping review
- DOI:
- Published: 2020
- Journal:
- Impact factor: 3.7
- Authors: Navin Kumar; Jessica Ainooson; A. Billings; Grace Q. Chen; Lauren Cueto; Kamila Janmohamed; Jeannette Jiang; R. Niaura; Amy Zhang
- Corresponding author: Amy Zhang
The Impact of Inotuzumab Ozogamicin (InO) Treatment on Brexucabtagene Autoleucel (Brexu-cel) Outcomes in Adults with Relapsed/Refractory B-Cell Acute Lymphoblastic Leukemia (B-ALL)
- DOI: 10.1182/blood-2023-182404
- Published: 2023-11-02
- Journal:
- Impact factor:
- Authors: Ibrahim Aldoss; Gregory W Roloff; Anjali S. Advani; Noam E. Kopmar; Chenyu Lin; Simone E. Dekker; Vishal K Gupta; Nikeshan Jeyakumar; Timothy E O'Connor; Amy Zhang; Katharine Miller; Kaitlyn C Dykes; Mohamed Ahmed; Hector Zambrano; Danielle Bradshaw; Santiago Mercadal; Marc Schwartz; Sean Tracy; Bhagirathbhai Dholaria; Michal Kubiak
- Corresponding author: Michal Kubiak
Other grants by Amy Zhang
CAREER: Tools for User and Community-Led Social Media Curation
- Award number: 2236618
- Fiscal year: 2023
- Amount: $599,800
- Project type: Continuing Grant
Collaborative Research: DASS: Transitioning open-source software projects to accountable community governance
- Award number: 2217653
- Fiscal year: 2022
- Amount: $599,800
- Project type: Standard Grant
Collaborative Research: SaTC: CORE: Large: Privacy-Preserving Abuse Prevention for Encrypted Communications Platforms
- Award number: 2120497
- Fiscal year: 2021
- Amount: $599,800
- Project type: Continuing Grant
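Similar National Natural Science Foundation of China (NSFC) grants
Identification of key genes in pathogen-host immune interactions and study of their mechanisms, based on dual-fluorescent Mycobacterium tuberculosis and dual RNA-seq
- Award number:
- Award year: 2022
- Amount: CNY 300,000
- Project type: Young Scientists Fund
Systematic search for dual AGN and study of their properties
- Award number:
- Award year: 2022
- Amount: CNY 300,000
- Project type: Young Scientists Fund
Mechanism by which AKAP3 integrates multiple signaling pathways through its Dual and RI domains to regulate sperm motility and male fertility
- Award number: 82171602
- Award year: 2021
- Amount: CNY 540,000
- Project type: General Program
Ultrasonic-pulse interfacial construction of phosphided dual base-metal alloy ultrathin-film dual-(Bimetallene-P) catalytic materials and their water-electrolysis performance
- Award number:
- Award year: 2021
- Amount: CNY 600,000
- Project type: General Program
Decoupled topology integration and composite modulation strategy for an intrinsically safe single-phase V2G converter based on dual-buck
- Award number: 62141103
- Award year: 2021
- Amount: CNY 120,000
- Project type: Special Program
Molecular mechanisms of the transcriptome interaction between classical swine fever virus and its host, studied with dual RNA-seq
- Award number: 31872484
- Award year: 2018
- Amount: CNY 610,000
- Project type: General Program
A new robust design method based on the Dual-Kriging surrogate model and its application to sheet metal forming
- Award number: 51005193
- Award year: 2010
- Amount: CNY 200,000
- Project type: Young Scientists Fund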
Similar overseas grants
Collaborative Research: Beyond the Single-Atom Paradigm: A Priori Design of Dual-Atom Alloy Active Sites for Efficient and Selective Chemical Conversions
- Award number: 2334970
- Fiscal year: 2024
- Amount: $599,800
- Project type: Standard Grant
Development of intestinal ischemic penumbra imaging technology using ultra-high-resolution ultrasound and dual-energy CT
- Award number: 24K18752
- Fiscal year: 2024
- Amount: $599,800
- Project type: Grant-in-Aid for Early-Career Scientists
Expanding syphilis screening among pregnant women in Indonesia using the rapid dual test for syphilis & HIV with capacity building: The DUALIS Study
- Award number: MR/Y004825/1
- Fiscal year: 2024
- Amount: $599,800
- Project type: Research Grant
ICF: A novel dual-target gene therapy for safe and efficacious treatment of chronic non-infectious uveitis
- Award number: MR/Z50385X/1
- Fiscal year: 2024
- Amount: $599,800
- Project type: Research Grant
CAREER: Rational Design of Dual-Functional Photocatalysts for Synthetic Reactions: Controlling Photosensitization and Reaction with a Single Nanocrystal
- Award number: 2339866
- Fiscal year: 2024
- Amount: $599,800
- Project type: Continuing Grant
Dual Syphilis and HIV: Evaluation of POC and Self-Test by Untrained Persons, Peers and Intended Users
- Award number: 502788
- Fiscal year: 2024
- Amount: $599,800
- Project type: Directed Grant
NSF Convergence Accelerator Track L: UAV-assisted dual-comb spectroscopic detection, localization, and quantification of multiple atmospheric trace-gas emissions
- Award number: 2344395
- Fiscal year: 2024
- Amount: $599,800
- Project type: Standard Grant
Dual Series Gate Configuration, Materials Design, and Mechanistic Modeling for Drift-Stabilized, Highly Sensitive Organic Electrochemical Transistor Biosensors
- Award number: 2402407
- Fiscal year: 2024
- Amount: $599,800
- Project type: Standard Grant
Chirality-Driven Self-Assembly of Dual Catalytic Dendrimers: Application Toward One-Pot Tandem Reactions
- Award number: 2426644
- Fiscal year: 2024
- Amount: $599,800
- Project type: Standard Grant
Collaborative Research: Beyond the Single-Atom Paradigm: A Priori Design of Dual-Atom Alloy Active Sites for Efficient and Selective Chemical Conversions
- Award number: 2334969
- Fiscal year: 2024
- Amount: $599,800
- Project type: Standard Grant













