RI: Small: Towards Provably Efficient Representation Learning in Reinforcement Learning via Rich Function Approximation

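Basic information

  • Award number:
    2154711
  • Principal investigator:
  • Amount:
    $384,600
  • Host institution:
  • Host institution country:
    United States
  • Award type:
    Standard Grant
  • Fiscal year:
    2022
  • Funding country:
    United States
  • Period:
    2022-10-01 to 2025-09-30
  • Status:
    Ongoing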

Project abstract

Reinforcement learning enables artificial intelligence systems to learn by themselves. While today’s reinforcement learning systems can empirically outperform humans on some tasks (such as chess), they often rely on extreme amounts of data and computational resources, which makes them unsuitable for real-world applications where data are expensive. In addition, the reinforcement learning algorithms used in these systems often lack performance guarantees, such as a bound on how many data points the algorithm needs in order to solve the task with high confidence, which limits their use in safety-critical applications. The main novelty of this project will be the development of new reinforcement learning algorithms that learn efficiently, using as few training data points as possible. Efficient reinforcement learning algorithms can extend these systems to real-world applications where data are expensive to collect. For example, in autonomous driving systems, the developed technologies would have the potential to enable self-driving cars to adapt to new road conditions faster while making fewer mistakes. In personalized navigation systems for visually impaired people, systems trained with efficient reinforcement learning algorithms can engage with users via high-quality interactions at an early stage of the learning process, thus positively influencing the user experience.

The project aims to bridge the gap between reinforcement learning theory and practice by developing computationally and statistically efficient algorithms for large-scale Markov Decision Processes where data are high-dimensional and complex. The key innovation proposed in this project is to open the black box by incorporating representation learning into the reinforcement learning framework. The representation learning approach allows algorithms to extract compact information from high-dimensional and unstructured data, and to perform reasoning and decision making using only the compact representation, thus vastly improving sample and computational efficiency. The two main thrusts are: (1) how to learn representations for counterfactual reinforcement learning, where the learner only has access to a static dataset and cannot further interact with the environment; (2) how to integrate representation learning, exploration, and exploitation in the online reinforcement learning setting, where the agent needs to actively interact with the environment to acquire data. In addition to developing algorithms and representation learning theory for reinforcement learning, this project proposes to design personalized voice navigation systems that can adapt to end users, where sample-efficient offline and online reinforcement learning plays an important role in fast and safe adaptation.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Project outcomes

Journal papers (3)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Multi-task Representation Learning for Pure Exploration in Linear Bandits
  • DOI:
    10.48550/arxiv.2302.04441
  • Published:
    2023-02
  • Journal:
  • Impact factor:
    0
  • Authors:
    Yihan Du;Longbo Huang;Wen Sun
  • Corresponding authors:
    Yihan Du;Longbo Huang;Wen Sun
Provable Benefits of Representational Transfer in Reinforcement Learning
  • DOI:
    10.48550/arxiv.2205.14571
  • Published:
    2022-05
  • Journal:
  • Impact factor:
    0
  • Authors:
    Alekh Agarwal;Yuda Song;Wen Sun;Kaiwen Wang;Mengdi Wang;Xuezhou Zhang
  • Corresponding authors:
    Alekh Agarwal;Yuda Song;Wen Sun;Kaiwen Wang;Mengdi Wang;Xuezhou Zhang
Hybrid RL: Using Both Offline and Online Data Can Make RL Efficient
  • DOI:
    10.48550/arxiv.2210.06718
  • Published:
    2022-10
  • Journal:
  • Impact factor:
    0
  • Authors:
    Yuda Song;Yi Zhou;Ayush Sekhari;J. Bagnell;A. Krishnamurthy;Wen Sun
  • Corresponding authors:
    Yuda Song;Yi Zhou;Ayush Sekhari;J. Bagnell;A. Krishnamurthy;Wen Sun

Other publications by Wen Sun

Synchronization criterions between two identical or different fractional order chaotic systems
Composite of nonexpansion reduced graphite oxide and carbon derived from pitch as anodes of Na-ion batteries with high coulombic efficiency
  • DOI:
    10.1016/j.cej.2016.10.074
  • Published:
    2017-02
  • Journal:
  • Impact factor:
    15.1
  • Authors:
    Wen Sun;Xiaodong Hong;Ming Wang;Yongqiang Mao
  • Corresponding author:
    Yongqiang Mao
Research on TVD Control of Cornering Energy Consumption for Distributed Drive Electric Vehicles Based on PMP
  • DOI:
    10.3390/en15072641
  • Published:
    2022-04
  • Journal:
  • Impact factor:
    3.2
  • Authors:
    Wen Sun;Yang Chen;Junnian Wang;Xiangyu Wang;Lili Liu
  • Corresponding author:
    Lili Liu
Investigating thrust-fault growth and segment linkage using displacement distribution analysis in the active Duzhanzi thrust fault zone, Northern Tian Shan of China
  • DOI:
    10.1016/j.jsg.2020.103990
  • Published:
    2020-04
  • Journal:
  • Impact factor:
    3.1
  • Authors:
    Zhanyu Wei;Honglin He;Wen Sun;Qitian Zhuang;Zihan Liang
  • Corresponding author:
    Zihan Liang
A comparative study of bedrock fault scarps by s-UAV and t-LiDAR: Insights into site selection criteria for paleo-seismology studies
  • DOI:
    10.1016/j.geomorph.2022.108372
  • Published:
    2022-07
  • Journal:
  • Impact factor:
    3.9
  • Authors:
    Junjie Zou;Honglin He;Yusuke Yokoyama;Yoshiki Shirahama;Shuang Geng;Yongsheng Zhou;Zhanyu Wei;Feng Shi;Chao Zhou;Wen Sun
  • Corresponding author:
    Wen Sun

Other grants held by Wen Sun

CAREER: Towards Real-world Reinforcement Learning
  • Award number:
    2339395
  • Fiscal year:
    2024
  • Funding amount:
    $384,600
  • Award type:
    Continuing Grant

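Similar NSFC grants

Forensic application of circadian-rhythm small RNAs in estimating bloodstain deposition time
  • Award number:
  • Year approved:
    2024
  • Funding amount:
    CNY 0
  • Project type:
    Provincial/municipal project
Mechanism by which tRNA-derived small RNA upregulates the YBX1/CCL5 pathway in bortezomib-induced chronic pain
  • Award number:
    n/a
  • Year approved:
    2022
  • Funding amount:
    CNY 100,000
  • Project type:
    Provincial/municipal project
Response and molecular mechanism of small RNA regulation of type I-F CRISPR-Cas adaptive immunity
  • Award number:
    32000033
  • Year approved:
    2020
  • Funding amount:
    CNY 240,000
  • Project type:
    Young Scientists Fund
Mechanism by which small RNAs regulate the biocontrol function of Bacillus amyloliquefaciens FZB42
  • Award number:
    31972324
  • Year approved:
    2019
  • Funding amount:
    CNY 580,000
  • Project type:
    General Program
Mechanism by which Streptococcus mutans small RNAs link LuxS quorum sensing and biofilm formation
  • Award number:
    81900988
  • Year approved:
    2019
  • Funding amount:
    CNY 210,000
  • Project type:
    Young Scientists Fund
Molecular mechanism of pigeon milk secretion, analyzed with small RNA sequencing
  • Award number:
    31802058
  • Year approved:
    2018
  • Funding amount:
    CNY 260,000
  • Project type:
    Young Scientists Fund
Function and mechanism of key gut-bacterial small RNAs in the onset and progression of Crohn's disease
  • Award number:
    31870821
  • Year approved:
    2018
  • Funding amount:
    CNY 560,000
  • Project type:
    General Program
Pathogenic mechanism of rice grassy stunt virus regulated by small RNA-mediated DNA methylation
  • Award number:
    31772128
  • Year approved:
    2017
  • Funding amount:
    CNY 600,000
  • Project type:
    General Program
Immunoregulatory mechanism of acupuncture treatment for Hashimoto's thyroiditis, based on small RNA-seq
  • Award number:
    81704176
  • Year approved:
    2017
  • Funding amount:
    CNY 200,000
  • Project type:
    Young Scientists Fund
Regulation of small RNA biosynthesis by rice OsSGS3 and OsHEN1 and its modulation of disease resistance
  • Award number:
    91640114
  • Year approved:
    2016
  • Funding amount:
    CNY 850,000
  • Project type:
    Major Research Plan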

Similar overseas grants

RI: Small: Towards Abstractive Summarization That Preserves the Original Meaning
  • Award number:
    2303678
  • Fiscal year:
    2022
  • Funding amount:
    $384,600
  • Award type:
    Standard Grant
RI: Small: Towards Optimal and Adaptive Reinforcement Learning with Offline Data and Limited Adaptivity
  • Award number:
    2007117
  • Fiscal year:
    2020
  • Funding amount:
    $384,600
  • Award type:
    Standard Grant
RI: Small: Towards Abstractive Summarization That Preserves the Original Meaning
  • Award number:
    1909603
  • Fiscal year:
    2019
  • Funding amount:
    $384,600
  • Award type:
    Standard Grant
RI: Small: Learning Dynamics and Evolution towards Cognitive Understanding of Videos
  • Award number:
    1813709
  • Fiscal year:
    2018
  • Funding amount:
    $384,600
  • Award type:
    Standard Grant
RI: Small: Towards a Formal Theory of Blameworthiness, Intention, and Moral Responsibility
  • Award number:
    1718108
  • Fiscal year:
    2017
  • Funding amount:
    $384,600
  • Award type:
    Standard Grant
RI: Small: Extending Verb Semantics with Causality towards Physical World
  • Award number:
    1617682
  • Fiscal year:
    2016
  • Funding amount:
    $384,600
  • Award type:
    Standard Grant
RI: Small: Dynamic Attractor Computing: A Novel Computational Approach Applied Towards Temporal Pattern and Speech Recognition
  • Award number:
    1420897
  • Fiscal year:
    2014
  • Funding amount:
    $384,600
  • Award type:
    Standard Grant
RI: Small: Collaborative Research: Towards Modeling Source Separation from Measured Cortical Responses
  • Award number:
    1320366
  • Fiscal year:
    2013
  • Funding amount:
    $384,600
  • Award type:
    Standard Grant
RI: Small: Collaborative Research: Towards Modeling Source Separation from Measured Cortical Responses
  • Award number:
    1320260
  • Fiscal year:
    2013
  • Funding amount:
    $384,600
  • Award type:
    Standard Grant
RI: Small: Towards Practical Tractability in Constraint Processing
  • Award number:
    1117956
  • Fiscal year:
    2011
  • Funding amount:
    $384,600
  • Award type:
    Standard Grant