RI: Small: Towards Provably Efficient Representation Learning in Reinforcement Learning via Rich Function Approximation

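Basic Information

Award number:
2154711
Principal investigator:
Wen Sun
Amount:
About $384,600
Institution:
Cornell University
Institution country:
United States
Award type:
Standard Grant
Fiscal year:
2022
Funding country:
United States
Period:
2022-10-01 to 2025-09-30
Project status:
Ongoing

Source:
https://www.nsf.gov/awardsearch/showAward?AWD_ID=2154711&HistoricalAwards=false
Keywords:
RI Small Towards Provably Efficient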

Project Abstract

Reinforcement learning enables artificial intelligence systems to learn by themselves. While today’s reinforcement learning systems can empirically outperform humans on some tasks (such as chess), they often rely on extreme amounts of data and computational resources, which makes them unsuitable for real-world applications where data are expensive. In addition, the reinforcement learning algorithms used in these systems often lack performance guarantees, such as how many data points the algorithm needs in order to solve the task with high confidence, which limits their use in safety-critical applications. The main novelty of this project will be the development of new reinforcement learning algorithms that can learn efficiently, using as few training data points as possible. Efficient reinforcement learning algorithms can extend these systems to real-world applications where data are expensive to collect. For example, in autonomous driving systems, the developed technologies would have the potential to enable self-driving cars to adapt to new road conditions faster while making fewer mistakes. In personalized navigation systems for visually impaired people, systems trained with efficient reinforcement learning algorithms can engage with users via high-quality interactions at an early stage of the learning process, thus positively influencing the user experience.

The project aims to bridge the gap between reinforcement learning theory and practice by developing computationally and statistically efficient algorithms for large-scale Markov Decision Processes where data are high-dimensional and complex. The key innovation proposed in this project is to open the black box by incorporating representation learning into the reinforcement learning framework. The representation learning approach allows algorithms to extract compact information from high-dimensional, unstructured data and to perform reasoning and decision making using only the compact representation, thus vastly improving sample and computational efficiency. The two main thrusts are: (1) how to learn representations for counterfactual reinforcement learning, where the learner only has access to a static dataset and cannot interact further with the environment; and (2) how to integrate representation learning, exploration, and exploitation in the online reinforcement learning setting, where the agent needs to actively interact with the environment to acquire data. In addition to developing algorithms and theory for representation learning in reinforcement learning, this project proposes to design personalized voice navigation systems that can adapt to end users, where sample-efficient offline and online reinforcement learning plays an important role in fast and safe adaptation.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Project Outcomes

Journal articles (3)

Monographs (0)

Research awards (0)

Conference papers (0)

Patents (0)

Multi-task Representation Learning for Pure Exploration in Linear Bandits

DOI:
10.48550/arxiv.2302.04441
Publication date:
2023-02
Journal:
arXiv
Impact factor:
0
Authors:
Yihan Du;Longbo Huang;Wen Sun
Corresponding authors:
Yihan Du;Longbo Huang;Wen Sun

Provable Benefits of Representational Transfer in Reinforcement Learning

DOI:
10.48550/arxiv.2205.14571
Publication date:
2022-05
Journal:
arXiv
Impact factor:
0
Authors:
Alekh Agarwal;Yuda Song;Wen Sun;Kaiwen Wang;Mengdi Wang;Xuezhou Zhang
Corresponding authors:
Alekh Agarwal;Yuda Song;Wen Sun;Kaiwen Wang;Mengdi Wang;Xuezhou Zhang

Hybrid RL: Using Both Offline and Online Data Can Make RL Efficient

DOI:
10.48550/arxiv.2210.06718
Publication date:
2022-10
Journal:
arXiv
Impact factor:
0
Authors:
Yuda Song;Yi Zhou;Ayush Sekhari;J. Bagnell;A. Krishnamurthy;Wen Sun
Corresponding authors:
Yuda Song;Yi Zhou;Ayush Sekhari;J. Bagnell;A. Krishnamurthy;Wen Sun

Other publications by Wen Sun

Synchronization criterions between two identical or different fractional order chaotic systems

DOI:
Publication date:
2011
Journal:
Journal of Information and Computing Science
Impact factor:
0
Authors:
Yuhua Xu;Wuneng Zhou;Jian'an Fang;Lin Pan;Wen Sun
Corresponding author:
Wen Sun

Composite of nonexpansion reduced graphite oxide and carbon derived from pitch as anodes of Na-ion batteries with high coulombic efficiency

DOI:
10.1016/j.cej.2016.10.074
Publication date:
2017-02
Journal:
Chemical Engineering Journal
Impact factor:
15.1
Authors:
Wen Sun;Xiaodong Hong;Ming Wang;Yongqiang Mao
Corresponding author:
Yongqiang Mao

Research on TVD Control of Cornering Energy Consumption for Distributed Drive Electric Vehicles Based on PMP

DOI:
10.3390/en15072641
Publication date:
2022-04
Journal:
Energies
Impact factor:
3.2
Authors:
Wen Sun;Yang Chen;Junnian Wang;Xiangyu Wang;Lili Liu
Corresponding author:
Lili Liu

Investigating thrust-fault growth and segment linkage using displacement distribution analysis in the active Duzhanzi thrust fault zone, Northern Tian Shan of China

DOI:
10.1016/j.jsg.2020.103990
Publication date:
2020-04
Journal:
Journal of Structural Geology
Impact factor:
3.1
Authors:
Zhanyu Wei;Honglin He;Wen Sun;Qitian Zhuang;Zihan Liang
Corresponding author:
Zihan Liang

A comparative study of bedrock fault scarps by s-UAV and t-LiDAR: Insights into site selection criteria for paleo-seismology studies

DOI:
10.1016/j.geomorph.2022.108372
Publication date:
2022-07
Journal:
Geomorphology
Impact factor:
3.9
Authors:
Junjie Zou;Honglin He;Yusuke Yokoyama;Yoshiki Shirahama;Shuang Geng;Yongsheng Zhou;Zhanyu Wei;Feng Shi;Chao Zhou;Wen Sun
Corresponding author:
Wen Sun