权益分类	功能权益	普通用户	{{item.name}}会员
{{category.name}}	{{benefitItem.name}}

CAREER: Towards Real-world Reinforcement Learning

职业：走向现实世界的强化学习

基本信息

批准号：
2339395
负责人：
Wen Sun
金额：
$ 60万
依托单位：
Cornell University
依托单位国家：
美国
项目类别：
Continuing Grant
财政年份：
2024
资助国家：
美国
起止时间：
2024-03-01 至 2029-02-28
项目状态：
未结题

来源：
https://www.nsf.gov/awardsearch/showAward?AWD_ID=2339395&HistoricalAwards=false
关键词：
CAREER Towards Real world Reinforcement

项目摘要

Reinforcement learning (RL) is one of the most important paradigms for modeling data-driven decision-making. Recent years have witnessed several empirical successes of RL, such as RL agents that outperform humans in video and board games. However, many empirical RL algorithms today often require many training examples to learn and can produce unreliable solutions (solutions that exhibit catastrophic failures, for example). While these issues are typically not problematic when training RL agents in simulators, they pose significant difficulties when deploying RL to real-world problems where data (including human feedback) is expensive, and reliability is essential. The main novelty of this project will be the development of new RL algorithms that can learn efficiently (from as few training data points as possible) and reliably (avoid catastrophic failures with high probability). The development of such RL algorithms can expand the applications of RL systems from simulation to real-world applications where data is expensive to collect and safety is critical. In autonomous driving, the developed technologies can make self-driving cars adapt to new road conditions safely by making fewer mistakes. In generative Artificial Intelligence (AI), efficient and reliable RL algorithms that can learn from rich human feedback will enable better human-AI alignment, making AI systems improve reliably and safely under human guidance.The main research goal of this project is to enable real-world RL by advancing RL techniques, theoretically and empirically. The critical innovation in the project is to develop safe and efficient RL algorithms by leveraging specific problem structures and rich human feedback. The project has three main thrusts. First, the project will establish risk-averse RL algorithms that are provably correct and scalable to high dimensional data. Second, the project will develop RL algorithms that can leverage common problem-specific structures for improved sample efficiency. Third, the project will create new algorithms for RL with rich feedback beyond scalar rewards (including preference-based feedback and positive demonstrations). In addition to the proposed work on algorithmic advancements, the project will focus on their deployment to real-world problems, including database query optimization and optimizing generative models such as Large Language Models and Diffusion Models.This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

强化学习(RL)是数据驱动决策建模的重要范例之一。近年来，RL已经取得了几项经验上的成功，例如RL代理在视频和棋盘游戏中的表现优于人类。然而，今天的许多经验RL算法通常需要许多训练样本来学习，并且可能产生不可靠的解决方案(例如，表现出灾难性故障的解决方案)。虽然在模拟器中培训RL代理时，这些问题通常不是问题，但当将RL部署到数据(包括人工反馈)昂贵且可靠性至关重要的真实世界问题时，它们会带来重大困难。该项目的主要新颖性将是开发新的RL算法，该算法可以有效地(从尽可能少的训练数据点)和可靠的(避免高概率的灾难性故障)学习。这种RL算法的发展可以将RL系统的应用从模拟扩展到真实世界的应用，在现实世界中，收集数据的成本很高，而且安全至关重要。在自动驾驶中，开发的技术可以让自动驾驶汽车通过更少的错误安全地适应新的路况。在产生式人工智能(AI)中，能够从丰富的人类反馈中学习的高效可靠的RL算法将使人与AI更好地匹配，使AI系统在人类的指导下可靠而安全地改进。本项目的主要研究目标是通过从理论和经验上推进RL技术来实现现实世界的RL。该项目的关键创新是通过利用特定的问题结构和丰富的人类反馈来开发安全和高效的RL算法。该项目有三个主要推动力。首先，该项目将建立风险厌恶的RL算法，这些算法被证明是正确的，并且可扩展到高维数据。其次，该项目将开发RL算法，该算法可以利用常见的特定问题结构来提高样本效率。第三，该项目将为RL创建新的算法，提供除标量奖励之外的丰富反馈(包括基于偏好的反馈和正面演示)。除了在算法改进方面的拟议工作外，该项目还将专注于将它们部署到现实世界的问题中，包括数据库查询优化和优化生成模型，如大型语言模型和扩散模型。该奖项反映了NSF的法定使命，并通过使用基金会的智力优势和更广泛的影响审查标准进行评估，被认为值得支持。

项目成果

期刊论文数量（0）

专著数量（0）

科研奖励数量（0）

会议论文数量（0）

专利数量（0）

数据更新时间：{{ journalArticles.updateTime }}

DOI：
{{ item.doi }}
发表时间：
{{ item.publish_year }}
期刊：
{{ item.journal_name }}
影响因子：
{{ item.factor }}
作者：
{{ item.authors }}
通讯作者：
{{ item.author }}

数据更新时间：{{ journalArticles.updateTime }}

作者：
{{ item.author }}

数据更新时间：{{ monograph.updateTime }}

作者：
{{ item.author }}

数据更新时间：{{ sciAawards.updateTime }}

作者：
{{ item.author }}

数据更新时间：{{ conferencePapers.updateTime }}

作者：
{{ item.author }}

数据更新时间：{{ patent.updateTime }}

Wen Sun其他文献

Synchronization criterions between two identical or different fractional order chaotic systems

两个相同或不同分数阶混沌系统之间的同步准则

DOI：
发表时间：
2011
期刊：
Journal of Information and Computing Science
影响因子：
0
作者：
Yuhua Xu;Wuneng Zhou;Jian'an Fang;Lin Pan;Wen Sun
通讯作者：
Wen Sun

Composite of nonexpansion reduced graphite oxide and carbon derived from pitch as anodes of Na-ion batteries with high coulombic efficiency

非膨胀还原石墨氧化物与沥青碳复合材料作为高库伦效率钠离子电池负极

DOI：
10.1016/j.cej.2016.10.074
发表时间：
2017-02
期刊：
Chemical Engineering Journal
影响因子：
15.1
作者：
Wen Sun;Xiaodong Hong;Ming Wang;Yongqiang Mao
通讯作者：
Yongqiang Mao

Research on TVD Control of Cornering Energy Consumption for Distributed Drive Electric Vehicles Based on PMP

基于PMP的分布式驱动电动汽车转弯能耗TVD控制研究

DOI：
10.3390/en15072641
发表时间：
2022-04
期刊：
ENERGIES
影响因子：
3.2
作者：
Wen Sun;Yang Chen;Junnian Wang;Xiangyu Wang;Lili Liu
通讯作者：
Lili Liu

Investigating thrust-fault growth and segment linkage using displacement distribution analysis in the active Duzhanzi thrust fault zone, Northern Tian Shan of China

利用位移分布分析研究中国北天山独站子逆冲断裂带的逆冲断层生长和节段联系

DOI：
10.1016/j.jsg.2020.103990
发表时间：
2020-04
期刊：
Journal of Structural Geology
影响因子：
3.1
作者：
Zhanyu Wei;Honglin He;Wen Sun;Qitian Zhuang;Zihan Liang
通讯作者：
Zihan Liang

A comparative study of bedrock fault scarps by s-UAV and t-LiDAR: Insights into site selection criteria for paleo-seismology studies

s-UAV 和 t-LiDAR 对基岩断层陡坎的比较研究：深入了解古地震学研究的选址标准

DOI：
10.1016/j.geomorph.2022.108372
发表时间：
2022-07
期刊：
Geomorphology
影响因子：
3.9
作者：
Junjie Zou;Honglin He;Yusuke Yokoyama;Yoshiki Shirahama;Shuang Geng;Yongsheng Zhou;Zhanyu Wei;Feng Shi;Chao Zhou;Wen Sun
通讯作者：
Wen Sun