Collaborative Research: Towards the Foundation of Approximate Sampling-Based Exploration in Sequential Decision Making

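Basic Information

Award number:
2323113
Principal investigator:
Quanquan Gu
Amount:
$300,000
Host institution:
University of California-Los Angeles
Host institution country:
United States
Award type:
Standard Grant
Fiscal year:
2023
Funding country:
United States
Project period:
2023-10-01 to 2026-09-30
Project status:
Ongoing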

Source:
https://www.nsf.gov/awardsearch/showAward?AWD_ID=2323113&HistoricalAwards=false
Keywords:
Collaborative Research Towards Foundation Approximate

Project Abstract

Sequential decision-making problems, such as bandits and reinforcement learning, play a crucial role in various AI applications, including recommendation systems, robotics, games, and personalized healthcare. The main challenge lies in finding the optimal exploration strategy that strikes a balance between choosing actions with the best performance and choosing actions with high uncertainties. However, existing exploration strategies heavily depend on specific cases, requiring prior knowledge of reward distribution, function approximation, and the task at hand. This creates computational obstacles and hampers real-world applicability.

This project aims to establish a theoretical foundation for using approximate sampling-based techniques to unify exploration strategies across different sequential decision problems. The goal is to develop efficient and provable algorithms applicable to diverse learning problems under a unified algorithmic framework based on approximate sampling. This project also provides research training opportunities for graduate students.

The project consists of three tasks. Task one focuses on developing fast approximate sampling-based exploration strategies for contextual bandit problems, accompanied by theoretical guarantees. Task two involves implementing and generalizing these exploration algorithms to more complex sequential decision-making applications, leveraging deep neural networks. Task three aims to establish efficient and provably effective exploration strategies for reinforcement learning problems. These advancements will be translated into accessible tools for various bandit and reinforcement learning applications, providing verifiable guarantees. The open-source software and course materials resulting from this project will be made publicly available, benefiting research, education, and society at large.

This award by the Division of Mathematical Sciences is jointly supported by the NSF Office of Advanced Cyberinfrastructure. This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
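
As a rough illustration of what sampling-based exploration means in the simplest bandit setting, the sketch below runs Thompson sampling on a toy Bernoulli bandit. It is not the project's algorithm: it draws from an exact Beta posterior, whereas the abstract concerns approximate sampling for harder problems where exact posteriors are unavailable. The function name `thompson_sampling`, the arm means, and the horizon are all hypothetical choices made for this example.

```python
import random


def thompson_sampling(true_means, horizon, seed=0):
    """Beta-Bernoulli Thompson sampling; returns the pull count of each arm.

    true_means: hypothetical success probabilities, unknown to the learner.
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    successes = [0] * n_arms  # number of reward-1 outcomes seen per arm
    failures = [0] * n_arms   # number of reward-0 outcomes seen per arm
    pulls = [0] * n_arms

    for _ in range(horizon):
        # Draw one plausible mean per arm from its Beta(s + 1, f + 1) posterior.
        # Rarely pulled arms have wide posteriors, so they still win sometimes
        # (exploration); well-estimated good arms win most often (exploitation).
        samples = [
            rng.betavariate(successes[a] + 1, failures[a] + 1)
            for a in range(n_arms)
        ]
        arm = max(range(n_arms), key=lambda a: samples[a])

        # Simulate a Bernoulli reward from the chosen arm.
        reward = 1 if rng.random() < true_means[arm] else 0
        successes[arm] += reward
        failures[arm] += 1 - reward
        pulls[arm] += 1

    return pulls


# Toy run: three arms, the last one is best.
pulls = thompson_sampling([0.2, 0.5, 0.8], horizon=2000, seed=0)
print(pulls)
```

In this toy case the posterior has a closed form, so sampling is exact and cheap. With richer reward models such as neural networks, the posterior generally cannot be sampled exactly, which is the setting where approximate samplers become relevant.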

Project Outputs

Journal articles (0)

Monographs (0)

Research awards (0)

Conference papers (0)

Patents (0)

Other publications by Quanquan Gu

Different patterns of gray matter density in early- and middle-late-onset Parkinson’s disease: a voxel-based morphometry study

DOI:
10.1007/s11682-017-9745-4
Publication year:
2017
Journal:
Brain Imaging Behav
Impact factor:
0
Authors:
Min Xuan; Xiaojun Guan; Peiyu Huang; Zhujing Shen; Quanquan Gu; Xinfeng Yu; Xiaojun Xu; Wei Luo; Minming Zhang
Corresponding author:
Minming Zhang