Collaborative Research: AF: Small: Parallel Reinforcement Learning with Communication and Adaptivity Constraints

Basic information

Award number:
2006591
Principal investigator:
Qin Zhang
Amount:
approx. $242,200
Host institution:
Indiana University
Institution country:
United States
Award type:
Standard Grant
Fiscal year:
2020
Funding country:
United States
Project period:
2020-10-01 to 2023-09-30
Project status:
Completed

Source:
https://www.nsf.gov/awardsearch/showAward?AWD_ID=2006591&HistoricalAwards=false
Keywords:
Collaborative Research AF Small Parallel

Project abstract

Reinforcement learning has seen major research advances in recent years and has achieved success in many practical applications. However, reinforcement-learning algorithms also have a reputation for being data- and computation-hungry in large-scale applications. This project will address this issue by studying how to make reinforcement-learning algorithms scalable by introducing multiple learning agents and allowing them to collect data and learn optimal strategies collaboratively. The outcomes of this project will have impacts on numerous areas where reinforcement learning is used at scale, e.g., multi-phase clinical trials, training autonomous-driving algorithms, crowdsourcing tasks, pricing, and assortment optimization for stores at different locations. The research products will be disseminated via talks at academic conferences and workshops, universities, industrial labs, and online media, and will also be integrated into two courses on the forefront of reinforcement learning and big-data algorithms.

More technically, this project will study how to address the fundamental constraints on communication and adaptivity for the learning agents. In particular, this project will investigate a handful of collaborative learning models, including full communication, synchronized communication, synchronized communication with limited adaptivity, and asynchronous communication, and study the following general questions: (1) what is the fundamental advantage of allowing adaptivity in the parallel learning model; (2) are there inherent differences in the degree of parallelism between model-based and model-free reinforcement learning; (3) what is the impact of asynchronous communication; and (4) is it possible to parallelize general algorithmic techniques in reinforcement learning in a communication-efficient way? The team of researchers will address these questions by studying a set of core problems, including best arm(s) identification and regret minimization in multi-armed bandits, contextual bandits, finite-state Markov decision process (MDP) learning, reinforcement learning with function approximation, and coordinated exploration in MDPs. Through studying these questions, this project will bring new techniques, perspectives, and insight to communication-efficient parallel reinforcement learning. This project will also have a significant impact on a number of related research areas such as control theory, operations research, information theory and communication complexity, and multi-agent systems.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Project outcomes

Journal articles (8)

Monographs (0)

Research awards (0)

Conference papers (0)

Patents (0)

Near-Optimal MNL Bandits Under Risk Criteria

DOI:
(not listed)
Publication date:
2021
Venue:
The Thirty-Fifth AAAI Conference on Artificial Intelligence (AAAI-21)
Impact factor:
0
Authors:
Xi, Guangyu;Tao, Chao;Zhou, Yuan
Corresponding author:
Zhou, Yuan

Variance-Dependent Best Arm Identification

DOI:
(not listed)
Publication date:
2021-06
Venue:
ArXiv
Impact factor:
0
Authors:
P. Lu;Chao Tao;Xiaojin Zhang
Corresponding authors:
P. Lu;Chao Tao;Xiaojin Zhang

Instance-Sensitive Algorithms for Pure Exploration in Multinomial Logit Bandit

DOI:
10.1609/aaai.v36i7.20669
Publication date:
2020-12
Venue:
ArXiv
Impact factor:
0
Authors:
Nikolai Karpov;Qin Zhang
Corresponding authors:
Nikolai Karpov;Qin Zhang

Meta Proximal Policy Optimization for Cooperative Multi-Agent Continuous Control

DOI:
10.1109/ijcnn55064.2022.9892004
Publication date:
2022-07
Venue:
2022 International Joint Conference on Neural Networks (IJCNN)
Impact factor:
0
Authors:
Boli Fang;Zhenghao Peng;Hao Sun;Qin Zhang
Corresponding authors:
Boli Fang;Zhenghao Peng;Hao Sun;Qin Zhang

Communication-Efficient Collaborative Best Arm Identification

DOI:
(not listed)
Publication date:
2023
Venue:
Proceedings of the AAAI Conference on Artificial Intelligence (AAAI-23)
Impact factor:
0
Authors:
Nikolai Karpov, Qin Zhang
Corresponding authors:
Nikolai Karpov, Qin Zhang

Other publications by Qin Zhang

Receptor activity-modifying protein 1 regulates the phenotypic expression of BMSCs via the Hippo/Yap pathway

DOI:
10.1002/jcp.28082
Publication date:
2019-08
Venue:
J Cell Physiol
Impact factor:
0
Authors:
Qin Zhang;Yanjun Guo;Hui Yu;Yufei Tang;Ying Yuan;Yixuan Jiang;Huilu Chen;Ping Gong;Lin Xiang
Corresponding author:
Lin Xiang

The gut microbiota modulator berberine ameliorates collagen-induced arthritis in rats by facilitating the generation of butyrate and adjusting the intestinal hypoxia and nitrate supply

DOI:
10.1096/fj.201900425rr
Publication date:
2019
Venue:
The FASEB Journal
Impact factor:
0
Authors:
Mengfan Yue;Yu Tao;Yulai Fang;Xingpan Lian;Qin Zhang;Yufeng Xia;Zhifeng Wei;Yue Dai
Corresponding author:
Yue Dai