CIF: Small: Compression Schemes for Communication Constrained Bandit and Reinforcement Learning

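Basic Information

Award number:
2221871
Principal investigator:
Lin Yang
Amount:
$600,000
Host institution:
University of California-Los Angeles
Host institution country:
United States
Award type:
Standard Grant
Fiscal year:
2022
Funding country:
United States
Start and end dates:
2022-10-01 to 2025-09-30
Project status:
Not yet concluded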

Source:
https://www.nsf.gov/awardsearch/showAward?AWD_ID=2221871&HistoricalAwards=false
Keywords:
CIF Small Compression Schemes Communication

Project Abstract

Active learning and online learning are machine-learning paradigms in which computers learn to make complex decisions while receiving feedback from an environment. For instance, a drone may learn to fly by itself, or a car may learn to drive by trial and error. Recently, these learning paradigms have been widely applied and have achieved phenomenal success, with human-level performance in tasks like gameplay or robot control. As computing devices become smaller and less power-consuming, new distributed learning frameworks are starting to emerge. These frameworks contain low-capability learning agents (such as cell phones, unmanned vehicles, or drones) that are far apart but perform learning collectively by communicating with each other through (wireless) networks. However, existing communication approaches would become bottlenecks for learning, since they were designed for high-power computers and consume too much power and network bandwidth. This project aims to address this issue by providing novel techniques that efficiently compress the data to be communicated while preserving the learning ability. The techniques developed in this project will advance the state of the art in distributed online/active learning by improving communication efficiency.

The overarching goal of this project is to establish efficient compression schemes that support effective active/online learning, such as bandit and reinforcement learning, over communication-constrained networks. In these learning environments, a learner aims to make a good decision for the next steps based on experience; this project will explore fundamental bounds and efficient algorithms that support this goal while minimizing the number of bits communicated, by compressing in a way that retains only the information necessary for decision making. In other words, this project aims to explore the fundamental trade-off between compression and learnability in active/online environments. Building on promising preliminary work, the investigators will study problems ranging from the most basic multi-armed bandit setting to more complex reinforcement learning settings and consider both centralized and decentralized network topologies. More specifically, the investigators propose compression schemes and fundamental theoretical bounds for (1) rewards in multi-armed bandit problems, (2) context vectors for contextual bandit problems, and (3) state-action features and models for Markov decision problems.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Project Outputs

Journal articles (10)

Monographs (0)

Research awards (0)

Conference papers (0)

Patents (0)

Near-Optimal Sample Complexity Bounds for Constrained MDPs

DOI:
Publication date:
2022
Journal:
Advances in neural information processing systems
Impact factor:
0
Authors:
Vaswani, Sharan;Yang, Lin;Szepesvári, Csaba
Corresponding author:
Szepesvári, Csaba

Provably Feedback-Efficient Reinforcement Learning via Active Reward Learning

DOI:
10.48550/arxiv.2304.08944
Publication date:
2023-04
Journal:
ArXiv
Impact factor:
0
Authors:
Dingwen Kong;Lin F. Yang
Corresponding author:
Dingwen Kong;Lin F. Yang

Provably Efficient Lifelong Reinforcement Learning with Linear Representation

DOI:
Publication date:
2023
Journal:
ICLR
Impact factor:
0
Authors:
Amani, Sanae;Yang, Lin;Cheng, Ching-An
Corresponding author:
Cheng, Ching-An

Low-Switching Policy Gradient with Exploration via Online Sensitivity Sampling

DOI:
10.48550/arxiv.2306.09554
Publication date:
2023-06
Journal:
ArXiv
Impact factor:
0
Authors:
Yunfan Li;Yiran Wang;Y. Cheng;Lin F. Yang
Corresponding author:
Yunfan Li;Yiran Wang;Y. Cheng;Lin F. Yang

Horizon-Free Learning for Markov Decision Processes and Games: Stochastically Bounded Rewards and Improved Bounds

DOI:
Publication date:
2023
Journal:
Proceedings of Machine Learning Research
Impact factor:
0
Authors:
Li, Shengshi;Yang, Lin
Corresponding author:
Yang, Lin

Other publications by Lin Yang

Discharge Behavior and Morphological Characteristics of Suspended Water-Drop on Shed Edge during Rain Flashover of Polluted Large-Diameter Post Insulator

DOI:
10.3390/en14061652
Publication date:
2021-03
Journal:
Energies
Impact factor:
3.2
Authors:
Yifan Liao;Qiao Wang;Lin Yang;Zhiqiang Kuang;Yanpeng Hao;Chuyan Zhang
Corresponding author:
Chuyan Zhang

DAWE: A Double Attention-Based Word Embedding Model with Sememe Structure Information

DOI:
10.3390/app10175804
Publication date:
2020
Journal:
Appl. Sci.
Impact factor:
0
Authors:
Shengwen Li;Renyao Chen;Bo Wan;Junfang Gong;Lin Yang;Hong Yao
Corresponding author:
Hong Yao

Microglial AIM2 alleviates antiviral‐related neuro‐inflammation in mouse models of Parkinson's disease

DOI:
10.1002/glia.24260
Publication date:
2022-08
Journal:
Wiley
Impact factor:
0
Authors:
Wen‐Juan Rui;Sheng Li;Lin Yang;Ying Liu;Yi Fan;Ying‐Chao Hu;Chun‐Mei Ma;Bing‐Wei Wang;Jing‐Ping Shi
Corresponding author:
Jing‐Ping Shi

Synthesis, structures and anticancer potentials of five platinum(II) complexes with benzothiazole-benzopyran targeting mitochondria

DOI:
10.1016/j.poly.2020.115004
Publication date:
2021-03
Journal:
Polyhedron
Impact factor:
2.6
Authors:
Qing-Min Wei;Zu-Zhuang Wei;Jia-Jing Zeng;Lin Yang;Qi-Pin Qin;Ming-Xiong Tan;Hong Liang
Corresponding author:
Hong Liang