NSF-AoF: CNS Core: Small: Reinforcement Learning for Real-time Wireless Scheduling and Edge Caching: Theory and Algorithm Design

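Basic Information

Award number:
2203239
Principal investigator:
Junshan Zhang
Amount:
$415,000
Host institution:
University of California-Davis
Host institution country:
United States
Award type:
Standard Grant
Fiscal year:
2021
Funding country:
United States
Period of performance:
2021-10-01 to 2024-09-30
Project status:
Completed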

Source:
https://www.nsf.gov/awardsearch/showAward?AWD_ID=2203239&HistoricalAwards=false
Keywords:
NSF AoF CNS Core Small

Project Abstract

Recent years have witnessed a tremendous growth in real-time applications in wirelessly networked systems, such as connected cars and multi-user augmented reality (AR). Wireless edge caching is another emerging application requiring high bandwidth, where optimal caching decisions would depend on the cache contents and dynamic user demand profiles. To meet the explosive demand, 5G and Beyond (B5G) technology promises to offer enhanced mobile broadband (eMBB) and ultra-reliable low-latency communications (URLLC) services. Meeting URLLC requirements is very challenging in wireless networks, and requires massive modifications to the current wireless system design. Deadline-aware wireless scheduling of real-time traffic has been a long-standing open problem. This collaborative project makes a paradigm shift to tackle these challenges thus spurring a new line of thinking for QoS guarantee in terms of ultra-low latency and high bandwidth in a variety of IoT applications, including B5G, autonomous driving, augmented reality, smart health and smart city, benefiting both the US and Finland. The proposed research will also be integrated with education activities at the PIs' institutions for graduate, undergraduate, and K-12 students via curriculum development, research experiences, and outreach. This project leverages recent advances on offline reinforcement learning (RL) to study two important problems in B5G, namely 1) deadline-aware wireless scheduling to guarantee low latency and 2) edge caching to achieve high bandwidth content delivery. In Thrust 1, physics-aided offline RL will be devised to train deadline-aware scheduling policies. Specifically, the Actor-Critic (A-C) method will be used for offline training of scheduling policies, consisting of two phases: 1) initialization of Actor structure via behavioral cloning and 2) policy improvement via the physics-aided A-C method. 
With a good model-based scheduling algorithm as the initial actor structure, the A-C method can be leveraged to yield a better scheduling policy, thanks to its nature of policy improvement. Further, innovative algorithms will be devised to address the outstanding problems in the A-C method, namely overestimation bias and high variance, and Meta-RL will be used for adaptation to distribution shift in nonstationary network dynamics. Thrust 2 focuses on wireless edge caching, an application where the storage capacities at both the network edge and user devices are harnessed to alleviate the need for high-bandwidth communications over long distances. The combinatorial nature of joint communication and caching optimization herein, with the uncertainties of system dynamics, calls for non-trivial design of machine learning algorithms. The PIs will leverage RL to investigate wireless edge caching thoroughly. This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
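
The first training phase named in the abstract, initializing the actor by behavioral cloning from a model-based scheduler, can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the project: the two-link state of remaining deadlines, the earliest-deadline-first rule standing in for the "good model-based scheduling algorithm", the tabular softmax actor, and all function names. The second phase (physics-aided Actor-Critic policy improvement) is not shown.

```python
import math
from itertools import product

NUM_LINKS = 2
MAX_DEADLINE = 3  # slots left before a packet expires; 0 means no packet (assumed encoding)


def edf_expert(state):
    """Hypothetical expert: serve the link whose packet expires soonest."""
    pending = [(d, i) for i, d in enumerate(state) if d > 0]
    if not pending:
        return 0  # nothing queued; default to link 0
    return min(pending)[1]  # ties on deadline go to the lower link index


def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def behavioral_cloning(dataset, epochs=200, lr=0.5):
    """Fit a tabular softmax actor to (state, action) pairs by maximum likelihood."""
    logits = {}
    for _ in range(epochs):
        for state, action in dataset:
            row = logits.setdefault(state, [0.0] * NUM_LINKS)
            probs = softmax(row)
            for a in range(NUM_LINKS):
                # Gradient of log pi(action | state) with respect to logit a.
                grad = (1.0 if a == action else 0.0) - probs[a]
                row[a] += lr * grad
    return logits


def greedy_action(logits, state):
    row = logits[state]
    return max(range(NUM_LINKS), key=lambda a: row[a])


# Offline dataset: every deadline combination labeled with the expert's choice.
states = list(product(range(MAX_DEADLINE + 1), repeat=NUM_LINKS))
dataset = [(s, edf_expert(s)) for s in states]

actor = behavioral_cloning(dataset)
agreement = sum(greedy_action(actor, s) == edf_expert(s) for s in states) / len(states)
print(f"actor matches expert on {agreement:.0%} of states")
```

In this toy setting the cloned actor simply reproduces the expert; in the approach the abstract outlines, such an actor would only be the starting point that the Actor-Critic phase then improves.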

Project Outputs

Journal papers (8)

Monographs (0)

Research awards (0)

Conference papers (0)

Patents (0)

CLARE: Conservative Model-Based Reward Learning for Offline Inverse Reinforcement Learning

DOI:
10.48550/arxiv.2302.04782
Publication date:
2023-02
Journal:
ArXiv
Impact factor:
0
Authors:
Sheng Yue; Guan Wang; Wei Shao; Zhaofeng Zhang; Sen Lin; Junkai Ren; Junshan Zhang
Corresponding authors:
Sheng Yue; Guan Wang; Wei Shao; Zhaofeng Zhang; Sen Lin; Junkai Ren; Junshan Zhang

Adaptive Ensemble Q-learning: Minimizing Estimation Bias via Error Feedback

DOI:
10.48550/arxiv.2306.11918
Publication date:
2023-06
Journal:
ArXiv
Impact factor:
0
Authors:
Hang Wang; Sen Lin; Junshan Zhang
Corresponding authors:
Hang Wang; Sen Lin; Junshan Zhang

MetaGater: Fast Learning of Conditional Channel Gated Networks via Federated Meta-Learning

DOI:
10.1109/mass52906.2021.00031
Publication date:
2020-11
Journal:
2021 IEEE 18th International Conference on Mobile Ad Hoc and Smart Systems (MASS)
Impact factor:
0
Authors:
Sen Lin; Li Yang; Zhezhi He; Deliang Fan; Junshan Zhang
Corresponding authors:
Sen Lin; Li Yang; Zhezhi He; Deliang Fan; Junshan Zhang

Scheduling Real-Time Wireless Traffic: A Network-Aided Offline Reinforcement Learning Approach