CAREER: Lyapunov Drift Methods for Stochastic Recursions: Applications in Cloud Computing and Reinforcement Learning
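Basic information
- Award number: 2144316
- Principal investigator:
- Amount: $500,000
- Host institution:
- Host institution country: United States
- Award type: Continuing Grant
- Fiscal year: 2022
- Funding country: United States
- Period: 2022-05-01 to 2027-04-30
- Project status: Ongoing
- Source:
- Keywords:
Project abstract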
Part 1: The ongoing Artificial Intelligence revolution is possible due to progress in two distinct areas. The first is the development of novel algorithms in machine learning paradigms such as Reinforcement Learning that overcome long-standing challenges; the second is the breakthroughs in cloud computing infrastructure based on large data centers, which enable one to collect, store, and process large amounts of data very easily and on short notice. In spite of tremendous success stories in both these areas, fundamental trade-offs and optimal performance are not understood, and theory lags far behind practice. Although they seem to be very distinct problems, both Reinforcement Learning and cloud computing can be studied using stochastic recursions. The goal of this CAREER project is to take a unified theoretical viewpoint of both these seemingly distinct areas by first developing a general theory of stochastic recursions, and then using it to study both Reinforcement Learning and cloud computing. In particular, we will use the theory to develop novel learning algorithms with provably optimal sample complexity across various paradigms such as off-policy learning and the actor-critic framework. The theory of stochastic recursions as well as the novel learning algorithms will also be used to develop optimal scheduling algorithms for cloud computing data centers that minimize the tail of delay experienced by the users. The novel algorithms developed during the course of this project will be implemented through collaborations with partners in industry as well as at Georgia Tech's internal cloud. A Jupyter-based open source RL simulation platform will be developed, and the novel algorithms developed during the course of this project will be included in this platform.
The platform will be used not only to disseminate the outcomes of this project, but also for undergraduate research projects, course projects for a new course on Reinforcement Learning, and STEM outreach activities for K-12 education. In addition to disseminating research results through conferences and journal publications, we will develop a novel special topics course and bring out a monograph on the unified Lyapunov framework for stochastic recursions. Training of graduate and undergraduate students also forms a core part of the project, with special emphasis on mentoring future faculty.
Part 2: Intellectual Merit: The proposed work is organized into three interdependent thrusts. Thrust I builds a Lyapunov theory of stochastic recursions, where we obtain finite-time mean-square error and exponential tail bounds, and characterize the steady-state limiting distribution for a broad class of stochastic recursions. This thrust forms the foundation for the next two thrusts. Thrust II studies the finite-time mean-square bounds, tail probability bounds (also known as PAC bounds), sample complexity, and steady-state behavior of RL algorithms under three paradigms, namely off-policy RL, two-time-scale policy space algorithms (such as actor-critic), and average-reward RL, and develops novel, fast RL algorithms with near-optimal sample efficiency. Thrust III studies scheduling problems in data center networks, with the goal of minimizing mean delay and delay tails. Using the Lyapunov theory from Thrust I, we develop novel low-complexity algorithms with provable guarantees on steady-state delay in the heavy-traffic asymptotic regime. With these as initial policies, we will deploy RL algorithms from Thrust II to learn new scheduling policies that are optimal even in the pre-asymptotic regime, which is of practical interest. All the proposed algorithms will be evaluated using real-world traffic traces through our collaborations with industry partners.
Broader Impacts: The proposed work and the PI's ongoing industry collaborations have potential for significant societal impact by making RL and cloud computing more efficient. The proposed Lyapunov theory for stochastic recursions is applicable in many other disciplines, so the PI will disseminate it widely through a special topics course, a monograph, and tutorials, in addition to conference and journal publications. The project integrates research with educational activities at every level. A Jupyter-based RL simulation platform and a library of notebooks that we will build will serve as an extensive pedagogical resource for these activities. The PI will continue his ongoing involvement in undergraduate research through the REU program and the VIP program at Georgia Tech. To meet growing demand, the PI will develop a new interdisciplinary undergraduate-level RL course that makes extensive use of the RL simulation platform. To promote STEM activities, the PI will take part in outreach activities to local high schools, working with an academic professional in ISyE, and will mentor high school teachers through the GIFT program. To support Ph.D. students interested in academic careers, the PI runs a future faculty mentorship program. The PI is committed to broadening participation, currently advises a female Hispanic student, and has advised several URM undergraduate students. This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
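As a rough illustration of what a finite-time mean-square bound for a stochastic recursion looks like, the sketch below uses a toy example that is not taken from the project: the scalar recursion x_{k+1} = x_k + alpha * (Y_k - x_k) with i.i.d. noisy samples Y_k of mean mu and variance sigma^2. With the Lyapunov function V(x) = (x - mu)^2, the one-step drift satisfies E[V(x_{k+1}) | x_k] = (1 - alpha)^2 V(x_k) + alpha^2 sigma^2, and unrolling it gives a closed-form mean-square error at every step k. The function names, step size, and noise model are all assumptions chosen for illustration.

```python
import random


def drift_recursion_mse(v0, alpha, sigma2, k):
    """Mean-square error E[(x_k - mu)^2] from unrolling the drift identity.

    Assumes the toy recursion x_{k+1} = x_k + alpha * (Y_k - x_k) with
    i.i.d. samples Y_k of variance sigma2, so that
    E[V_{k+1}] = (1 - alpha)^2 * E[V_k] + alpha^2 * sigma2.
    """
    rho = (1.0 - alpha) ** 2  # contraction factor of the drift
    return rho ** k * v0 + alpha ** 2 * sigma2 * (1.0 - rho ** k) / (1.0 - rho)


def simulate_mse(x0, mu, alpha, sigma, k, runs, seed=0):
    """Monte Carlo estimate of E[(x_k - mu)^2] for the same toy recursion."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(runs):
        x = x0
        for _ in range(k):
            y = rng.gauss(mu, sigma)  # noisy observation of mu
            x += alpha * (y - x)      # constant step-size update
        total += (x - mu) ** 2
    return total / runs


if __name__ == "__main__":
    x0, mu, alpha, sigma, k = 5.0, 1.0, 0.1, 1.0, 200
    predicted = drift_recursion_mse((x0 - mu) ** 2, alpha, sigma ** 2, k)
    empirical = simulate_mse(x0, mu, alpha, sigma, k, runs=2000)
    print(f"drift-based MSE: {predicted:.5f}")
    print(f"simulated MSE:   {empirical:.5f}")
```

In this toy setting the first term decays geometrically in k while the second converges to alpha * sigma^2 / (2 - alpha), which shows the usual trade-off for constant step sizes: a larger alpha forgets the initial error faster but leaves a larger steady-state error.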
Project outcomes
Journal articles (5)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Logarithmic heavy traffic error bounds in generalized switch and load balancing systems
- DOI: 10.1017/jpr.2021.82
- Published: 2022
- Journal:
- Impact factor: 1
- Authors: Hurtado-Lange, Daniela; Varma, Sushil Mahavir; Maguluri, Siva Theja
- Corresponding author: Maguluri, Siva Theja
Federated Reinforcement Learning: Linear Speedup Under Markovian Sampling
- DOI:
- Published: 2022
- Journal:
- Impact factor: 0
- Authors: Khodadadian, Sajad; Sharma, Pranay; Joshi, Gauri; Maguluri, Siva Theja
- Corresponding author: Maguluri, Siva Theja
Finite-sample analysis of nonlinear stochastic approximation with applications in reinforcement learning
- DOI: 10.1016/j.automatica.2022.110623
- Published: 2019-05
- Journal:
- Impact factor: 0
- Authors: Zaiwei Chen; Sheng Zhang; Thinh T. Doan; J. Clarke; S. T. Maguluri
- Corresponding author: Zaiwei Chen; Sheng Zhang; Thinh T. Doan; J. Clarke; S. T. Maguluri
Power-of-d Choices Load Balancing in the Sub-Halfin Whitt Regime
- DOI:
- Published: 2023
- Journal:
- Impact factor: 0
- Authors: Varma, Sushil M.; Castro, Francisco; Maguluri, Siva Theja
- Corresponding author: Maguluri, Siva Theja
Other grants by Siva Theja Maguluri
Two-sided Queues and Networked Matching Platforms
- Award number: 2140534
- Fiscal year: 2022
- Amount: $500,000
- Award type: Standard Grant
CRII: CIF: Resource Allocation in Data Center Networks: Algorithms, Fundamental Limits and Performance Bounds
- Award number: 1850439
- Fiscal year: 2019
- Amount: $500,000
- Award type: Standard Grant
CIF: Small: Collaborative Research: Analytics on Edge-labeled Hypergraphs: Limits to De-anonymization
- Award number: 1944993
- Fiscal year: 2019
- Amount: $500,000
- Award type: Standard Grant
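Similar grants from the National Natural Science Foundation of China
Stability strategies for fractional-order stochastic neural networks under a vector Lyapunov function framework
- Award number:
- Approval year: 2024
- Amount: CNY 0
- Award type: Provincial/municipal project
Fast algorithms and related preconditioning techniques for large-scale generalized Lyapunov equations
- Award number: 12361080
- Approval year: 2023
- Amount: CNY 270,000
- Award type: Regional Science Fund project
Quantitative study of small defects using pipeline guided waves in complex environments, combining stochastic resonance and Lyapunov exponents
- Award number: n/a
- Approval year: 2022
- Amount: CNY 100,000
- Award type: Provincial/municipal project
Typhoon observation layout in the Taiwan Strait based on the nonlinear local Lyapunov exponent method
- Award number:
- Approval year: 2022
- Amount: CNY 300,000
- Award type: Young Scientists Fund project
Analysis and synthesis of hybrid-driven intermittent control systems based on triggering-mechanism-dependent Lyapunov functions
- Award number:
- Approval year: 2022
- Amount: CNY 300,000
- Award type: Young Scientists Fund project
Computation of robust domains of attraction for uncertain switched systems based on Lyapunov-like functions
- Award number:
- Approval year: 2022
- Amount: CNY 300,000
- Award type: Young Scientists Fund project
Analysis and synthesis of hybrid-driven intermittent control systems based on triggering-mechanism-dependent Lyapunov functions
- Award number: 62203412
- Approval year: 2022
- Amount: CNY 300,000
- Award type: Young Scientists Fund project
Lyapunov-control-based quantum state manipulation in cavity QED systems
- Award number:
- Approval year: 2021
- Amount: CNY 300,000
- Award type: Young Scientists Fund project
Stability of time-delay systems on time scales based on non-strict Lyapunov functional techniques
- Award number: 62003195
- Approval year: 2020
- Amount: CNY 240,000
- Award type: Young Scientists Fund project
Indefinite Lyapunov function methods for input-output stability analysis of switched systems
- Award number: 2020JJ5990
- Approval year: 2020
- Amount: CNY 0
- Award type: Provincial/municipal project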
Similar overseas grants
Safe Lyapunov-Based Deep Neural Network Adaptive Control of a Rehabilitative Upper Extremity Hybrid Exoskeleton
- Award number: 2230971
- Fiscal year: 2023
- Amount: $500,000
- Award type: Standard Grant
Parametric deformation of control Lyapunov function for nonlinear systems and its applications
- Award number: 23H01430
- Fiscal year: 2023
- Amount: $500,000
- Award type: Grant-in-Aid for Scientific Research (B)
CAREER: Efficient Learning of Equilibria in Dynamic Bayesian Games with Nash, Bellman and Lyapunov
- Award number: 2238838
- Fiscal year: 2023
- Amount: $500,000
- Award type: Continuing Grant
Collaborative Research: Data-driven Power Systems Control with Stability Guarantee: A Lyapunov Approach
- Award number: 2200692
- Fiscal year: 2022
- Amount: $500,000
- Award type: Standard Grant
Dynamical systems with observable Lyapunov irregular sets
- Award number: 22K03342
- Fiscal year: 2022
- Amount: $500,000
- Award type: Grant-in-Aid for Scientific Research (C)
The Complex Dynamics of Large Systems with Long-Range Interactions: New Insights from Covariant Lyapunov Vectors
- Award number: 2138055
- Fiscal year: 2022
- Amount: $500,000
- Award type: Standard Grant
Lyapunov Exponents and Spectral Properties of Aperiodic Structures
- Award number: EP/S010335/1
- Fiscal year: 2019
- Amount: $500,000
- Award type: Research Grant
Lyapunov Vector Field-Based Guidance law for Spacecraft Motion Synchronization
- Award number: 539314-2019
- Fiscal year: 2019
- Amount: $500,000
- Award type: University Undergraduate Student Research Awards
Lyapunov exponents and invariant measures of deterministic maps and random maps
- Award number: 526905-2018
- Fiscal year: 2018
- Amount: $500,000
- Award type: University Undergraduate Student Research Awards