CAREER: Lyapunov Drift Methods for Stochastic Recursions: Applications in Cloud Computing and Reinforcement Learning

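Basic Information

Award Number:
2144316
Principal Investigator:
Siva Theja Maguluri
Amount:
$500,000
Institution:
Georgia Tech Research Corporation
Institution Country:
United States
Award Type:
Continuing Grant
Fiscal Year:
2022
Funding Country:
United States
Start and End Dates:
2022-05-01 to 2027-04-30
Project Status:
Ongoing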

Source:
https://www.nsf.gov/awardsearch/showAward?AWD_ID=2144316&HistoricalAwards=false
Keywords:
CAREER Lyapunov Drift Methods Stochastic

Project Abstract

Part I: The ongoing artificial intelligence revolution has been made possible by progress in two distinct areas. The first is the development of novel algorithms in machine learning paradigms such as reinforcement learning (RL) that overcome long-standing challenges; the second is the breakthroughs in cloud computing infrastructure based on large data centers, which make it possible to collect, store, and process large amounts of data easily and at short notice. Despite tremendous success stories in both areas, the fundamental trade-offs and optimal performance are not well understood, and theory lags far behind practice. Although they seem to be very different problems, both reinforcement learning and cloud computing can be studied using stochastic recursions. The goal of this CAREER project is to take a unified theoretical viewpoint of these two seemingly distinct areas, by first developing a general theory of stochastic recursions and then using it to study both reinforcement learning and cloud computing. In particular, we will use the theory to develop novel learning algorithms with provably optimal sample complexity across various paradigms, such as off-policy learning and the actor-critic framework. The theory of stochastic recursions, as well as the novel learning algorithms, will also be used to develop optimal scheduling algorithms for cloud computing data centers that minimize the tail of the delay experienced by users. The novel algorithms developed during this project will be implemented through collaborations with industry partners as well as on Georgia Tech's internal cloud. A Jupyter-based open-source RL simulation platform will be developed, and the novel algorithms developed during this project will be included in it.
The platform will be used not only to disseminate the outcomes of this project, but also for undergraduate research projects, course projects in a new course on reinforcement learning, and STEM outreach activities in K-12 education. In addition to disseminating research results through conference and journal publications, we will develop a new special topics course and publish a monograph on the unified Lyapunov framework for stochastic recursions. Training graduate and undergraduate students also forms a core part of the project, with special emphasis on mentoring future faculty.
Part II: Intellectual Merit: The proposed work is organized into three interdependent thrusts. Thrust I builds a Lyapunov theory of stochastic recursions, obtaining finite-time mean-square error bounds and exponential tail bounds, and characterizing the steady-state limiting distribution for a broad class of stochastic recursions. This thrust forms the foundation for the next two. Thrust II studies the finite-time mean-square bounds, tail probability bounds (also known as PAC bounds), sample complexity, and steady-state behavior of RL algorithms under three paradigms, namely off-policy RL, two-time-scale policy-space algorithms (such as actor-critic), and average-reward RL, and develops novel, fast RL algorithms with near-optimal sample efficiency. Thrust III studies scheduling problems in data center networks, with the goal of minimizing the mean delay and the delay tails. Using the Lyapunov theory from Thrust I, we will develop novel low-complexity algorithms with provable guarantees on steady-state delay in the heavy-traffic asymptotic regime. Using these as initial policies, we will deploy RL algorithms from Thrust II to learn new scheduling policies that are optimal even in the pre-asymptotic regime, which is of practical interest. All the proposed algorithms will be evaluated on real-world traffic traces through our collaborations with industry partners.
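To make the Thrust I vocabulary (stochastic recursion, Lyapunov function, finite-time mean-square error, steady state) concrete, here is a minimal illustrative sketch, not the project's method. It assumes a toy scalar recursion x_{k+1} = x_k + α(F(x_k) + w_k) with F(x) = x* − x, a constant step size 0 < α < 1, and i.i.d. Gaussian noise w_k ~ N(0, σ²). With the Lyapunov function V(x) = (x − x*)², the one-step drift is E[V(x_{k+1}) | x_k] = (1 − α)² V(x_k) + α²σ², which unrolls into a finite-time mean-square error formula. The function names `simulate_mse` and `drift_prediction` and all parameter values are hypothetical; the code compares a Monte Carlo estimate against the drift prediction.

```python
import numpy as np


def simulate_mse(alpha, sigma, x_star, x0, num_steps, num_runs, seed=0):
    """Monte Carlo estimate of E[(x_k - x_star)^2] for k = 0, ..., num_steps.

    Hypothetical toy recursion: x_{k+1} = x_k + alpha * (F(x_k) + w_k),
    with F(x) = x_star - x and i.i.d. noise w_k ~ N(0, sigma^2).
    """
    rng = np.random.default_rng(seed)
    x = np.full(num_runs, float(x0))          # num_runs independent trajectories
    mse = np.empty(num_steps + 1)
    mse[0] = np.mean((x - x_star) ** 2)
    for k in range(num_steps):
        w = rng.normal(0.0, sigma, size=num_runs)
        x = x + alpha * ((x_star - x) + w)    # one step of the stochastic recursion
        mse[k + 1] = np.mean((x - x_star) ** 2)
    return mse


def drift_prediction(alpha, sigma, x_star, x0, num_steps):
    """MSE obtained by unrolling the one-step Lyapunov drift identity.

    With V(x) = (x - x_star)^2:
        E[V(x_{k+1}) | x_k] = (1 - alpha)^2 V(x_k) + alpha^2 sigma^2,
    so E[V(x_k)] = rho^k V(x_0) + v_inf (1 - rho^k), where rho = (1 - alpha)^2
    and v_inf = alpha sigma^2 / (2 - alpha) is the steady-state MSE.
    """
    rho = (1.0 - alpha) ** 2
    v_inf = alpha * sigma**2 / (2.0 - alpha)
    k = np.arange(num_steps + 1)
    return rho**k * (x0 - x_star) ** 2 + v_inf * (1.0 - rho**k)


if __name__ == "__main__":
    sim = simulate_mse(alpha=0.1, sigma=1.0, x_star=2.0, x0=10.0,
                       num_steps=200, num_runs=20000, seed=1)
    pred = drift_prediction(alpha=0.1, sigma=1.0, x_star=2.0, x0=10.0, num_steps=200)
    for k in (0, 10, 50, 200):
        print(f"k={k:3d}  simulated={sim[k]:.4f}  drift prediction={pred[k]:.4f}")
```

The constant step size is a deliberate choice for this toy: it exhibits the typical two-term shape of a finite-time bound, a transient that decays geometrically at rate (1 − α)² plus a residual error of order α. Shrinking α lowers the residual but slows the transient.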
Broader Impacts: The proposed work and the PI's ongoing industry collaborations have the potential for significant societal impact by making RL and cloud computing more efficient. The proposed Lyapunov theory for stochastic recursions is applicable in many other disciplines, so the PI will disseminate it widely through a special topics course, a monograph, and tutorials, in addition to conference and journal publications. The project integrates research with educational activities at every level. A Jupyter-based RL simulation platform and a library of notebooks, both to be built during the project, will serve as an extensive pedagogical resource for these activities. The PI will continue his ongoing involvement in undergraduate research through the REU program and the VIP program at Georgia Tech. To meet growing demand, the PI will develop a new interdisciplinary undergraduate-level RL course that makes extensive use of the RL simulation platform. To promote STEM, the PI will take part in outreach activities at local high schools, working with an academic professional in ISyE, and will mentor high school teachers through the GIFT program. To support Ph.D. students interested in academic careers, the PI runs a future faculty mentorship program. The PI is committed to broadening participation; he currently advises a female Hispanic student and has advised several URM undergraduate students.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.