Robust and Efficient Model-based Reinforcement Learning
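Basic information
- Grant number: EP/X03917X/1
- Principal investigator:
- Amount: $507,300
- Host institution:
- Host institution country: United Kingdom
- Project category: Research Grant
- Fiscal year: 2023
- Funding country: United Kingdom
- Duration: 2023 to (no data)
- Project status: Not concluded
- Source:
- Keywords: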
Project abstract
Reinforcement learning (RL) is concerned with training data-driven agents to make decisions. In particular, an RL agent interacting with an environment needs to learn an optimal policy, i.e., which actions to take in different states to maximize its rewards. Recently, RL has become one of the most prominent areas of machine learning because RL methods have tremendous potential for solving complex tasks across various fields (e.g., autonomous driving, nuclear fusion, healthcare, and hardware design). However, a number of challenges still stand in the way of its widespread adoption. Contemporary RL algorithms are often data-intensive and lack robustness guarantees. Established (deep) RL approaches require a vast amount of data, which is readily available in some environments (e.g., in video games). This is often not the case with real-world tasks, where data acquisition is costly. Another major challenge is to use the learned control policies in the real world while ensuring reliable, robust, and safe performance. This research aims to provide practical model-based RL (MBRL) algorithms with rigorous statistical and robustness guarantees. This is significant in safety-critical applications where obtaining data is expensive; e.g., in nuclear fusion, learning policies to control plasmas is performed via expensive simulators. The key novelty will be to incorporate versatile robustness aspects into model-based RL, allowing its broad use across different applications and domains.
This project focuses on designing algorithms that make use of powerful non-linear statistical models to learn about the world and that can tackle the large state spaces present in modern RL tasks. The focus is on obtaining near-optimal policies that are robust against distributional shifts in the environmental dynamics and against (adversarial) data corruptions/outliers, and that satisfy application-dependent safety constraints during exploration.
A major contribution will be novel, rigorous statistical sample complexity guarantees for the designed algorithms that characterize convergence to optimal robust and safe policies. The obtained guarantees will be efficient in the sense of being independent of the number of states, and hence applicable to complex applications. This will require designing new robust estimators and confidence intervals for popular statistical models. Moreover, the project will result in an entire testbed with distributional shifts and attacking strategies that will be provided to benchmark the robustness of standard and novel robust RL algorithms. This project will be among the first contributions to achieve both robustness and efficiency in MBRL by providing practical algorithms that can be readily applied to emerging impactful real-world tasks such as robust control of nuclear plasmas (an exciting and promising path toward sustainable energy) and efficient discovery of system-on-chip designs.
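To make the idea of a policy that is "robust against distributional shifts in the environmental dynamics" concrete, here is a minimal illustrative sketch. It is not the project's algorithm: it assumes a tiny tabular problem and a finite set of candidate transition models, and it runs worst-case value iteration in which an adversary picks the least favorable model for each state-action pair. The function name `robust_value_iteration` and the two-state example (states, rewards, and the `nominal` and `shifted` models) are hypothetical and chosen only for illustration.

```python
import numpy as np


def robust_value_iteration(models, rewards, gamma=0.9, tol=1e-8, max_iter=10_000):
    """Worst-case value iteration over a finite set of transition models.

    models:  shape (K, S, A, S); models[k, s, a] is a next-state distribution.
    rewards: shape (S, A).
    Returns (values, policy), each of length S.
    """
    models = np.asarray(models, dtype=float)
    rewards = np.asarray(rewards, dtype=float)

    def worst_case_q(values):
        # Expected next-state value under every candidate model: shape (K, S, A).
        expected_next = models @ values
        # The adversary picks the worst model separately for each (state, action).
        return rewards + gamma * expected_next.min(axis=0)

    values = np.zeros(rewards.shape[0])
    for _ in range(max_iter):
        new_values = worst_case_q(values).max(axis=1)
        converged = np.max(np.abs(new_values - values)) < tol
        values = new_values
        if converged:
            break
    policy = worst_case_q(values).argmax(axis=1)
    return values, policy


# Hypothetical example: state 0 is "operating", state 1 is an absorbing "failure".
# Action 0 is safe (reward 0.5); action 1 is risky (reward 1.0).
rewards = np.array([[0.5, 1.0],
                    [0.0, 0.0]])
# Nominal dynamics: both actions keep the agent in state 0.
nominal = np.array([[[1.0, 0.0], [1.0, 0.0]],
                    [[0.0, 1.0], [0.0, 1.0]]])
# Shifted dynamics: the risky action now leads to the failure state.
shifted = np.array([[[1.0, 0.0], [0.0, 1.0]],
                    [[0.0, 1.0], [0.0, 1.0]]])

nominal_values, nominal_policy = robust_value_iteration(nominal[None], rewards)
robust_values, robust_policy = robust_value_iteration(
    np.stack([nominal, shifted]), rewards
)
print("nominal:", nominal_policy[0], nominal_values[0])
print("robust: ", robust_policy[0], robust_values[0])
```

In this toy setting, planning against the nominal model alone selects the risky action in state 0 (value close to 10), whereas planning against both models selects the safe action (value close to 5). The guarantees described in the abstract concern far richer settings (non-linear models, large state spaces, learned confidence sets), which this sketch does not attempt to capture.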
Project outcomes
Journal articles: 0
Monographs: 0
Research awards: 0
Conference papers: 0
Patents: 0
Other publications by Ilija Bogunovic
Robust Best-arm Identification in Linear Bandits
- DOI: 10.48550/arxiv.2311.04731
- Publication year: 2023
- Journal:
- Impact factor: 0
- Authors: Wei Wang; Sattar Vakili; Ilija Bogunovic
- Corresponding author: Ilija Bogunovic
On Actively Teaching the Crowd to Classify
- DOI:
- Publication year: 2013
- Journal:
- Impact factor: 0
- Authors: A. Singla; Ilija Bogunovic; Gábor Bartók; Amin Karbasi; A. Krause
- Corresponding author: A. Krause
Robust Adaptive Decision Making: Bayesian Optimization and Beyond
- DOI: 10.5075/epfl-thesis-9147
- Publication year: 2019
- Journal:
- Impact factor: 0
- Authors: Ilija Bogunovic
- Corresponding author: Ilija Bogunovic
A distributed algorithm for partitioned robust submodular maximization
- DOI: 10.1109/camsap.2017.8313155
- Publication year: 2017
- Journal:
- Impact factor: 0
- Authors: Ilija Bogunovic; Slobodan Mitrovic; J. Scarlett; V. Cevher
- Corresponding author: V. Cevher
Robust Protection of Networks against Cascading Phenomena
- DOI: 10.3929/ethz-a-007580645
- Publication year: 2012
- Journal:
- Impact factor: 0
- Authors: Ilija Bogunovic
- Corresponding author: Ilija Bogunovic
Similar overseas grants
Adaptive and Efficient Robot Positioning Through Model and Task Fusion
- Grant number: DE240100149
- Fiscal year: 2024
- Funding amount: $507,300
- Project category: Discovery Early Career Researcher Award
Tractable human distal lung organoid model as a new efficient tool to study mesenchymal-epithelial interactions in COPD
- Grant number: NC/Y500641/1
- Fiscal year: 2024
- Funding amount: $507,300
- Project category: Training Grant
CAREER: Efficient Large Language Model Inference Through Codesign: Adaptable Software Partitioning and FPGA-based Distributed Hardware
- Grant number: 2339084
- Fiscal year: 2024
- Funding amount: $507,300
- Project category: Continuing Grant
CAREER: Efficient and Scalable Large Foundational Model Training on Supercomputers for Science
- Grant number: 2340011
- Fiscal year: 2024
- Funding amount: $507,300
- Project category: Standard Grant
CAREER: New data integration approaches for efficient and robust meta-estimation, model fusion and transfer learning
- Grant number: 2337943
- Fiscal year: 2024
- Funding amount: $507,300
- Project category: Continuing Grant
Collaborative Research: FMitF: Track I: DeepSmith: Scheduling with Quality Guarantees for Efficient DNN Model Execution
- Grant number: 2349461
- Fiscal year: 2023
- Funding amount: $507,300
- Project category: Standard Grant
Collaborative Research: III: Small: Efficient and Robust Multi-model Data Analytics for Edge Computing
- Grant number: 2311596
- Fiscal year: 2023
- Funding amount: $507,300
- Project category: Standard Grant
Collaborative Research: III: Small: Efficient and Robust Multi-model Data Analytics for Edge Computing
- Grant number: 2311598
- Fiscal year: 2023
- Funding amount: $507,300
- Project category: Standard Grant
CoolCows - developing and demonstrating a new model using genomics and IVF to rapidly breed more methane-efficient, sustainable cattle
- Grant number: 10078984
- Fiscal year: 2023
- Funding amount: $507,300
- Project category: Responsive Strategy and Planning
Collaborative Research: III: Small: Efficient and Robust Multi-model Data Analytics for Edge Computing
- Grant number: 2311597
- Fiscal year: 2023
- Funding amount: $507,300
- Project category: Standard Grant