Robust and Sample Efficient Reinforcement Learning

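Basic information

Grant number:
RGPIN-2019-05014
Principal investigator:
Poupart, Pascal
Amount:
approx. $40,100
Host institution:
University of Waterloo
Host institution country:
Canada
Program category:
Discovery Grants Program - Individual
Fiscal year:
2019
Funding country:
Canada
Period:
2019-01-01 to 2020-12-31
Project status:
Completed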

Source:
https://www.nserc-crsng.gc.ca/ase-oro/Details-Detailles_eng.asp?id=690617
Keywords:
Robust Sample Efficient Reinforcement Learning

Project abstract

Reinforcement Learning (RL) is arguably one of the most comprehensive forms of machine learning. It facilitates active learning and it allows a system to learn over an extended period of time about its environment as it makes a sequence of decisions. The system can also learn from weak signals that might be delayed. This is particularly useful in robotics, autonomous vehicles, conversational agents, game playing, operations research, automated trading, non-myopic recommender systems and self-managing networks. The generality of reinforcement learning also makes it complex and therefore challenging algorithmically.

Objectives: The goal of this work is to develop algorithms to improve the robustness and sample efficiency of reinforcement learning. Tremendous progress has been achieved in recent years by deep reinforcement learning techniques that scale to high dimensional inputs (e.g., images and natural language) and complex tasks. However, most of the successes are limited to applications with simulated environments (e.g., games, simulated robotic environments) since current algorithms may execute costly/catastrophic actions and may require an amount of data that is prohibitively large for interaction with real environments.

Methods: I will develop novel Bayesian reinforcement learning techniques that can quantify the uncertainty of the environment. This will be helpful both for robustness and sample efficiency. In Bayesian learning, a distribution over the unknowns is estimated and refined at each time step. This distribution also allows a system to explore more efficiently by focusing its actions on the parts of the environment that are still unknown. To that effect, I will develop scalable Bayesian techniques for deep reinforcement learning that explore safely and efficiently.

I will also develop novel constrained reinforcement learning techniques that take into account secondary objectives such as variance and cost functions that should not be exceeded. This will further improve robustness by ensuring that key performance indicators (KPIs) are met in industrial applications. I will also develop generative reinforcement learning techniques that are robust to missing inputs. In some applications (e.g., non-myopic recommender systems and self-managing networks), some observations/sensors might not be available at each time step. Generative reinforcement learning techniques that can marginalize inputs in a principled way will be designed.
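
The Bayesian idea mentioned in the abstract (a distribution over the unknowns that is refined at each time step and used to steer exploration) can be illustrated with a toy example. The sketch below is not the project's method: it assumes a hypothetical multi-armed bandit with Bernoulli rewards and uses standard Thompson sampling with Beta posteriors. The function name `thompson_sampling` and the arm probabilities are illustrative assumptions.

```python
import random


def thompson_sampling(true_probs, steps=2000, seed=0):
    """Beta-Bernoulli Thompson sampling on a hypothetical bandit.

    true_probs holds the (unknown to the agent) success probability of
    each arm; it is an assumption made purely for this illustration.
    """
    rng = random.Random(seed)
    n_arms = len(true_probs)
    # Beta(1, 1) prior, i.e. uniform, over each arm's success probability.
    alpha = [1.0] * n_arms
    beta = [1.0] * n_arms
    pulls = [0] * n_arms
    for _ in range(steps):
        # Draw one plausible success rate per arm from the current posterior.
        # Arms with few observations have wide posteriors, so they are
        # still selected occasionally: this is the exploration mechanism.
        samples = [rng.betavariate(alpha[a], beta[a]) for a in range(n_arms)]
        arm = max(range(n_arms), key=lambda a: samples[a])
        reward = 1 if rng.random() < true_probs[arm] else 0
        # Conjugate update: the posterior is refined at every time step.
        alpha[arm] += reward
        beta[arm] += 1 - reward
        pulls[arm] += 1
    return alpha, beta, pulls


if __name__ == "__main__":
    alpha, beta, pulls = thompson_sampling([0.2, 0.5, 0.8])
    print("pulls per arm:", pulls)
    print("posterior means:", [a / (a + b) for a, b in zip(alpha, beta)])
```

As the posteriors sharpen, pulls concentrate on the arm that looks best, which is the sense in which uncertainty estimates can help sample efficiency. The deep, safety-aware variants the abstract proposes are well beyond this sketch.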

Project outputs

Journal papers (0)

Monographs (0)

Research awards (0)

Conference papers (0)

Patents (0)

Other publications by Poupart, Pascal

Online Structure Learning for Feed-Forward and Recurrent Sum-Product Networks

Year of publication:
2018
Journal:
Advances in Neural Information Processing Systems (NeurIPS
Impact factor:
0
Authors:
Kalra, Agastya; Rashwan, Abdullah; Hsu, Wei-Shou; Poupart, Pascal; Doshi, Prashant; Trimponias, Georgios
Corresponding author:
Trimponias, Georgios