Robust Decision-Aware Model-based Reinforcement Learning


Basic Information

  • Grant Number:
    RGPIN-2021-03701
  • Principal Investigator:
    Farahmand, Amirmassoud
  • Amount:
    $21,100
  • Host Institution:
  • Host Institution Country:
    Canada
  • Program Type:
    Discovery Grants Program - Individual
  • Fiscal Year:
    2021
  • Funding Country:
    Canada
  • Duration:
    2021-01-01 to 2022-12-31
  • Status:
    Completed

Project Abstract

Reinforcement learning (RL) is the problem of designing an agent that interacts with its environment and adaptively improves its long-term performance. Many complex real-world decision-making problems can be formulated as RL problems. Example applications include energy management systems for hybrid cars, dynamic treatment regimes in healthcare, and many others in robotics, finance, and beyond. RL is at the core of AI and has the potential to have a huge impact on our economy and society, arguably more so than any other area of machine learning. Despite these successes, RL as a technology is not ready for most real-world applications. A major source of difficulty is the high sample complexity of RL agents. Sample complexity refers to the number of interactions (or data points) required to achieve a certain level of performance. An RL agent that requires too many samples before performing well is unsuitable for real-world applications, in which obtaining new samples is often costly and time-consuming.

Model-based RL (MBRL) is a promising approach to designing sample-efficient agents for problems where the number of interactions with the real world cannot be very large. The basic idea of MBRL is to learn a model of the environment and then use the model in an internal simulator to plan a good policy, i.e., the strategy for selecting actions. This may improve the sample complexity of the agent. It is contingent, however, on learning an accurate model of the real world. The conventional approach to model learning, which is based on learning a good predictive model of the environment, has an important shortcoming: it rests on the belief that an accurate predictor is sufficient for planning. The often-unnoticed fact is that no model can be completely accurate; there are always some errors between the real world and the model. The real world is sometimes simply too complex for our models.

What I suggest in my research program is to rethink how we should do MBRL. Trying to learn complex dynamics that are irrelevant to the underlying decision problem is pointless. A conventional model-learning approach cannot discriminate between decision-relevant and decision-irrelevant aspects of the environment, and hence wastes the capacity of the model on unnecessary detail. The fundamental idea of this research program is that instead of trying to learn a model that is a good predictor of the environment, one should only learn about the aspects that are relevant to the decision problem. The scientific impact of this research program is that it opens up and explores an unorthodox way of thinking about how an agent should learn about its environment. I expect my research team's progress in this direction to provide the theoretical and foundational groundwork for the future of model-based RL. I also expect it to lead to sample-efficient RL agents that can be used for real-world applications.
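One way to make the decision-aware idea concrete is a training loss that penalizes model errors only insofar as they change the quantities used for planning. The snippet below is a minimal NumPy sketch on a toy tabular MDP, not the project's actual algorithm: it contrasts a conventional prediction-error loss with a value-aware loss in the spirit described above. The names `P_true`, `P_model`, and `V`, and the random toy data, are illustrative stand-ins.

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions = 5, 2

def random_stochastic(shape):
    """Random transition probabilities, normalized over the last axis."""
    m = rng.random(shape)
    return m / m.sum(axis=-1, keepdims=True)

# Toy "environment" and a candidate model: P[a, s, s'] = Pr(s' | s, a).
P_true = random_stochastic((n_actions, n_states, n_states))
P_model = random_stochastic((n_actions, n_states, n_states))

# Stand-in value-function estimate that the planner would use.
V = rng.standard_normal(n_states)

def prediction_loss(P_model, P_true):
    # Conventional model learning: match next-state probabilities everywhere,
    # whether or not a mismatch matters for the decision problem.
    return np.mean((P_model - P_true) ** 2)

def value_aware_loss(P_model, P_true, V):
    # Decision-aware model learning: penalize a model error only to the extent
    # that it changes the expected next-state value, i.e. ((P_model - P_true) V)^2.
    return np.mean(((P_model - P_true) @ V) ** 2)

print("prediction-error loss:", prediction_loss(P_model, P_true))
print("value-aware loss     :", value_aware_loss(P_model, P_true, V))
```

Under an objective of this kind, two models with the same prediction error can be ranked very differently: only the one whose errors distort the value estimates used for planning is penalized.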

Project Outcomes

Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)


Other Grants by Farahmand, Amirmassoud

Robust Decision-Aware Model-based Reinforcement Learning
  • Grant Number:
    RGPIN-2021-03701
  • Fiscal Year:
    2022
  • Amount:
    $21,100
  • Program Type:
    Discovery Grants Program - Individual
Robust Decision-Aware Model-based Reinforcement Learning
  • Grant Number:
    DGECR-2021-00419
  • Fiscal Year:
    2021
  • Amount:
    $21,100
  • Program Type:
    Discovery Launch Supplement

Similar NSFC Grants

Scalable Learning and Optimization: High-dimensional Models and Online Decision-Making Strategies for Big Data Analysis
  • Grant Number:
  • Approval Year:
    2024
  • Amount (10,000 CNY):
  • Program Type:
    Collaborative Innovation Research Team

Similar Overseas Grants

Collaborative Research: RI: Medium: Informed, Fair, Efficient, and Incentive-Aware Group Decision Making
  • Grant Number:
    2313137
  • Fiscal Year:
    2023
  • Amount:
    $21,100
  • Program Type:
    Standard Grant
Collaborative Research: RI: Medium: Informed, Fair, Efficient, and Incentive-Aware Group Decision Making
  • Grant Number:
    2313136
  • Fiscal Year:
    2023
  • Amount:
    $21,100
  • Program Type:
    Standard Grant
Robust Decision-Aware Model-based Reinforcement Learning
  • Grant Number:
    RGPIN-2021-03701
  • Fiscal Year:
    2022
  • Amount:
    $21,100
  • Program Type:
    Discovery Grants Program - Individual
EAGER: DCL: SaTC: Enabling Interdisciplinary Collab: Impact-aware Machine Learning for Fair and Private Decision Making: Algorithms and Applications in Juvenile Justice Systems
  • Grant Number:
    2209951
  • Fiscal Year:
    2022
  • Amount:
    $21,100
  • Program Type:
    Standard Grant
Robust Decision-Aware Model-based Reinforcement Learning
  • Grant Number:
    DGECR-2021-00419
  • Fiscal Year:
    2021
  • Amount:
    $21,100
  • Program Type:
    Discovery Launch Supplement
RII Track-1: Data Analytics that are Robust and Trusted (DART): From Smart Curation to Socially Aware Decision Making
  • Grant Number:
    1946391
  • Fiscal Year:
    2020
  • Amount:
    $21,100
  • Program Type:
    Cooperative Agreement
Fairness aware data mining for discrimination free decision-making
  • Grant Number:
    DP200101210
  • Fiscal Year:
    2020
  • Amount:
    $21,100
  • Program Type:
    Discovery Projects
RTML: Large: Real-Time Autonomic Decision Making on Sparsity-Aware Accelerated Hardware via Online Machine Learning and Approximation
  • Grant Number:
    1937403
  • Fiscal Year:
    2019
  • Amount:
    $21,100
  • Program Type:
    Standard Grant
NRI: Collaborative Research: Enabling Risk-Aware Decision Making in Human-Guided Unmanned Surface Vehicle Teams
  • Grant Number:
    1634433
  • Fiscal Year:
    2016
  • Amount:
    $21,100
  • Program Type:
    Standard Grant
NRI: Collaborative Research: Enabling Risk-Aware Decision Making in Human-Guided Unmanned Surface Vehicle Teams
  • Grant Number:
    1526016
  • Fiscal Year:
    2015
  • Amount:
    $21,100
  • Program Type:
    Standard Grant