Robust Decision-Aware Model-based Reinforcement Learning

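Basic information

Grant number:
RGPIN-2021-03701
Principal investigator:
Farahmand, Amirmassoud
Amount:
$21,100
Host institution:
University of Toronto
Host institution country:
Canada
Program:
Discovery Grants Program - Individual
Fiscal year:
2022
Funding country:
Canada
Period:
2022-01-01 to 2023-12-31
Project status:
Completed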

Source:
https://www.nserc-crsng.gc.ca/ase-oro/Details-Detailles_eng.asp?id=750032
Keywords:
Robust Decision Aware Model based

Project abstract

Reinforcement learning (RL) is the problem of designing an agent that interacts with its environment and adaptively improves its long-term performance. Many complex real-world decision-making problems can be formulated as RL problems. Example applications include energy management systems for hybrid cars, dynamic treatment regimes in healthcare, and many others in robotics, finance, etc. RL is at the core of AI and has the potential to have a huge impact on our economy and society, arguably more so than any other area of machine learning. Despite this potential, RL as a technology is not ready for most real-world applications. A major source of difficulty is the high sample complexity of RL agents. Sample complexity refers to the number of interactions (or data points) required to achieve a certain level of performance. An RL agent that requires too many samples before performing well is unsuitable for real-world applications, in which obtaining new samples is often costly and time-consuming. Model-based RL (MBRL) is a promising approach to designing sample-efficient agents for problems where the number of interactions with the real world cannot be very large. The basic idea of MBRL is to learn a model of the environment, and then use the model in an internal simulator to plan a good policy, i.e., the strategy for selecting actions. This may improve the sample complexity of the agent. The improvement is contingent, however, on learning an accurate model of the real world. The conventional approach to model learning, which is based on learning a good predictive model of the environment, has an important shortcoming. It is based on the belief that an accurate predictor is sufficient for planning. The often-unnoticed fact is that no model can be completely accurate, and there are always some errors between the real world and the model. The real world is sometimes too complex for our models. What I suggest in my research program is to rethink how we should do MBRL.
Trying to learn complex dynamics that are irrelevant to the underlying decision problem is pointless. A conventional model learning approach cannot discriminate between decision-relevant and decision-irrelevant aspects of the environment, and hence wastes the capacity of a model on unnecessary detail. The fundamental idea of this research program is that instead of trying to learn a model that is a good predictor of the environment, one should only learn about the aspects that are relevant to the decision problem. The scientific impact of this research program is that it opens up and explores an unorthodox way of thinking about how an agent should learn about its environment. I expect my research team's progress in this direction to provide the theoretical and foundational groundwork for the future of model-based RL. I also expect it to lead to sample-efficient RL agents that can be used for real-world applications.
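
The basic MBRL recipe described in the abstract (learn a model from interaction data, then plan inside it) can be illustrated with a minimal Python sketch. Everything in it is an illustrative assumption rather than part of the funded research: the five-state chain environment, the `env_step`, `learn_model`, and `plan` helpers, the slip probability, the discount factor, and the ability to sample any state-action pair on demand. It shows the conventional predictive approach (a count-based model followed by value iteration), not the decision-aware method the program proposes.

```python
import random

N_STATES = 5          # states 0..4; state 4 is the goal (terminal)
ACTIONS = (0, 1)      # 0 = left, 1 = right
GOAL = N_STATES - 1


def env_step(state, action, rng, slip=0.1):
    """Hypothetical 'real' environment: the move is reversed with probability `slip`."""
    direction = 1 if action == 1 else -1
    if rng.random() < slip:
        direction = -direction
    next_state = min(max(state + direction, 0), GOAL)
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward


def learn_model(samples_per_pair, rng):
    """Estimate transition probabilities and mean rewards from sampled transitions."""
    p_hat, r_hat = {}, {}
    for s in range(GOAL):                      # the terminal goal state is not sampled
        for a in ACTIONS:
            counts = [0] * N_STATES
            reward_sum = 0.0
            for _ in range(samples_per_pair):
                s_next, r = env_step(s, a, rng)
                counts[s_next] += 1
                reward_sum += r
            p_hat[s, a] = [c / samples_per_pair for c in counts]
            r_hat[s, a] = reward_sum / samples_per_pair
    return p_hat, r_hat


def plan(p_hat, r_hat, gamma=0.9, sweeps=200):
    """Value iteration run entirely inside the learned model (no environment calls)."""
    def q(s, a, v):
        expected_next = sum(p * v[s2] for s2, p in enumerate(p_hat[s, a]))
        return r_hat[s, a] + gamma * expected_next

    v = [0.0] * N_STATES                       # v[GOAL] stays 0 because it is terminal
    for _ in range(sweeps):
        v = [max(q(s, a, v) for a in ACTIONS) for s in range(GOAL)] + [0.0]
    # Greedy action for every non-terminal state.
    return [max(ACTIONS, key=lambda a: q(s, a, v)) for s in range(GOAL)]


rng = random.Random(0)
p_hat, r_hat = learn_model(samples_per_pair=200, rng=rng)
policy = plan(p_hat, r_hat)
print(policy)
```

With these settings the agent uses 4 × 2 × 200 = 1,600 real interactions to fit the model, and the planning sweeps that follow cost no further environment samples. The model here spends equal effort on every transition, whether or not it affects the final policy, which is the inefficiency the abstract argues against.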
