Robust Decision-Aware Model-based Reinforcement Learning


Basic Information

  • Grant Number:
    RGPIN-2021-03701
  • Principal Investigator:
    Farahmand, Amirmassoud
  • Amount:
    $21,100
  • Host Institution:
  • Host Institution Country:
    Canada
  • Program Type:
    Discovery Grants Program - Individual
  • Fiscal Year:
    2021
  • Funding Country:
    Canada
  • Duration:
    2021-01-01 to 2022-12-31
  • Status:
    Completed

Project Abstract

Reinforcement learning (RL) is the problem of designing an agent that interacts with its environment and adaptively improves its long-term performance. Many complex real-world decision-making problems can be formulated as RL problems. Example applications include energy management systems for hybrid cars, dynamic treatment regimes in healthcare, and many others in robotics, finance, and beyond. RL is at the core of AI and has the potential to have a huge impact on our economy and society, arguably more so than any other area of machine learning. Despite these successes, RL as a technology is not ready for most real-world applications. A major source of difficulty is the high sample complexity of RL agents. Sample complexity refers to the number of interactions (or data points) required to achieve a certain level of performance. An RL agent that requires too many samples before performing well is unsuitable for real-world applications, in which obtaining new samples is often costly and time-consuming.

Model-based RL (MBRL) is a promising approach to designing sample-efficient agents for problems where the number of interactions with the real world cannot be very large. The basic idea of MBRL is to learn a model of the environment and then use the model in an internal simulator to plan a good policy, i.e., the strategy for selecting actions. This may improve the sample complexity of the agent. It is contingent, however, on learning an accurate model of the real world. The conventional approach to model learning, which is based on learning a good predictive model of the environment, has an important shortcoming: it rests on the belief that an accurate predictor is sufficient for planning. The often-unnoticed fact is that no model can be completely accurate; there are always some errors between the real world and the model. The real world is sometimes simply too complex for our models.

What I suggest in my research program is to rethink how we should do MBRL. Trying to learn complex dynamics that are irrelevant to the underlying decision problem is pointless. A conventional model-learning approach cannot discriminate between decision-relevant and decision-irrelevant aspects of the environment, and hence wastes the capacity of the model on unnecessary detail. The fundamental idea of this research program is that instead of trying to learn a model that is a good predictor of the environment, one should only learn about the aspects that are relevant to the decision problem. The scientific impact of this research program is that it opens up and explores an unorthodox way of thinking about how an agent should learn about its environment. I expect my research team's progress in this direction to provide the theoretical and foundational groundwork for the future of model-based RL. I also expect it to lead to sample-efficient RL agents that can be used for real-world applications.
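One way to make the decision-aware idea concrete is a training loss that penalizes model errors only insofar as they change the quantities used for planning. The snippet below is a minimal NumPy sketch on a toy tabular MDP, not the project's actual algorithm: it contrasts a conventional prediction-error loss with a value-aware loss in the spirit described above. The names `P_true`, `P_model`, and `V`, and the random toy data, are illustrative stand-ins.

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions = 5, 2

def random_stochastic(shape):
    """Random transition probabilities, normalized over the last axis."""
    m = rng.random(shape)
    return m / m.sum(axis=-1, keepdims=True)

# Toy "environment" and a candidate model: P[a, s, s'] = Pr(s' | s, a).
P_true = random_stochastic((n_actions, n_states, n_states))
P_model = random_stochastic((n_actions, n_states, n_states))

# Stand-in value-function estimate that the planner would use.
V = rng.standard_normal(n_states)

def prediction_loss(P_model, P_true):
    # Conventional model learning: match next-state probabilities everywhere,
    # whether or not a mismatch matters for the decision problem.
    return np.mean((P_model - P_true) ** 2)

def value_aware_loss(P_model, P_true, V):
    # Decision-aware model learning: penalize a model error only to the extent
    # that it changes the expected next-state value, i.e. ((P_model - P_true) V)^2.
    return np.mean(((P_model - P_true) @ V) ** 2)

print("prediction-error loss:", prediction_loss(P_model, P_true))
print("value-aware loss     :", value_aware_loss(P_model, P_true, V))
```

Under an objective of this kind, two models with the same prediction error can be ranked very differently: only the one whose errors distort the value estimates used for planning is penalized.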

Project Outcomes

Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)


Other Grants by Farahmand, Amirmassoud

Robust Decision-Aware Model-based Reinforcement Learning
  • Grant Number:
    RGPIN-2021-03701
  • Fiscal Year:
    2022
  • Amount:
    $21,100
  • Program Type:
    Discovery Grants Program - Individual
Robust Decision-Aware Model-based Reinforcement Learning
  • Grant Number:
    DGECR-2021-00419
  • Fiscal Year:
    2021
  • Amount:
    $21,100
  • Program Type:
    Discovery Launch Supplement

Similar NSFC Grants

Scalable Learning and Optimization: High-dimensional Models and Online Decision-Making Strategies for Big Data Analysis
  • Grant Number:
  • Approval Year:
    2024
  • Amount (10,000 CNY):
  • Program Type:
    Collaborative Innovation Research Team

Similar Overseas Grants

Collaborative Research: RI: Medium: Informed, Fair, Efficient, and Incentive-Aware Group Decision Making
  • Grant Number:
    2313137
  • Fiscal Year:
    2023
  • Amount:
    $21,100
  • Program Type:
    Standard Grant
Collaborative Research: RI: Medium: Informed, Fair, Efficient, and Incentive-Aware Group Decision Making
  • Grant Number:
    2313136
  • Fiscal Year:
    2023
  • Amount:
    $21,100
  • Program Type:
    Standard Grant
Robust Decision-Aware Model-based Reinforcement Learning
  • Grant Number:
    RGPIN-2021-03701
  • Fiscal Year:
    2022
  • Amount:
    $21,100
  • Program Type:
    Discovery Grants Program - Individual
EAGER: DCL: SaTC: Enabling Interdisciplinary Collab: Impact-aware Machine Learning for Fair and Private Decision Making: Algorithms and Applications in Juvenile Justice Systems
  • Grant Number:
    2209951
  • Fiscal Year:
    2022
  • Amount:
    $21,100
  • Program Type:
    Standard Grant
Robust Decision-Aware Model-based Reinforcement Learning
  • Grant Number:
    DGECR-2021-00419
  • Fiscal Year:
    2021
  • Amount:
    $21,100
  • Program Type:
    Discovery Launch Supplement
RII Track-1: Data Analytics that are Robust and Trusted (DART): From Smart Curation to Socially Aware Decision Making
  • Grant Number:
    1946391
  • Fiscal Year:
    2020
  • Amount:
    $21,100
  • Program Type:
    Cooperative Agreement
Fairness aware data mining for discrimination free decision-making
  • Grant Number:
    DP200101210
  • Fiscal Year:
    2020
  • Amount:
    $21,100
  • Program Type:
    Discovery Projects
RTML: Large: Real-Time Autonomic Decision Making on Sparsity-Aware Accelerated Hardware via Online Machine Learning and Approximation
  • Grant Number:
    1937403
  • Fiscal Year:
    2019
  • Amount:
    $21,100
  • Program Type:
    Standard Grant
NRI: Collaborative Research: Enabling Risk-Aware Decision Making in Human-Guided Unmanned Surface Vehicle Teams
  • Grant Number:
    1634433
  • Fiscal Year:
    2016
  • Amount:
    $21,100
  • Program Type:
    Standard Grant
NRI: Collaborative Research: Enabling Risk-Aware Decision Making in Human-Guided Unmanned Surface Vehicle Teams
  • Grant Number:
    1526016
  • Fiscal Year:
    2015
  • Amount:
    $21,100
  • Program Type:
    Standard Grant