CAREER: Using Imperfect Predictions to Make Good Decisions

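Basic Information

Award Number:
1939827
Principal Investigator:
Erin Talvitie
Amount:
$301,300 (approx.)
Institution:
Harvey Mudd College
Institution Country:
United States
Award Type:
Continuing Grant
Fiscal Year:
2019
Funding Country:
United States
Period:
2019-07-01 to 2023-06-30
Status:
Completed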

Source:
https://www.nsf.gov/awardsearch/showAward?AWD_ID=1939827&HistoricalAwards=false
Keywords:
CAREER Using Imperfect Predictions Make

Project Abstract

As humans and other animals navigate the world they demonstrate remarkable flexibility in encountering unfamiliar systems, spaces and phenomena, learning to make predictions about how they will behave, and making good decisions based on those predictions. Crucial to this ability is the fact that one does not need to make perfectly accurate or fully detailed predictions to make good decisions. Though, due to our natural limitations, our predictions about the future are necessarily flawed, they are nevertheless sufficiently useful to make reasonable decisions. For artificial agents, in contrast, imperfect predictions often lead to catastrophic failures in decision making. Many existing approaches fundamentally assume that the agent will eventually learn to make perfect predictions and make perfect decisions, which is unreasonable in sufficiently rich, complex environments. This work considers the problem of developing artificial agents that are more aware of and more robust to their own limitations. Agents that can more robustly and flexibly learn from experience in truly complex environments have the potential to impact nearly any application in which decisions are made over time, for instance autonomous robots/vehicles, personal assistants, and medical/legal decision support. Furthermore, as the project will be undertaken at an undergraduate-only liberal arts college, undergraduate researchers will play an integral role in the work. The PI will also build on the strength of the liberal arts setting to enhance instruction of key discipline-specific research and writing skills throughout the Computer Science curriculum. Explicit development of these skills will not only improve students' preparation for a wide variety of career paths (including basic research) but is also aligned with best practices for broadening participation in the discipline. 
This project studies model-based reinforcement learning (MBRL) under the assumption that the agent has fundamental limitations that prevent it from learning a perfect model or from producing optimal plans. The central hypothesis is that in this context the MBRL problem cannot be decomposed into separate model-learning and planning problems, each treating the other as an idealized black box. Rather the optimization process for each component must be aware of its role in the overall architecture and of the limitations of its partner. One key aim of the work is to derive novel measures of model quality that are more tightly related to the true objective of control performance than standard measures of one-step prediction accuracy adapted from supervised learning settings. Another is to investigate how model learning objectives/algorithms can be adapted to account for the limitations of the specific planner that will use the model. Further, control algorithms will be investigated that can make effective use of models of non-homogeneous quality by mediating between model-based and model-free knowledge. The ultimate goal is to integrate these principles into novel MBRL agents that are significantly more robust to limitations in the model class and/or planner and are able to succeed in environments that are too complex and high-dimensional to be modeled or solved exactly.
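
The claim that one-step prediction accuracy can be a poor proxy for control performance can be illustrated with a toy example. The sketch below is not taken from the project: the five-state deterministic environment, the two hand-built models (`MODEL_A`, `MODEL_B`), the reward table, and all function names are hypothetical, chosen only to show how a model with more one-step errors may still support better decisions than a model with fewer errors.

```python
# Hypothetical toy MDP: 5 states, 2 actions, deterministic transitions.
# State 4 is absorbing. Rewards are assumed known; only transitions are modeled.
GAMMA = 0.9
N_STATES, N_ACTIONS = 5, 2
REWARD = [[0.0, 0.0], [1.0, 1.0], [0.5, 0.5], [0.0, 0.0], [0.0, 0.0]]

# TRUE_NEXT[s][a] is the real next state.
TRUE_NEXT = [[1, 2], [4, 4], [4, 4], [4, 4], [4, 4]]
# Model A (assumed): three errors, none of which changes the best action.
MODEL_A = [[1, 2], [4, 3], [4, 4], [3, 3], [4, 4]]
# Model B (assumed): a single error, but on the transition that matters.
MODEL_B = [[3, 2], [4, 4], [4, 4], [4, 4], [4, 4]]


def one_step_accuracy(model):
    """Fraction of (state, action) pairs whose next state is predicted correctly."""
    hits = sum(
        model[s][a] == TRUE_NEXT[s][a]
        for s in range(N_STATES)
        for a in range(N_ACTIONS)
    )
    return hits / (N_STATES * N_ACTIONS)


def plan(model, sweeps=100):
    """Value iteration on the model, followed by greedy policy extraction."""
    values = [0.0] * N_STATES
    for _ in range(sweeps):
        values = [
            max(REWARD[s][a] + GAMMA * values[model[s][a]] for a in range(N_ACTIONS))
            for s in range(N_STATES)
        ]
    return [
        max(range(N_ACTIONS), key=lambda a: REWARD[s][a] + GAMMA * values[model[s][a]])
        for s in range(N_STATES)
    ]


def true_return(policy, start=0, horizon=10):
    """Discounted return of a policy when run in the true environment."""
    state, total, discount = start, 0.0, 1.0
    for _ in range(horizon):
        action = policy[state]
        total += discount * REWARD[state][action]
        state = TRUE_NEXT[state][action]
        discount *= GAMMA
    return total


for name, model in [("A", MODEL_A), ("B", MODEL_B)]:
    print(name, one_step_accuracy(model), true_return(plan(model)))
```

Under these made-up tables, model A has the lower one-step accuracy (0.7 versus 0.9) but the policy planned from it earns the higher true return (0.9 versus 0.45), because model B's single error falls on the one transition that determines the first decision. This only illustrates the motivation; the measures of model quality the project aims to derive are not specified in the abstract.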
