Unifying Recent Advances in Deep Learning with Decision-theoretic Planning for Learned MDPs and POMDPs
Basic Information
- Grant number: RGPIN-2022-04377
- Amount: $40,100
- Host institution country: Canada
- Program category: Discovery Grants Program - Individual
- Fiscal year: 2022
- Funding country: Canada
- Duration: 2022-01-01 to 2023-12-31
- Status: Completed
Project Abstract
In many complex sequential decision-making problems, ranging from urban traffic management to interactive conversational recommender systems, it is difficult (if not impossible) for humans to specify complete and accurate models of the domain. However, the ubiquity of modern sensing, from traffic cameras to speech-enabled home devices, allows us to collect large quantities of data from these complex systems and to learn predictively accurate models for the purpose of optimal decision-theoretic planning. Two lines of research have investigated planning in learned models: data-driven planning (DDP) and offline model-based reinforcement learning (MBRL). While DDP and offline MBRL have made important progress, they have not fully exploited recent advances in planning or deep learning that address three issues critical for many real-world applications: epistemic uncertainty (i.e., confidence in what a model knows), partial observability, and value-awareness (i.e., what is minimally relevant for decision-making). We will address these deficiencies in the following research themes.
Theme 1 -- Planning with Deep Bayesian Models of Epistemic Uncertainty in MDPs: To improve planning and model learning with respect to epistemic uncertainty, we will (a) investigate the efficacy of recent advances in Bayesian deep learning for MDP model acquisition, (b) develop robust end-to-end planning methods for these Bayesian deep models, and (c) learn heuristics to further improve planning efficiency.
Theme 2 -- Planning in Deep-learned Partially Observed MDPs (POMDPs): To date, partial observability has received little direct attention in DDP and offline MBRL. To address this deficiency, we will (a) investigate transformers for learning accurate POMDP models, (b) investigate novel methods for deep Bayesian belief updating with complex observations (e.g., images or text), and (c) leverage (a) and (b) in novel end-to-end deep-learned POMDP planning techniques.
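Theme 2 (b) concerns Bayesian belief updating. As a point of reference, the classical exact belief update for a small discrete POMDP is b'(s') ∝ O(o | s', a) Σ_s T(s' | s, a) b(s); deep Bayesian methods for image or text observations would approximate this filter rather than enumerate states. The minimal Python sketch below assumes hypothetical hand-specified tabular models `T` and `O` and uses the textbook two-state "tiger" listening example. It is an illustration of the standard filter only, not the project's method.

```python
from typing import Dict, List

# Hypothetical tabular POMDP models (assumptions for illustration only):
#   T[a][s][s2] = P(s2 | s, a)   transition model
#   O[a][s2][o] = P(o | s2, a)   observation model
Model = Dict[str, List[List[float]]]


def belief_update(b: List[float], a: str, o: int, T: Model, O: Model) -> List[float]:
    """Exact Bayes filter: b'(s2) is proportional to O(o|s2,a) * sum_s T(s2|s,a) * b(s)."""
    n = len(b)
    # Prediction step: push the current belief through the transition model.
    predicted = [sum(T[a][s][s2] * b[s] for s in range(n)) for s2 in range(n)]
    # Correction step: weight each successor state by the observation likelihood.
    unnormalized = [O[a][s2][o] * predicted[s2] for s2 in range(n)]
    z = sum(unnormalized)
    if z == 0.0:
        raise ValueError("observation has zero probability under the model")
    return [p / z for p in unnormalized]


# Tiger example: state 0 = tiger-left, 1 = tiger-right; observation 0 = hear-left.
# Listening does not move the tiger and is correct 85% of the time.
T = {"listen": [[1.0, 0.0], [0.0, 1.0]]}
O = {"listen": [[0.85, 0.15], [0.15, 0.85]]}

b1 = belief_update([0.5, 0.5], "listen", 0, T, O)  # approximately [0.85, 0.15]
b2 = belief_update(b1, "listen", 0, T, O)          # approximately [0.970, 0.030]
```

Each consistent observation sharpens the belief; in a learned setting, the hand-written `T` and `O` tables would be replaced by learned models.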
Theme 3 -- Value-awareness in Learned MDPs and POMDPs: When learning from rich but incomplete observational data, it is critical for both computational and sample efficiency to learn what is minimally relevant for predicting reward. To address value-awareness in Themes 1 and 2, we will investigate planning in (a) value-aware deep MDP models with epistemic uncertainty and (b) value-aware deep POMDP models.
The research will be grounded in two key ongoing applied research projects: (i) MDPs for urban traffic signal control, in collaboration with the University of Toronto Intelligent Transportation Systems Centre, and (ii) POMDPs for interactive conversational recommender systems. While the proposed research will contribute fundamentally to unifying recent advances in deep learning with decision-theoretic planning in learned MDPs and POMDPs across a variety of potential domains, these specific applications will serve as practical testbeds to validate the research program.
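To make the Theme 3 intuition concrete: a standard model-learning loss penalizes errors in every predicted state feature, whereas a value-aware loss penalizes only errors that change predicted value. The Python sketch below is a deliberately simplified illustration that assumes a single fixed, hypothetical value estimate `value` depending only on the first state feature; value-aware approaches in the literature are more general than this one-function comparison, and this is not the project's method.

```python
from typing import Callable, Sequence, Tuple

State = Tuple[float, ...]


def state_mse_loss(pred: Sequence[State], true: Sequence[State]) -> float:
    """Mean squared error over all predicted state features (value-agnostic)."""
    return sum(
        sum((p - t) ** 2 for p, t in zip(ps, ts)) for ps, ts in zip(pred, true)
    ) / len(true)


def value_aware_loss(
    pred: Sequence[State], true: Sequence[State], value: Callable[[State], float]
) -> float:
    """Mean squared error in predicted value under a fixed value estimate."""
    return sum((value(ps) - value(ts)) ** 2 for ps, ts in zip(pred, true)) / len(true)


# Hypothetical value estimate that depends only on feature 0; feature 1 is
# irrelevant for decision-making in this toy setting.
def value(s: State) -> float:
    return 2.0 * s[0]


true_next = [(1.0, 5.0), (2.0, -3.0)]
model_a = [(1.0, 0.0), (2.0, 0.0)]   # exact on feature 0, wrong on feature 1
model_b = [(1.5, 5.0), (2.5, -3.0)]  # exact on feature 1, off by 0.5 on feature 0

print(state_mse_loss(model_a, true_next), value_aware_loss(model_a, true_next, value))  # 17.0 0.0
print(state_mse_loss(model_b, true_next), value_aware_loss(model_b, true_next, value))  # 0.25 1.0
```

Under the state-prediction loss `model_b` looks better, while the value-aware loss prefers `model_a`, whose errors lie only in a feature the value estimate ignores. This is the sense in which a value-aware model concentrates on what is minimally relevant for predicting reward.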
Project Outputs
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other Publications by Sanner, Scott
Evaluation of Machine Learning Algorithms for Predicting Readmission After Acute Myocardial Infarction Using Routinely Collected Clinical Data
- DOI: 10.1016/j.cjca.2019.10.023
- Published: 2020-06-01
- Impact factor: 6.2
- Authors: Gupta, Shagun; Ko, Dennis T.; Sanner, Scott
- Corresponding author: Sanner, Scott
Online continual learning in image classification: An empirical survey
- DOI: 10.1016/j.neucom.2021.10.021
- Published: 2021-11-05
- Impact factor: 6
- Authors: Mai, Zheda; Li, Ruiwen; Sanner, Scott
- Corresponding author: Sanner, Scott
Relevance- and interface-driven clustering for visual information retrieval
- DOI: 10.1016/j.is.2020.101592
- Published: 2020-12-01
- Impact factor: 3.7
- Authors: Bouadjenek, Mohamed Reda; Sanner, Scott; Du, Yihao
- Corresponding author: Du, Yihao
A longitudinal study of topic classification on Twitter
- DOI: 10.7717/peerj-cs.991
- Published: 2022
- Impact factor: 3.8
- Authors: Bouadjenek, Mohamed Reda; Sanner, Scott; Iman, Zahra; Xie, Lexing; Shi, Daniel Xiaoliang
- Corresponding author: Shi, Daniel Xiaoliang
Comparison of machine learning models for occupancy prediction in residential buildings using connected thermostat data
- DOI: 10.1016/j.buildenv.2019.106177
- Published: 2019-08-01
- Impact factor: 7.4
- Authors: Huchuk, Brent; Sanner, Scott; O'Brien, William
- Corresponding author: O'Brien, William
Other Grants Held by Sanner, Scott
Continuous Decision Diagrams for Machine Learning and Decision-theoretic AI Planning
- Grant number: RGPIN-2016-05705
- Fiscal year: 2021
- Amount: $40,100
- Program category: Discovery Grants Program - Individual
Continuous Decision Diagrams for Machine Learning and Decision-theoretic AI Planning
- Grant number: RGPIN-2016-05705
- Fiscal year: 2020
- Amount: $40,100
- Program category: Discovery Grants Program - Individual
Machine learning for residential building HVAC analytics platform
- Grant number: 508857-2017
- Fiscal year: 2020
- Amount: $40,100
- Program category: Collaborative Research and Development Grants
Continuous Decision Diagrams for Machine Learning and Decision-theoretic AI Planning
- Grant number: RGPIN-2016-05705
- Fiscal year: 2019
- Amount: $40,100
- Program category: Discovery Grants Program - Individual
Machine learning for residential building HVAC analytics platform
- Grant number: 508857-2017
- Fiscal year: 2019
- Amount: $40,100
- Program category: Collaborative Research and Development Grants
Continuous Decision Diagrams for Machine Learning and Decision-theoretic AI Planning
- Grant number: RGPIN-2016-05705
- Fiscal year: 2018
- Amount: $40,100
- Program category: Discovery Grants Program - Individual
Machine Learning, Sentiment, and Social Media Analysis for Financial Analytics
- Grant number: 531275-2018
- Fiscal year: 2018
- Amount: $40,100
- Program category: Engage Grants Program
Machine learning for residential building HVAC analytics platform
- Grant number: 508857-2017
- Fiscal year: 2018
- Amount: $40,100
- Program category: Collaborative Research and Development Grants
Machine learning for residential building HVAC analytics platform
- Grant number: 508857-2017
- Fiscal year: 2017
- Amount: $40,100
- Program category: Collaborative Research and Development Grants
Deep Unsupervised Learning for Network Anomaly Detection
- Grant number: 514078-2017
- Fiscal year: 2017
- Amount: $40,100
- Program category: Engage Grants Program
Similar Overseas Grants
REU Site: Recent Advances in Natural Language Processing
- Grant number: 2349452
- Fiscal year: 2024
- Amount: $40,100
- Program category: Standard Grant
Conference: Geometric Measure Theory, Harmonic Analysis, and Partial Differential Equations: Recent Advances
- Grant number: 2402028
- Fiscal year: 2024
- Amount: $40,100
- Program category: Standard Grant
Conference: Recent advances in nonlinear Partial Differential Equations
- Grant number: 2346780
- Fiscal year: 2024
- Amount: $40,100
- Program category: Standard Grant
Conference: Recent advances in applications of harmonic analysis to convex geometry
- Grant number: 2246779
- Fiscal year: 2023
- Amount: $40,100
- Program category: Standard Grant
Conference: Recent Advances in Mathematical Fluid Dynamics
- Grant number: 2247145
- Fiscal year: 2023
- Amount: $40,100
- Program category: Standard Grant
KE Fellowship: Sediment matters - using recent advances to unlock effective catchment decision-making
- Grant number: NE/V018701/2
- Fiscal year: 2023
- Amount: $40,100
- Program category: Research Grant
Conference: IHES 2023 Summer School: Recent advances in algebraic K-theory
- Grant number: 2304723
- Fiscal year: 2023
- Amount: $40,100
- Program category: Standard Grant
Conference: Recent advances in the mechanistic understanding of avian responses to environmental challenges
- Grant number: 2336743
- Fiscal year: 2023
- Amount: $40,100
- Program category: Standard Grant
Recent Advances in Nanomaterial-Assisted Combinational Sonodynamic Cancer Therapy
- Grant number: 22K12851
- Fiscal year: 2022
- Amount: $40,100
- Program category: Grant-in-Aid for Scientific Research (C)
Conference: Chemical Sensing Innovation: Harnessing Recent Advances in Biological, Physical, Chemical and Data sciences for Engineering Next Generation Electronic Noses
- Grant number: 2231526
- Fiscal year: 2022
- Amount: $40,100
- Program category: Standard Grant