RI: Small: Collaborative Research: Hidden Parameter Markov Decision Processes: Exploiting Structure in Families of Tasks
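Basic information
- Award number: 1718306
- Principal investigator:
- Amount: $242,000
- Host institution:
- Host institution country: United States
- Award type: Standard Grant
- Fiscal year: 2017
- Funding country: United States
- Period: 2017-08-01 to 2022-07-31
- Status: Completed
- Source:
- Keywords:
Project abstract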
Part 1
Machine learning has the potential to automate many complex, real-life tasks. However, learning algorithms typically require a substantial amount of data from each specific task they are asked to solve, requiring repeated interactions with the world, each of which takes time and effort. Many real-life learning scenarios involve repeated interactions with tasks that are similar, but not identical. For example, an immunologist may encounter HIV patients with different comorbid conditions and latent viral reservoirs - each has a similar disease but a different progression, requiring individualized treatment; a robot may have to manipulate objects of different sizes and weights - each requiring similar but not identical grasping strategies. In such cases, treating all of the tasks as the same results in poor performance, but learning to solve each as if it were completely different takes far too long. This project will develop intelligent agents that can use knowledge gained when solving prior tasks to learn new tasks that are similar but not quite the same much more rapidly.
The principal technical component of this project will lie in rigorously defining what it means for tasks to be related and in producing algorithms for leveraging that definition to enable rapid learning. To do so, the project will introduce the Hidden-Parameter Markov Decision Process, which models a family of tasks through a parameter that describes variation across the family but is hidden from the learner. The project will investigate methods that exploit this structure by learning a model of task variation and then seeking to identify the parameter value for each specific task. The planned work will focus on healthcare applications, where families of related but distinct tasks are common (i.e., each patient will have unique characteristics). However, the project aims to produce foundational learning algorithms applicable to many application areas, ranging from robotics to systems design.
This research will also be integrated into the courses taught by the PIs at Harvard and Brown and made available online; the PIs will include a diverse population, including REUs, both in these classes and in their research groups.
Part 2
Many real-life learning scenarios involve repeated interactions with tasks that have similar, but not identical, dynamics. For example, an immunologist may encounter HIV patients with different comorbid conditions and latent viral reservoirs; a robot may have to manipulate objects of different sizes and weights. These cases describe a family of related tasks, each of which is similar but not quite the same. An intelligent agent should be able to transfer knowledge learned during previous experiences to rapidly solve new tasks in the same family. However, while many algorithms have been developed to transfer knowledge, the lack of a model of task relatedness inhibits our ability to formally understand the benefits of such algorithms or the structure they exploit.
The planned work will model such scenarios by embedding the tasks on a low-dimensional manifold that captures relevant variation between instances. Each location on this manifold (unobserved by the agent) describes a task instance, forming a sufficient statistic for solving the task in the context of the task family. Preliminary work by the PIs has shown that it is possible to learn such a manifold after solving just a few individual task instances and to enable the rapid optimization of policies for new task instances. Building on these promising initial results, the PIs plan to:
1) Develop methods for task family characterization, by determining whether a collection of tasks can be modeled via a single manifold or consists of several clusters; whether a new task belongs to an existing cluster or manifold; and, if so, whether or not transfer is worthwhile.
2) Scale inference by adapting recent results from machine learning to deal with large state and action spaces.
3) Generate policies using Bayesian reinforcement learning algorithms, and by exploiting formal links between state and policy representations.
In addition to synthetic domains, progress on these directions will be applied to problems of treatment optimization for patients with HIV, sepsis, and depression via clinical collaborations that the PIs have with world experts in these diseases.
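The core idea in the abstract, a family of tasks that share dynamics up to a hidden parameter, can be illustrated with a deliberately small sketch. Everything here is an assumption made for illustration and is not taken from the project: the one-dimensional dynamics `s' = s + theta * a + noise`, the function names `step`, `collect`, and `infer_theta`, and the use of least squares to recover the parameter. In particular, the shared form of the dynamics is simply given in this toy, whereas the abstract describes learning the model of task variation from previously solved tasks.

```python
import random


def step(state, action, theta, rng, noise=0.01):
    """One transition of a toy task: the hidden parameter theta scales the action's effect."""
    return state + theta * action + rng.gauss(0.0, noise)


def collect(theta, n, rng):
    """Gather n transitions (s, a, s') from one task instance using random actions."""
    data, s = [], 0.0
    for _ in range(n):
        a = rng.uniform(-1.0, 1.0)
        s_next = step(s, a, theta, rng)
        data.append((s, a, s_next))
        s = s_next
    return data


def infer_theta(data):
    """Least-squares estimate of theta from the relation (s' - s) = theta * a."""
    num = sum(a * (s_next - s) for s, a, s_next in data)
    den = sum(a * a for _, a, _ in data)
    return num / den


rng = random.Random(0)
true_theta = 1.7  # hidden from the learner; varies from task to task
theta_hat = infer_theta(collect(true_theta, 20, rng))
print(f"estimated hidden parameter: {theta_hat:.3f}")
```

Because the structure shared across the family is known in this sketch, a handful of transitions from a new task is enough to pin down its parameter, which is the kind of rapid adaptation the abstract aims for in far richer settings.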
Project outcomes
Journal articles (3)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Combining Parametric and Nonparametric Models for Off-Policy Evaluation
- DOI:
- Published: 2019-05
- Journal:
- Impact factor: 0
- Authors: Omer Gottesman; Yao Liu; Scott Sussex; E. Brunskill; F. Doshi-Velez
- Corresponding authors: Omer Gottesman; Yao Liu; Scott Sussex; E. Brunskill; F. Doshi-Velez
Robust and Efficient Transfer Learning with Hidden Parameter Markov Decision Processes
- DOI: 10.1609/aaai.v31i1.11065
- Published: 2017-02
- Journal:
- Impact factor: 0
- Authors: Taylor W. Killian; G. Konidaris; F. Doshi-Velez
- Corresponding authors: Taylor W. Killian; G. Konidaris; F. Doshi-Velez
Representation Balancing MDPs for Off-Policy Policy Evaluation
- DOI:
- Published: 2018-05
- Journal:
- Impact factor: 0
- Authors: Yao Liu; Omer Gottesman; Aniruddh Raghu; M. Komorowski; A. Faisal; F. Doshi-Velez; E. Brunskill
- Corresponding authors: Yao Liu; Omer Gottesman; Aniruddh Raghu; M. Komorowski; A. Faisal; F. Doshi-Velez; E. Brunskill
Other publications by Finale Doshi-Velez
How machine-learning recommendations influence clinician treatment selections: the example of antidepressant selection
- DOI: 10.1038/s41398-021-01224-x
- Published: 2021-02-04
- Journal:
- Impact factor: 6.200
- Authors: Maia Jacobs; Melanie F. Pradier; Thomas H. McCoy; Roy H. Perlis; Finale Doshi-Velez; Krzysztof Z. Gajos
- Corresponding author: Krzysztof Z. Gajos
Ethical and regulatory challenges of large language models in medicine
- DOI: 10.1016/s2589-7500(24)00061-x
- Published: 2024-06-01
- Journal:
- Impact factor: 24.100
- Authors: Jasmine Chiat Ling Ong; Shelley Yin-Hsi Chang; Wasswa William; Atul J Butte; Nigam H Shah; Lita Sui Tjien Chew; Nan Liu; Finale Doshi-Velez; Wei Lu; Julian Savulescu; Daniel Shu Wei Ting
- Corresponding author: Daniel Shu Wei Ting
Association between prescriber practices and major depression treatment outcomes
- DOI: 10.1016/j.xjmad.2024.100080
- Published: 2024-12-01
- Journal:
- Impact factor:
- Authors: Sarah Rathnam; Abhishek Sharma; Kamber L. Hart; Pilar F. Verhaak; Thomas H. McCoy; Roy H. Perlis; Finale Doshi-Velez
- Corresponding author: Finale Doshi-Velez
Other grants by Finale Doshi-Velez
RI: Small: Human Validation in Batch Reinforcement Learning
- Award number: 2007076
- Fiscal year: 2020
- Funding amount: $242,000
- Award type: Continuing Grant
CAREER: Generative Models for Targeted Domain Interpretability with Applications to Healthcare
- Award number: 1750358
- Fiscal year: 2018
- Funding amount: $242,000
- Award type: Continuing Grant
RI: Small: Workshop for Women in Machine Learning
- Award number: 1649706
- Fiscal year: 2016
- Funding amount: $242,000
- Award type: Standard Grant
Scalable Bayesian Inference for Interpretable Time-Series Models
- Award number: 1544628
- Fiscal year: 2015
- Funding amount: $242,000
- Award type: Standard Grant
Scalable Bayesian Inference in Large Medical Databases
- Award number: 1225204
- Fiscal year: 2012
- Funding amount: $242,000
- Award type: Fellowship Award
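Similar NSFC (National Natural Science Foundation of China) grants
Forensic application of circadian-rhythmic small RNAs to estimating the time of bloodstain formation
- Award number:
- Year approved: 2024
- Funding amount: CNY 0
- Award type: Provincial/municipal project
Mechanism by which tRNA-derived small RNA upregulates the YBX1/CCL5 pathway in bortezomib-induced chronic pain
- Award number:
- Year approved: 2022
- Funding amount: CNY 100,000
- Award type: Provincial/municipal project
Small RNA regulation of the type I-F CRISPR-Cas adaptive immune response and its molecular mechanism
- Award number: 32000033
- Year approved: 2020
- Funding amount: CNY 240,000
- Award type: Young Scientists Fund
Mechanism by which small RNAs regulate the biocontrol function of Bacillus amyloliquefaciens FZB42
- Award number: 31972324
- Year approved: 2019
- Funding amount: CNY 580,000
- Award type: General Program
Mechanism by which Streptococcus mutans small RNAs link LuxS quorum sensing to biofilm formation
- Award number: 81900988
- Year approved: 2019
- Funding amount: CNY 210,000
- Award type: Young Scientists Fund
Function and mechanism of key gut-bacterial small RNAs in the onset and progression of Crohn's disease
- Award number: 31870821
- Year approved: 2018
- Funding amount: CNY 560,000
- Award type: General Program
Molecular mechanism of pigeon milk secretion, analyzed by small RNA sequencing
- Award number: 31802058
- Year approved: 2018
- Funding amount: CNY 260,000
- Award type: Young Scientists Fund
Pathogenic mechanism of rice grassy stunt virus regulated by small RNA-mediated DNA methylation
- Award number: 31772128
- Year approved: 2017
- Funding amount: CNY 600,000
- Award type: General Program
Immune-regulatory mechanism of acupuncture treatment for Hashimoto's thyroiditis, based on small RNA-seq
- Award number: 81704176
- Year approved: 2017
- Funding amount: CNY 200,000
- Award type: Young Scientists Fund
Regulation of small RNA biogenesis by rice OsSGS3 and OsHEN1 and its effect on disease resistance
- Award number: 91640114
- Year approved: 2016
- Funding amount: CNY 850,000
- Award type: Major Research Plan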
Similar overseas grants
Collaborative Research: RI: Small: Foundations of Few-Round Active Learning
- Award number: 2313131
- Fiscal year: 2023
- Funding amount: $242,000
- Award type: Standard Grant
Collaborative Research: RI: Small: Deep Constrained Learning for Power Systems
- Award number: 2345528
- Fiscal year: 2023
- Funding amount: $242,000
- Award type: Standard Grant
Collaborative Research: RI: Small: Motion Fields Understanding for Enhanced Long-Range Imaging
- Award number: 2232298
- Fiscal year: 2023
- Funding amount: $242,000
- Award type: Standard Grant
Collaborative Research: RI: Small: End-to-end Learning of Fair and Explainable Schedules for Court Systems
- Award number: 2232055
- Fiscal year: 2023
- Funding amount: $242,000
- Award type: Standard Grant
Collaborative Research: RI: Small: End-to-end Learning of Fair and Explainable Schedules for Court Systems
- Award number: 2232054
- Fiscal year: 2023
- Funding amount: $242,000
- Award type: Standard Grant
Collaborative Research: RI: Small: Motion Fields Understanding for Enhanced Long-Range Imaging
- Award number: 2232300
- Fiscal year: 2023
- Funding amount: $242,000
- Award type: Standard Grant
Collaborative Research: RI: Small: Motion Fields Understanding for Enhanced Long-Range Imaging
- Award number: 2232299
- Fiscal year: 2023
- Funding amount: $242,000
- Award type: Standard Grant
Collaborative Research: RI: Small: Foundations of Few-Round Active Learning
- Award number: 2313130
- Fiscal year: 2023
- Funding amount: $242,000
- Award type: Standard Grant
RI: Small: Collaborative Research: Evolutionary Approach to Optimal Morphology and Control of Transformable Soft Robots
- Award number: 2325491
- Fiscal year: 2023
- Funding amount: $242,000
- Award type: Standard Grant
Collaborative Research: RI: Small: End-to-end Learning of Fair and Explainable Schedules for Court Systems
- Award number: 2334936
- Fiscal year: 2023
- Funding amount: $242,000
- Award type: Standard Grant