RI: Small: Collaborative Research: Hidden Parameter Markov Decision Processes: Exploiting Structure in Families of Tasks

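Basic information

  • Award number:
    1717569
  • Principal investigator:
  • Amount:
    $208,000
  • Host institution:
  • Host institution country:
    United States
  • Award type:
    Standard Grant
  • Fiscal year:
    2017
  • Funding country:
    United States
  • Project period:
    2017-08-01 to 2021-07-31
  • Project status:
    Completed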

Project abstract

Part 1

Machine learning has the potential to automate many complex, real-life tasks. However, learning algorithms typically require a substantial amount of data from each specific task they are asked to solve, requiring repeated interactions with the world, each of which takes time and effort. Many real-life learning scenarios involve repeated interactions with tasks that are similar, but not identical. For example, an immunologist may encounter HIV patients with different comorbid conditions and latent viral reservoirs - each has a similar disease but a different progression, requiring individualized treatment; a robot may have to manipulate objects of different size and weight - each requiring similar but not identical grasping strategies. In such cases, treating all of the tasks as the same results in poor performance, but learning to solve each as if it were completely different takes far too long. This project will develop intelligent agents that can use knowledge gained when solving prior tasks to much more rapidly learn new tasks that are similar but not quite the same.

The principal technical component of this project will lie in rigorously defining what it means for tasks to be related and in producing algorithms for leveraging that definition to enable rapid learning. To do so, the project will introduce the Hidden-Parameter Markov Decision Process, which models a family of tasks through a parameter that describes variation across the family but is hidden from the learner. The project will investigate methods that exploit this structure by learning a model of task variation and then seeking to identify the parameter value for each specific task. The planned work will focus on healthcare applications, where families of related but distinct tasks are common (that is, each patient will have unique characteristics). However, the project aims to produce foundational learning algorithms applicable to many application areas, ranging from robotics to systems design.

This research will also be integrated into the courses taught by the PIs at Harvard and Brown and made available online; the PIs will include a diverse population, including REUs, both in these classes and in their research groups.

Part 2

Many real-life learning scenarios involve repeated interactions with tasks that have similar, but not identical, dynamics. For example, an immunologist may encounter HIV patients with different comorbid conditions and latent viral reservoirs; a robot may have to manipulate objects of different size and weight. These cases describe a family of related tasks, each of which is similar but not quite the same. An intelligent agent should be able to transfer knowledge learned during previous experiences to rapidly solve new tasks in the same family. However, while many algorithms have been developed to transfer knowledge, the lack of a model of task relatedness inhibits our ability to formally understand the benefits of such algorithms or the structure they exploit.

The planned work will model such scenarios by embedding the tasks on a low-dimensional manifold that captures relevant variation between instances. Each location on this manifold (unobserved by the agent) describes a task instance, forming a sufficient statistic for solving the task in the context of the task family. Preliminary work by the PIs has shown that it is possible to learn such a manifold after solving just a few individual task instances and enable the rapid optimization of policies for new task instances. Building on these promising initial results, the PIs plan to:

1) Develop methods for task family characterization, by determining whether a collection of tasks can be modeled via a single manifold or consists of several clusters; whether a new task belongs to an existing cluster or manifold; and, if so, whether or not transfer is worthwhile.
2) Scale inference by adapting recent results from machine learning to deal with large state and action spaces.
3) Generate policies using Bayesian reinforcement learning algorithms, and by exploiting formal links between state and policy representations.

In addition to synthetic domains, progress on these directions will be applied to problems of treatment optimization for patients with HIV, sepsis, and depression via clinical collaborations that the PIs have with world experts in these diseases.
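The idea of a task family indexed by a hidden parameter can be illustrated with a toy sketch. This is not the PIs' method: the 1-D dynamics, the noise level, the candidate grid, and every function name below are assumptions made for illustration. The model of task variation is also taken as known here, whereas the project proposes to learn it. The sketch only shows the second step described in the abstract, identifying the hidden parameter of a new task from a few observed transitions by Bayesian updating.

```python
import math
import random

# Hypothetical task family: a 1-D point whose displacement per action is
# scaled by a hidden parameter theta (e.g. an object's unknown weight).
def step(state, action, theta, rng, noise=0.05):
    """Sample the next state: s' = s + theta * a + Gaussian noise."""
    return state + theta * action + rng.gauss(0.0, noise)

def log_likelihood(state, action, next_state, theta, noise=0.05):
    """Log density of the observed next_state under a candidate theta."""
    err = next_state - (state + theta * action)
    return -0.5 * (err / noise) ** 2 - math.log(noise * math.sqrt(2 * math.pi))

def update_posterior(log_post, candidates, transition):
    """One Bayes update of the log-posterior over candidate theta values."""
    s, a, s2 = transition
    new = [lp + log_likelihood(s, a, s2, th)
           for lp, th in zip(log_post, candidates)]
    # Normalize with log-sum-exp for numerical stability.
    m = max(new)
    log_z = m + math.log(sum(math.exp(v - m) for v in new))
    return [v - log_z for v in new]

def infer_theta(true_theta, candidates, n_steps=20, seed=0):
    """Interact with one task instance and return the MAP theta and posterior."""
    rng = random.Random(seed)
    log_post = [math.log(1.0 / len(candidates))] * len(candidates)  # uniform prior
    state = 0.0
    for _ in range(n_steps):
        action = rng.choice([-1.0, 1.0])  # random exploration, for simplicity
        next_state = step(state, action, true_theta, rng)
        log_post = update_posterior(log_post, candidates,
                                    (state, action, next_state))
        state = next_state
    best = max(range(len(candidates)), key=lambda i: log_post[i])
    return candidates[best], [math.exp(v) for v in log_post]

candidates = [0.5, 1.0, 1.5, 2.0]  # assumed discrete grid of parameter values
estimate, posterior = infer_theta(true_theta=1.5, candidates=candidates)
```

Once the posterior concentrates on one parameter value, a policy previously computed for that value could in principle be reused, which is the source of the speed-up the abstract describes. A discrete grid is used only to keep the example short; the abstract describes a continuous low-dimensional manifold.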

Project outcomes

Journal articles (13)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Coarse-Grained Smoothness for Reinforcement Learning in Metric Spaces
Optimistic Initialization for Exploration in Continuous Control
  • DOI:
    10.1609/aaai.v36i7.20727
  • Published:
    2022-06
  • Journal:
  • Impact factor:
    0
  • Authors:
    Sam Lobel;Omer Gottesman;Cameron S. Allen;Akhil Bagaria;G. Konidaris
  • Corresponding authors:
    Sam Lobel;Omer Gottesman;Cameron S. Allen;Akhil Bagaria;G. Konidaris
Simultaneously Learning Transferable Symbols and Language Groundings from Perceptual Data for Instruction Following
  • DOI:
    10.15607/rss.2020.xvi.102
  • Published:
    2020-07
  • Journal:
  • Impact factor:
    0
  • Authors:
    N. Gopalan;Eric Rosen;G. Konidaris;Stefanie Tellex
  • Corresponding authors:
    N. Gopalan;Eric Rosen;G. Konidaris;Stefanie Tellex
Skill Discovery for Exploration and Planning using Deep Skill Graphs
Q-functionals for Value-Based Continuous Control
  • DOI:
    10.1609/aaai.v37i7.26073
  • Published:
    2023-06
  • Journal:
  • Impact factor:
    0
  • Authors:
    Bowen He;Sam Lobel;Sreehari Rammohan;Shangqun Yu;G. Konidaris
  • Corresponding authors:
    Bowen He;Sam Lobel;Sreehari Rammohan;Shangqun Yu;G. Konidaris

Other grants by George Konidaris

RI: Medium: Learning Task-Specific Representations for Broadly Capable Reinforcement Learning Agents
  • Award number:
    1955361
  • Fiscal year:
    2020
  • Funding amount:
    $208,000
  • Award type:
    Standard Grant
CAREER: Learning Symbolic Representations for Robot Manipulation
  • Award number:
    1844960
  • Fiscal year:
    2019
  • Funding amount:
    $208,000
  • Award type:
    Continuing Grant
FMitF: Collaborative Research: User-Centered Verification and Repair of Trigger-Action Programs
  • Award number:
    1836948
  • Fiscal year:
    2018
  • Funding amount:
    $208,000
  • Award type:
    Standard Grant
Robotics Activities at Association for the Advancement of Artificial Intelligence (AAAI) 2016
  • Award number:
    1600043
  • Fiscal year:
    2016
  • Funding amount:
    $208,000
  • Award type:
    Standard Grant

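Similar NSFC grants

Forensic application of circadian-rhythm small RNAs to estimating the time of bloodstain formation
  • Award number:
  • Approval year:
    2024
  • Funding amount:
    CNY 0
  • Award type:
    Provincial/municipal project
Mechanism by which tRNA-derived small RNA upregulates the YBX1/CCL5 pathway in bortezomib-induced chronic pain
  • Award number:
    n/a
  • Approval year:
    2022
  • Funding amount:
    CNY 100,000
  • Award type:
    Provincial/municipal project
Small RNA regulation of the type I-F CRISPR-Cas adaptive immune response and its molecular mechanism
  • Award number:
    32000033
  • Approval year:
    2020
  • Funding amount:
    CNY 240,000
  • Award type:
    Young Scientists Fund
Mechanism by which small RNAs regulate the biocontrol function of Bacillus amyloliquefaciens FZB42
  • Award number:
    31972324
  • Approval year:
    2019
  • Funding amount:
    CNY 580,000
  • Award type:
    General Program
Mechanism by which Streptococcus mutans small RNAs link LuxS quorum sensing to biofilm formation
  • Award number:
    81900988
  • Approval year:
    2019
  • Funding amount:
    CNY 210,000
  • Award type:
    Young Scientists Fund
Function and mechanism of key gut bacterial small RNAs in the onset and progression of Crohn's disease
  • Award number:
    31870821
  • Approval year:
    2018
  • Funding amount:
    CNY 560,000
  • Award type:
    General Program
Molecular mechanism of pigeon milk secretion analyzed by small RNA sequencing
  • Award number:
    31802058
  • Approval year:
    2018
  • Funding amount:
    CNY 260,000
  • Award type:
    Young Scientists Fund
Pathogenic mechanism of rice grassy stunt virus regulated by small RNA-mediated DNA methylation
  • Award number:
    31772128
  • Approval year:
    2017
  • Funding amount:
    CNY 600,000
  • Award type:
    General Program
Immunoregulatory mechanism of acupuncture treatment for Hashimoto's thyroiditis based on small RNA-seq
  • Award number:
    81704176
  • Approval year:
    2017
  • Funding amount:
    CNY 200,000
  • Award type:
    Young Scientists Fund
Regulation of small RNA biosynthesis by rice OsSGS3 and OsHEN1 and its effect on disease resistance
  • Award number:
    91640114
  • Approval year:
    2016
  • Funding amount:
    CNY 850,000
  • Award type:
    Major Research Plan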

Similar overseas grants

Collaborative Research: RI: Small: Foundations of Few-Round Active Learning
  • Award number:
    2313131
  • Fiscal year:
    2023
  • Funding amount:
    $208,000
  • Award type:
    Standard Grant
Collaborative Research: RI: Small: Deep Constrained Learning for Power Systems
  • Award number:
    2345528
  • Fiscal year:
    2023
  • Funding amount:
    $208,000
  • Award type:
    Standard Grant
Collaborative Research: RI: Small: Motion Fields Understanding for Enhanced Long-Range Imaging
  • Award number:
    2232298
  • Fiscal year:
    2023
  • Funding amount:
    $208,000
  • Award type:
    Standard Grant
Collaborative Research: RI: Small: End-to-end Learning of Fair and Explainable Schedules for Court Systems
  • Award number:
    2232055
  • Fiscal year:
    2023
  • Funding amount:
    $208,000
  • Award type:
    Standard Grant
Collaborative Research: RI: Small: End-to-end Learning of Fair and Explainable Schedules for Court Systems
  • Award number:
    2232054
  • Fiscal year:
    2023
  • Funding amount:
    $208,000
  • Award type:
    Standard Grant
Collaborative Research: RI: Small: Motion Fields Understanding for Enhanced Long-Range Imaging
  • Award number:
    2232300
  • Fiscal year:
    2023
  • Funding amount:
    $208,000
  • Award type:
    Standard Grant
Collaborative Research: RI: Small: Motion Fields Understanding for Enhanced Long-Range Imaging
  • Award number:
    2232299
  • Fiscal year:
    2023
  • Funding amount:
    $208,000
  • Award type:
    Standard Grant
Collaborative Research: RI: Small: Foundations of Few-Round Active Learning
  • Award number:
    2313130
  • Fiscal year:
    2023
  • Funding amount:
    $208,000
  • Award type:
    Standard Grant
RI: Small: Collaborative Research: Evolutionary Approach to Optimal Morphology and Control of Transformable Soft Robots
  • Award number:
    2325491
  • Fiscal year:
    2023
  • Funding amount:
    $208,000
  • Award type:
    Standard Grant
Collaborative Research: RI: Small: End-to-end Learning of Fair and Explainable Schedules for Court Systems
  • Award number:
    2334936
  • Fiscal year:
    2023
  • Funding amount:
    $208,000
  • Award type:
    Standard Grant