RI: Small: Collaborative Research: Hidden Parameter Markov Decision Processes: Exploiting Structure in Families of Tasks

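Basic information

Award number:
1717569
Principal investigator:
George Konidaris
Amount:
About $208,000
Host institution:
Brown University
Host institution country:
United States
Award type:
Standard Grant
Fiscal year:
2017
Funding country:
United States
Duration:
2017-08-01 to 2021-07-31
Status:
Completed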

Source:
https://www.nsf.gov/awardsearch/showAward?AWD_ID=1717569&HistoricalAwards=false
Keywords:
RI Small Collaborative Research Hidden

Project abstract

Part 1

Machine learning has the potential to automate many complex, real-life tasks. However, learning algorithms typically require a substantial amount of data from each specific task they are asked to solve, requiring repeated interactions with the world, each of which takes time and effort. Many real-life learning scenarios involve repeated interactions with tasks that are similar, but not identical. For example, an immunologist may encounter HIV patients with different comorbid conditions and latent viral reservoirs - each has a similar disease but a different progression, requiring individualized treatment; a robot may have to manipulate objects of different size and weight - each requiring similar but not identical grasping strategies. In such cases, treating all of the tasks as the same results in poor performance, but learning to solve each as if it were completely different takes far too long. This project will develop intelligent agents that can use knowledge gained when solving prior tasks to much more rapidly learn new tasks that are similar but not quite the same.

The principal technical component of this project will lie in rigorously defining what it means for tasks to be related and in producing algorithms for leveraging that definition to enable rapid learning. To do so, the project will introduce the Hidden-Parameter Markov Decision Process, which models a family of tasks through a parameter which describes variation through the family but is hidden from the learner. The project will investigate methods that exploit this structure by learning a model of task variation and then seeking to identify the parameter value for each specific task. The planned work will focus on healthcare applications, where families of related but distinct tasks are common (i.e., each patient will have unique characteristics). However, the project aims to produce foundational learning algorithms applicable to many application areas, ranging from robotics to systems design. This research will also be integrated into the courses taught by the PIs at Harvard and Brown and made available online; the PIs will include a diverse population, including REUs, both in these classes and in their research groups.

Part 2

Many real-life learning scenarios involve repeated interactions with tasks that have similar, but not identical, dynamics. For example, an immunologist may encounter HIV patients with different comorbid conditions and latent viral reservoirs; a robot may have to manipulate objects of different size and weight. These cases describe a family of related tasks, each of which is similar but not quite the same. An intelligent agent should be able to transfer knowledge learned during previous experiences to rapidly solve new tasks in the same family. However, while many algorithms have been developed to transfer knowledge, the lack of a model of task relatedness inhibits our ability to formally understand the benefits of such algorithms or the structure they exploit.

The planned work will model such scenarios by embedding the tasks on a low-dimensional manifold that captures relevant variation between instances. Each location on this manifold (unobserved by the agent) describes a task instance, forming a sufficient statistic for solving the task in the context of the task family. Preliminary work by the PIs has shown that it is possible to learn such a manifold after solving just a few individual task instances and enable the rapid optimization of policies for new task instances. Building on these promising initial results, the PIs plan to:

1) Develop methods for task family characterization, by determining whether a collection of tasks can be modeled via a single manifold or consists of several clusters; whether a new task belongs to an existing cluster or manifold; and, if so, whether or not transfer is worthwhile.
2) Scale inference by adapting recent results from machine learning to deal with large state and action spaces.
3) Generate policies using Bayesian reinforcement learning algorithms, and by exploiting formal links between state and policy representations.

In addition to synthetic domains, progress on these directions will be applied to problems of treatment optimization for patients with HIV, sepsis, and depression via clinical collaborations that the PIs have with world experts in these diseases.
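The idea of a task family indexed by a hidden parameter can be illustrated with a small toy sketch. This is not the project's method: the 1-D dynamics, the candidate grid `THETAS`, the noise level, and the function names below are all assumptions made for illustration. Every task shares the same dynamics form, a hidden scalar `theta` scales the effect of each action, and the learner keeps a posterior over candidate values of `theta` that it updates from observed transitions.

```python
import math
import random

def step(x, a, theta, noise_std, rng):
    """Toy dynamics shared by the whole task family; theta is the hidden parameter."""
    return x + theta * a + rng.gauss(0.0, noise_std)

def update_posterior(log_post, thetas, x, a, x_next, noise_std):
    """One Bayesian update of the log-posterior over candidate theta values."""
    new = []
    for lp, th in zip(log_post, thetas):
        resid = x_next - (x + th * a)  # prediction error under this candidate
        new.append(lp - 0.5 * (resid / noise_std) ** 2)  # Gaussian log-likelihood
    # Normalize with log-sum-exp for numerical stability.
    m = max(new)
    log_z = m + math.log(sum(math.exp(v - m) for v in new))
    return [v - log_z for v in new]

def identify_theta(true_theta, thetas, n_steps=50, noise_std=0.1, seed=0):
    """Interact with one task instance and return the most probable candidate."""
    rng = random.Random(seed)
    log_post = [math.log(1.0 / len(thetas))] * len(thetas)  # uniform prior
    x = 0.0
    for _ in range(n_steps):
        a = rng.choice([-1.0, 1.0])  # random exploratory action
        x_next = step(x, a, true_theta, noise_std, rng)
        log_post = update_posterior(log_post, thetas, x, a, x_next, noise_std)
        x = x_next
    best = max(range(len(thetas)), key=lambda i: log_post[i])
    return thetas[best], log_post

# Hypothetical candidate grid for the hidden parameter.
THETAS = [0.5, 1.0, 1.5, 2.0]
```

Calling `identify_theta(1.5, THETAS)` simulates a new task instance whose hidden parameter is 1.5 and returns the candidate with the highest posterior probability along with the normalized log-posterior. A discrete grid keeps the update exact and short; the abstract itself describes a learned low-dimensional manifold rather than a fixed grid.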

Project outputs

Journal papers (13)

Monographs (0)

Research awards (0)

Conference papers (0)

Patents (0)

Coarse-Grained Smoothness for Reinforcement Learning in Metric Spaces

DOI:
Published:
2023
Venue:
Proceedings of the 26th International Conference on Artificial Intelligence and Statistics
Impact factor:
0
Authors:
Gottesman, O;Asadi, K;Allen, C;Lobel, S;Konidaris, GD;Littman, ML
Corresponding author:
Littman, ML

Optimistic Initialization for Exploration in Continuous Control

DOI:
10.1609/aaai.v36i7.20727
Published:
2022-06
Venue:
Impact factor:
0
Authors:
Sam Lobel;Omer Gottesman;Cameron S. Allen;Akhil Bagaria;G. Konidaris
Corresponding author:
Sam Lobel;Omer Gottesman;Cameron S. Allen;Akhil Bagaria;G. Konidaris

Simultaneously Learning Transferable Symbols and Language Groundings from Perceptual Data for Instruction Following

DOI:
10.15607/rss.2020.xvi.102
Published:
2020-07
Venue:
Robotics: Science and Systems XVI
Impact factor:
0
Authors:
N. Gopalan;Eric Rosen;G. Konidaris;Stefanie Tellex
Corresponding author:
N. Gopalan;Eric Rosen;G. Konidaris;Stefanie Tellex

Skill Discovery for Exploration and Planning using Deep Skill Graphs

DOI:
Published:
2021
Venue:
Proceedings of the Thirty-Eighth International Conference on Machine Learning
Impact factor:
0
Authors:
Bagaria, A;Senthil, J.;Konidaris, G.D.
Corresponding author:
Konidaris, G.D.

Q-functionals for Value-Based Continuous Control