Proto-Value Functions: A Unified Framework for Learning Task-Specific Behaviors and Task-Independent Representations
Basic Information
- Award Number: 0534999
- Principal Investigator:
- Amount: $443,600
- Host Institution:
- Host Institution Country: United States
- Grant Type: Continuing Grant
- Fiscal Year: 2006
- Funding Country: United States
- Project Period: 2006-01-01 to 2009-12-31
- Project Status: Completed
- Source:
- Keywords:
Project Abstract
This project addresses a longstanding puzzle in artificial intelligence (AI): how can agents transform their temporal experience into multiscale, task-independent representations that effectively guide long-term task-specific behavior? The project will investigate a nonparametric framework combining task-independent learning with task-specific learning. Algorithmically, the framework comprises four phases. First, agents learn a discrete manifold representation of a given environment, which can be viewed as a topological graph whose vertices are the states reachable through single-step or multi-step actions. Next, the graph is analyzed using spectral clustering techniques to reveal "bottlenecks," symmetries, and other geometric invariants. In the third phase, an orthonormal set of task-independent basis functions called proto-value functions is extracted from the environment's topology: these basis functions capture large-scale geometric invariants that all value functions on the state space must respect. In the final phase, proto-value functions are combined with rewards to approximate task-specific value functions. The proposed framework unifies two previously disparate lines of research in AI: the learning of behavior using value functions, pioneered by Arthur Samuel, and the learning of representations based on global state-space analysis, pioneered by Saul Amarel. The theoretical basis for the framework draws upon links between discrete and continuous mathematics: Riemannian manifolds and the spectral theory of graphs; elliptic differential equations and abstract harmonic analysis on graphs. Specifically, the Hilbert space of smooth functions on a Riemannian manifold has a discrete spectrum given by the eigenfunctions of the Laplace-Beltrami operator.
The applications of this theory to Markov decision processes will be explored, in particular the ability of Laplacian eigenfunctions, or proto-value functions, both to capture large-scale geometric structure and to approximate task-specific value functions. A novel class of algorithms termed Representation Policy Iteration (RPI), which interleaves representation learning and behavior learning, will be investigated. The research thus also addresses a longstanding question left unresolved by much previous work on approximation methods for solving large Markov decision processes: how can basis functions be generated automatically? The research will investigate the scalability of the proposed framework to larger problems, including both discrete factored state spaces and continuous state spaces. The testbeds include simulated discrete and continuous benchmark problems, simulated and real robot testbeds, and an information-extraction task of maintaining the Reinforcement Learning Repository (RLR), the world's largest collection of documents and data relating to reinforcement learning. Broader impacts of this project include algorithmic and theoretical insights leading to a unified approach to learning behavior and representation, as well as applications to real-world problems such as humanoid robotics and web-repository maintenance. Additionally, this project will give valuable research experience to women graduate students and to undergraduate students from local four-year colleges.
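The four-phase pipeline described in the abstract can be illustrated with a minimal sketch. This is not code from the project itself; it assumes a simple 1-D chain of states as the environment graph and a synthetic target value function, with all names illustrative. It builds the state-space graph, forms the combinatorial graph Laplacian, takes its smoothest eigenvectors as proto-value functions, and then fits a task-specific value function by least squares in that basis.

```python
import numpy as np

# Phase 1: represent the environment as a graph. Here, a chain of n states
# where each state connects to its single-step neighbors (illustrative).
n = 20
W = np.zeros((n, n))
for i in range(n - 1):
    W[i, i + 1] = W[i + 1, i] = 1.0   # symmetric adjacency (edge weights)

# Phases 2-3: spectral analysis of the graph. Form the combinatorial
# Laplacian L = D - W; its low-order eigenvectors vary smoothly over the
# graph and serve as task-independent proto-value functions.
D = np.diag(W.sum(axis=1))
L = D - W
eigvals, eigvecs = np.linalg.eigh(L)   # eigenvalues in ascending order
k = 5
pvf = eigvecs[:, :k]                   # basis matrix (n states x k functions)

# Phase 4: combine the basis with reward information. As a stand-in for a
# learned value function, use a synthetic target: discounted value of
# reaching the rightmost state of the chain.
gamma = 0.95
V_true = gamma ** (n - 1 - np.arange(n))

# Least-squares projection of the target value function onto the PVF basis.
w, *_ = np.linalg.lstsq(pvf, V_true, rcond=None)
V_hat = pvf @ w

print("max approximation error:", np.abs(V_hat - V_true).max())
```

The design choice this illustrates is the one the abstract emphasizes: the basis (`pvf`) depends only on the graph's topology, not on any reward, so the same basis can be reused across tasks that share the state space; only the final least-squares step is task-specific.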
Project Outcomes
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other Publications by Sridhar Mahadevan
Privacy Aware Experiments without Cookies
- DOI:
- Publication year: 2022
- Journal:
- Impact factor: 0
- Authors: Shiv Shankar; Ritwik Sinha; Saayan Mitra; Viswanathan Swaminathan; Sridhar Mahadevan; Moumita Sinha
- Corresponding author: Moumita Sinha
Categoroids: Universal Conditional Independence
- DOI:
- Publication year: 2022
- Journal:
- Impact factor: 0
- Authors: Sridhar Mahadevan
- Corresponding author: Sridhar Mahadevan
Reconfigurable adaptable micro-robot
- DOI: 10.1109/icsmc.1999.816634
- Publication year: 1999
- Journal:
- Impact factor: 0
- Authors: R. Tummala; Ranjan Mukherjee; D. M. Aslam; Ning Xi; Sridhar Mahadevan; J. Weng
- Corresponding author: J. Weng
Quantifying Prior Determination Knowledge Using the PAC Learning Model
- DOI: 10.1023/a:1022605018507
- Publication year: 1994-10-01
- Journal:
- Impact factor: 2.900
- Authors: Sridhar Mahadevan; Prasad Tadepalli
- Corresponding author: Prasad Tadepalli
Other Grants by Sridhar Mahadevan
Collaborative Research: Transfer Learning for Chemical Analyses from Laser-Induced Spectroscopy
- Award Number: 1307179
- Fiscal Year: 2013
- Amount: $443,600
- Grant Type: Standard Grant
RI: Small: Reinforcement Learning by Mirror Descent
- Award Number: 1216467
- Fiscal Year: 2012
- Amount: $443,600
- Grant Type: Standard Grant
NeTS Small: Analysis and Design of Best-Effort Content-Caching Networks
- Award Number: 1117764
- Fiscal Year: 2011
- Amount: $443,600
- Grant Type: Standard Grant
Manifold Alignment of High-Dimensional Data Sets
- Award Number: 1025120
- Fiscal Year: 2010
- Amount: $443,600
- Grant Type: Standard Grant
RI-Medium: Collaborative Research: Learning Multiscale Representations using Harmonic Analysis on Graphs
- Award Number: 0803288
- Fiscal Year: 2008
- Amount: $443,600
- Grant Type: Standard Grant
Scaling Reinforcement Learning by Adaptive Task Selection and Linear Solution Merging
- Award Number: 9896122
- Fiscal Year: 1997
- Amount: $443,600
- Grant Type: Continuing Grant
Scaling Reinforcement Learning by Adaptive Task Selection and Linear Solution Merging
- Award Number: 9501852
- Fiscal Year: 1995
- Amount: $443,600
- Grant Type: Continuing Grant
Support for a Workshop on Reinforcement Learning
- Award Number: 9529108
- Fiscal Year: 1995
- Amount: $443,600
- Grant Type: Standard Grant
Similar NSFC Grants
Research on Value-at-Risk Forecasting Models Based on Quantile Dependence Between Time Series
- Award Number: 71903144
- Year Approved: 2019
- Amount: ¥170,000
- Grant Type: Young Scientists Fund
Similar Overseas Grants
A study on value distribution properties of meromorphic functions generated by a wide variety of series and an investigation into their possible algebraic analogues
- Award Number: 22K03335
- Fiscal Year: 2022
- Amount: $443,600
- Grant Type: Grant-in-Aid for Scientific Research (C)
The intermediate value theorem for functions on the Levi-Civita field
- Award Number: 574661-2022
- Fiscal Year: 2022
- Amount: $443,600
- Grant Type: University Undergraduate Student Research Awards
Value-distribution theory of zeta and multiple zeta functions
- Award Number: 22K03267
- Fiscal Year: 2022
- Amount: $443,600
- Grant Type: Grant-in-Aid for Scientific Research (C)
New developments in the anticyclotomic Iwasawa theory and special value formulas on L-functions
- Award Number: 22H00096
- Fiscal Year: 2022
- Amount: $443,600
- Grant Type: Grant-in-Aid for Scientific Research (A)
Value distribution of L-functions in the critical strip
- Award Number: 565498-2021
- Fiscal Year: 2021
- Amount: $443,600
- Grant Type: Alexander Graham Bell Canada Graduate Scholarships - Master's
Approximation theory for two-level value functions with applications
- Award Number: EP/V049038/1
- Fiscal Year: 2021
- Amount: $443,600
- Grant Type: Research Grant
Value-Distribution of Logarithmic Derivatives of L-functions
- Award Number: 551817-2020
- Fiscal Year: 2020
- Amount: $443,600
- Grant Type: University Undergraduate Student Research Awards
Research on the global structure of solutions and their stability for nonlocal boundary value problems by using elliptic functions
- Award Number: 19K03593
- Fiscal Year: 2019
- Amount: $443,600
- Grant Type: Grant-in-Aid for Scientific Research (C)
On the density functions related to the value-distributions of zeta-functions
- Award Number: 19J12037
- Fiscal Year: 2019
- Amount: $443,600
- Grant Type: Grant-in-Aid for JSPS Fellows
Value-Distribution of Dirichlet L-functions
- Award Number: 538455-2019
- Fiscal Year: 2019
- Amount: $443,600
- Grant Type: University Undergraduate Student Research Awards