RI: Small: Reinforcement Learning by Mirror Descent

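Basic Information

Award number:
1216467
Principal investigator:
Sridhar Mahadevan
Amount:
$450,000
Host institution:
University of Massachusetts Amherst
Host institution country:
United States
Award type:
Standard Grant
Fiscal year:
2012
Funding country:
United States
Start and end dates:
2012-08-01 to 2016-07-31
Project status:
Completed

Source:
https://www.nsf.gov/awardsearch/showAward?AWD_ID=1216467&HistoricalAwards=false
Keywords: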
RI Small Reinforcement Learning Mirror

Project Abstract

A fundamental challenge in machine learning is the design of computational agents that, rather than being explicitly programmed, autonomously learn complex tasks in stochastic real-world environments. Past approaches, such as reinforcement learning algorithms for solving Markov decision processes, scale poorly to large state spaces. The proposed research addresses this curse of dimensionality by investigating a novel framework combining reinforcement learning and online convex optimization, in particular mirror descent and related algorithms. Mirror descent scales significantly better than classical first-order gradient descent in high-dimensional state spaces by using a distance-generating function specific to a particular state space geometry.

The proposed framework enables several significant algorithmic advances in the design of autonomous machine learning agents: a new class of first-order mirror-descent based methods for learning sparse solutions to Markov decision processes will be developed that scale significantly better than previous second-order methods; novel hierarchical methods for solving semi-Markov decision processes will be investigated; and finally, applications to a variety of high-dimensional Markov decision processes will be explored.

The anticipated outcomes of the proposed work include foundational advances in designing autonomous agents that learn to solve sequential decision-making problems, which will impact a large number of target applications from manufacturing to robotics and scheduling. The educational goal includes the development of a graduate-level course in online convex optimization for sequential decision-making, as well as interdisciplinary tutorials to enhance the cross-fertilization of ideas from applied mathematics and optimization to machine learning and artificial intelligence.
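
As a rough illustration of the mirror-descent idea the abstract refers to, the sketch below shows the textbook entropic variant on a probability simplex, where the distance-generating function is the negative entropy and the update becomes the exponentiated-gradient rule. This is not the project's own algorithm: the function name `entropic_mirror_descent`, the toy cost vector `c`, and the step size are all assumptions chosen for illustration.

```python
import numpy as np

def entropic_mirror_descent(grad, x0, step_size, num_steps):
    """Mirror descent on the probability simplex using the negative-entropy
    distance-generating function (the exponentiated-gradient update)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(num_steps):
        g = grad(x)
        # Gradient step taken in the dual (log) space, then mapped back.
        # Subtracting g.min() only rescales the weights before normalization.
        w = x * np.exp(-step_size * (g - g.min()))
        # Normalizing is the Bregman (KL) projection onto the simplex.
        x = w / w.sum()
    return x

# Hypothetical toy objective f(x) = <c, x>; its minimizer over the simplex
# puts all mass on the coordinate with the smallest cost.
c = np.array([0.9, 0.2, 0.5, 0.7])
x_star = entropic_mirror_descent(lambda x: c, np.full(4, 0.25),
                                 step_size=0.5, num_steps=200)
```

With this choice of geometry the iterates stay on the simplex without an explicit Euclidean projection, which is the kind of geometry-specific behavior the abstract attributes to the distance-generating function.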

Project Outcomes

Journal articles (0)

Monographs (0)

Research awards (0)

Conference papers (0)

Patents (0)

Other publications by Sridhar Mahadevan

Privacy Aware Experiments without Cookies

DOI:
Year published:
2022
Journal:
Web Search and Data Mining
Impact factor:
0
Authors:
Shiv Shankar; Ritwik Sinha; Saayan Mitra; Viswanathan Swaminathan; Sridhar Mahadevan; Moumita Sinha
Corresponding author:
Moumita Sinha

Categoroids: Universal Conditional Independence

DOI:
Year published:
2022
Journal:
Impact factor:
0
Authors:
Sridhar Mahadevan
Corresponding author:
Sridhar Mahadevan

Categoroids: Universal Conditional Independence

DOI:
Year published:
2022
Journal:
arXiv.org
Impact factor:
0
Authors:
Sridhar Mahadevan
Corresponding author:
Sridhar Mahadevan

Reconfigurable adaptable micro-robot

DOI:
10.1109/icsmc.1999.816634
Year published:
1999
Journal:
IEEE SMC'99 Conference Proceedings. 1999 IEEE International Conference on Systems, Man, and Cybernetics (Cat. No.99CH37028)
Impact factor:
0
Authors:
R. Tummala; Ranjan Mukherjee; D. M. Aslam; Ning Xi; Sridhar Mahadevan; J. Weng
Corresponding author:
J. Weng