Fast Reinforcement Learning Using Multiple Models and State Decomposition

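Basic Information

Award number:
1407925
Principal investigator:
Snehasis Mukhopadhyay
Amount:
$154,200 (approx.)
Institution:
Indiana University
Institution country:
United States
Award type:
Standard Grant
Fiscal year:
2014
Funding country:
United States
Project period:
2014-08-15 to 2017-07-31
Project status:
Completed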

Source:
https://www.nsf.gov/awardsearch/showAward?AWD_ID=1407925&HistoricalAwards=false
Keywords:
Fast Reinforcement Learning Using Multiple

Project Abstract

This project attempts to develop better methods for Reinforcement Learning and Approximate Dynamic Programming (RLADP), so that they can handle decision tasks with greater complexity in both time and space. Reinforcement learning systems learn to maximize any measure of performance or satisfaction from their experience of observing their environment, acting on it, and receiving feedback on performance, similar to the pain or pleasure that reinforces animal behavior. Current reinforcement learning methods do not learn fast enough to perform well when their environment is too complex in space or in time. This project will develop new methods to handle that kind of complexity. The team will also collaborate with IBM Research and will try to address a testbed problem involving the management of a fleet of plug-in hybrid cars.

Complexity in time will be handled by a multiple-model approach that connects various options or skills by evaluating and updating the landmark states that mark transitions between different regions of state space. This is similar to earlier work on decision blocks and modified Bellman equations presented at the PI's workshop on learning and adaptive systems, but is otherwise a unique, new, and important direction. Complexity in space is addressed by a multiagent approach based on a kind of spatial decomposition.

Project Outputs

Journal articles (0)

Monographs (0)

Research awards (0)

Conference papers (0)

Patents (0)

Other publications by Snehasis Mukhopadhyay

Homogeneous Agent-Based Distributed Information Filtering

DOI:
10.1023/a:1019760221121
Published:
2002-10-01
Journal:
Cluster Computing-The Journal of Networks Software Tools and Applications
Impact factor:
4.100
Authors:
Rajeev R. Raje; Mingyong Qiao; Snehasis Mukhopadhyay; Mathew Palakal; Shengquan Peng; Javed Mostafa
Corresponding author:
Javed Mostafa

COBioSIFTER – A CORBA-Based Distributed Multi-Agent Biological Information Management System

DOI:
10.1023/b:clus.0000039496.64629.32
Published:
2004-10-01
Journal:
Cluster Computing-The Journal of Networks Software Tools and Applications
Impact factor:
4.100
Authors:
Rajeev R. Raje; Daocheng Zhu; Snehasis Mukhopadhyay; Liying Tang; Mathew Palakal; Javed Mostafa
Corresponding author:
Javed Mostafa

A bidding mechanism for Web-based agents involved in information classification

DOI:
10.1023/a:1019215815209
Published:
1998-01-01
Journal:
World Wide Web-Internet and Web Information Systems
Impact factor:
3.400
Authors:
Rajeev R. Raje; Snehasis Mukhopadhyay; Michael Boyles; Artur Papiez; Nila Patel; Mathew Palakal; Javed Mostafa
Corresponding author:
Javed Mostafa