
ITR: Risk, Reward, and Reinforcement

Basic information

Award number:
0342634
Principal investigator:
John Moody
Amount:
--
Awardee institution:
International Computer Science Institute
Awardee country:
United States
Award type:
Standard Grant
Fiscal year:
2003
Funding country:
United States
Period:
2003-08-01 to 2008-07-31
Status:
Completed

Source:
https://www.nsf.gov/awardsearch/showAward?AWD_ID=0342634&HistoricalAwards=false
Keywords:
ITR Risk Reward Reinforcement

Project abstract

The objectives of this project are to develop efficient and reliable algorithms for direct reinforcement, to learn risk-averse behaviors for problems with high degrees of uncertainty, and to apply the methods developed to an economically important problem: global asset allocation. Reinforcement learning (RL) enables a goal-directed agent to discover strategies through trial and error exploration with only limited feedback. Direct Reinforcement (DR, or "policy gradient") methods enable an agent to discover a strategy without the need to learn a value function.

Dynamic programming and related value function RL methods are often found to be inefficient, to produce unstable solutions, and to have difficulty scaling up to large problems. Hence, there have been relatively few real-world applications of the value function type RL. This project seeks to make several advancements in Direct Reinforcement that will enable the development of efficient and effective practical applications.

By controlling the "exploration vs. exploitation" trade-off during on-line learning, DR agents will be able to discover better policies and do so more efficiently. Stochastic optimization methods, such as stochastic "search then converge" or annealing of a Boltzmann temperature are candidate approaches. By developing risk-averse reinforcement methods, DR agents will be able to learn robust policies for uncertain or risky environments. Using risk-sensitive intertemporal utilities, DR agents will learn to avoid risky states or actions while they pursue long-term reward. Dynamic programming is widely used in economics and finance, but few attempts have been made to solve important financial problems with reinforcement learning. As a demonstration of risk-averse DR, this project will build a prototype global asset allocation system.

Risk-averse direct reinforcement may find application in a variety of engineering domains, from robotics to industrial control to autonomous agents. Many industries, such as energy and the airlines, need to manage operational and financial risks together, in order to avoid supply shortfalls or bankruptcy. Individual investors must manage risk while building their investment portfolios to meet future needs, such as children's college expenses or retirement. Risk-averse Direct Reinforcement may find application in many such contexts.
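
The ideas named in the abstract (a policy learned directly without a value function, Boltzmann exploration with an annealed temperature, and a risk-sensitive utility) can be illustrated with a toy sketch. The following is not the project's algorithm: the two-action problem, the payoffs, the exponential utility, the annealing schedule, and every function name are assumptions chosen only for illustration.

```python
import math
import random

# Hypothetical two-action problem (illustrative numbers, not from the project):
# action 0 is "safe" and always pays 1.0; action 1 is "risky" and pays
# 5.0 or -1.0 with equal probability (higher mean, much higher variance).
OUTCOMES = {0: [1.0], 1: [5.0, -1.0]}


def utility(reward, risk_aversion):
    """Exponential utility (an assumed choice); 0 means risk-neutral."""
    if risk_aversion == 0.0:
        return reward
    return (1.0 - math.exp(-risk_aversion * reward)) / risk_aversion


def expected_utility(action, risk_aversion):
    """Exact expected utility of an action with equally likely outcomes."""
    outcomes = OUTCOMES[action]
    return sum(utility(r, risk_aversion) for r in outcomes) / len(outcomes)


def boltzmann_policy(preferences, temperature):
    """Softmax over preferences; a higher temperature explores more."""
    scaled = [p / temperature for p in preferences]
    peak = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(s - peak) for s in scaled]
    total = sum(weights)
    return [w / total for w in weights]


def annealed_temperature(step, start=1.0, floor=0.2, decay=0.999):
    """Geometric annealing with a lower bound (assumed schedule)."""
    return max(floor, start * decay ** step)


def train(risk_aversion, steps=20000, learning_rate=0.01, seed=0):
    """REINFORCE-style preference updates; no value function is learned."""
    rng = random.Random(seed)
    preferences = [0.0, 0.0]
    baseline = 0.0  # running mean of past utilities, reduces variance
    for step in range(steps):
        temperature = annealed_temperature(step)
        probs = boltzmann_policy(preferences, temperature)
        action = 0 if rng.random() < probs[0] else 1
        reward = rng.choice(OUTCOMES[action])
        u = utility(reward, risk_aversion)
        advantage = u - baseline
        for a in range(2):
            # Gradient of log pi(action) with respect to preference a.
            indicator = 1.0 if a == action else 0.0
            grad_log_pi = (indicator - probs[a]) / temperature
            preferences[a] += learning_rate * advantage * grad_log_pi
        baseline += (u - baseline) / (step + 1)
    return boltzmann_policy(preferences, annealed_temperature(steps))
```

Under these assumed payoffs, expected_utility ranks the risky action higher when risk_aversion is 0 and the safe action higher when risk_aversion is 1, so train(0.0) and train(1.0) can be compared to see how the utility changes the policy an agent is pushed toward. Individual runs are stochastic, so the learned probabilities depend on the seed and step count.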

Project outputs

Journal articles (0)

Monographs (0)

Research awards (0)

Conference papers (0)

Patents (0)

Other publications by John Moody

Illuminating wildfire erosion and deposition patterns with repeat terrestrial lidar

DOI:
Publication date:
2016
Journal:
Impact factor:
0
Authors:
F. Rengers; Gregory E. Tucker; John Moody; B. Ebel
Corresponding author:
B. Ebel

Gamer 2.0: Software Toolkit for Adaptive Mesh Generation from Structural Biological Datasets

DOI:
10.1016/j.bpj.2017.11.1921
Publication date:
2018-02-02
Journal:
Conference abstract
Impact factor:
Authors:
Christopher T. Lee; John Moody; Michael J. Holst; J. Andrew McCammon; Rommie E. Amaro
Corresponding author:
Rommie E. Amaro