RI: Small: Learning Strategic Behavior in Sequential Decision Tasks

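Basic Information

Award number:
0915038
Principal investigator:
Risto Miikkulainen
Amount:
$455,000
Host institution:
University of Texas at Austin
Host institution country:
United States
Award type:
Standard Grant
Fiscal year:
2009
Funding country:
United States
Start and end dates:
2009-09-01 to 2014-08-31
Status:
Completed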

Source:
https://www.nsf.gov/awardsearch/showAward?AWD_ID=0915038&HistoricalAwards=false
Keywords:
RI Small Learning Strategic Behavior

Project Abstract

Many routine, real-world tasks can be seen as sequential decision tasks. For instance, navigating a robot through a complex environment, driving a car in congested traffic, and routing packets in a computer network all require making a sequence of decisions that together minimize the time and resources used. It would be desirable to automate these tasks, yet doing so is difficult because the optimal decisions are generally not known. Many existing learning methods lead to reactive behaviors that perform well in the short term but do not amount to intelligent high-level behavior in the long term. This project is developing methods for learning strategic high-level behavior. Strategic methods need to (1) retain information from past states, (2) learn multimodal behavior, (3) choose between the different behaviors based on crucial details, and (4) implement a sequential high-level strategy based on those behaviors. The neuroevolution methods developed in prior work solve the first problem by using genetic algorithms to evolve recurrent neural networks that represent the behavior. To solve the remaining problems, the proposed work extends these methods with multi-objective optimization, local nodes with a cascaded structure, and the evolution of modules and their combinations. Preliminary results indicate that this approach is indeed feasible. In the long term, the developed technology will make it possible to build robust sequential decision systems for real-world tasks, leading to safer and more efficient vehicle, traffic, and robot control, improved process and manufacturing optimization, and more efficient computer and communication systems. It will also make the next generation of video games possible, with characters that exhibit realistic, strategic behaviors; such technology should lead to more effective educational and training games in the future. The OpenNERO open-source software platform developed in this work will be made available to the research community.
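The abstract states that prior neuroevolution work addresses requirement (1), retaining information from past states, by using genetic algorithms to evolve recurrent neural networks. The sketch below illustrates only that general idea; it is not the project's algorithm, which the abstract does not specify. The toy cue-recall task, the network size, and every name and setting here (`DELAY`, `HIDDEN`, `run_rnn`, `evolve`, population size, mutation scale) are hypothetical choices made for illustration.

```python
import math
import random

# Hypothetical toy task (not from the project): the network sees a cue of +1 or -1
# at the first step, then DELAY steps of zero input, and must output the cue at the end.
DELAY = 2
HIDDEN = 3
DATASET = [([cue] + [0.0] * DELAY, cue) for cue in (1.0, -1.0)]


def genome_size(hidden=HIDDEN):
    # input weights + recurrent weights + hidden biases + output weights + output bias
    return hidden + hidden * hidden + hidden + hidden + 1


def unpack(genome, hidden=HIDDEN):
    """Split a flat weight list (the genome) into the layers of a tiny Elman-style RNN."""
    it = iter(genome)

    def take(n):
        return [next(it) for _ in range(n)]

    w_in = take(hidden)
    w_rec = [take(hidden) for _ in range(hidden)]
    b_h = take(hidden)
    w_out = take(hidden)
    (b_out,) = take(1)
    return w_in, w_rec, b_h, w_out, b_out


def run_rnn(genome, sequence, hidden=HIDDEN):
    w_in, w_rec, b_h, w_out, b_out = unpack(genome, hidden)
    h = [0.0] * hidden
    for x in sequence:
        # The hidden state h carries information forward from earlier inputs.
        h = [math.tanh(w_in[j] * x + sum(w_rec[j][k] * h[k] for k in range(hidden)) + b_h[j])
             for j in range(hidden)]
    return math.tanh(sum(w_out[j] * h[j] for j in range(hidden)) + b_out)


def fitness(genome):
    # Negative mean squared error over the dataset: higher is better.
    return -sum((run_rnn(genome, seq) - target) ** 2 for seq, target in DATASET) / len(DATASET)


def evolve(pop_size=30, generations=60, elite=5, sigma=0.3, seed=0):
    rng = random.Random(seed)
    n = genome_size()
    population = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        history.append(fitness(ranked[0]))
        parents = ranked[:elite]
        # Elitism: the best genomes survive unchanged, so the best fitness never decreases.
        population = [list(p) for p in parents]
        while len(population) < pop_size:
            a, b = rng.sample(parents, 2)
            # Uniform crossover followed by Gaussian mutation of every weight.
            child = [(x if rng.random() < 0.5 else y) + rng.gauss(0.0, sigma)
                     for x, y in zip(a, b)]
            population.append(child)
    best = max(population, key=fitness)
    history.append(fitness(best))
    return best, history


best, history = evolve()
print(f"best fitness: {history[0]:.3f} -> {history[-1]:.3f}")
for seq, target in DATASET:
    print(seq, "target", target, "output", round(run_rnn(best, seq), 3))
```

Because the final input is always 0, a purely reactive network that sees only the current input would give the same output for both cues; only the recurrent hidden state lets the evolved network tell them apart. Selection acts on whole-network fitness rather than on per-step error gradients, which is what makes this approach usable when, as the abstract notes, the optimal individual decisions are not known.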

许多日常的、现实世界的任务可以被看作是顺序决策任务。例如，在复杂的环境中导航机器人，在拥挤的交通中驾驶汽车，以及在计算机网络中路由数据包，都需要做出一系列决策，以最大限度地减少所使用的时间和资源。这将是可取的自动化这些任务，但它是困难的，因为最佳的决策通常是未知的。许多现有的学习方法会导致反应性行为，这些行为在短期内表现良好，但从长期来看并不构成智能的高级行为。该项目正在开发学习战略高级行为的方法。战略方法需要（1）保留来自过去状态的信息，（2）学习多模态行为，（3）基于关键细节在不同行为之间进行选择，以及（4）基于这些行为实施顺序高级策略。在先前的工作中开发的神经进化方法通过进化（通过遗传算法）递归神经网络来表示行为来解决第一个问题。为了解决剩下的问题，这些方法正在扩展的多目标优化，级联结构的局部节点，并与模块及其组合的演变提出的工作。初步结果表明，这种方法确实可行。从长远来看，先进的技术将使人们有可能为现实世界的任务建立强大的顺序决策系统。它带来了更安全、更高效的车辆、交通和机器人控制，改进的工艺和制造优化，以及更高效的计算机和通信系统。它还将使下一代视频游戏成为可能，其角色表现出逼真的战略行为：这种技术应该会在未来带来更有效的教育和培训游戏。在这项工作中开发的OpenNERO开源软件平台将提供给研究界。