Learning and Search in Decision Domains Featuring Large Action Sets and Uncertainty

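Basic Information

Grant number:
RGPIN-2018-06677
Principal investigator:
Buro, Michael
Amount:
$ 29,900
Host institution:
University of Alberta
Host institution country:
Canada
Program:
Discovery Grants Program - Individual
Fiscal year:
2019
Funding country:
Canada
Project period:
2019-01-01 to 2020-12-31
Project status:
Completed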

Source:
https://www.nserc-crsng.gc.ca/ase-oro/Details-Detailles_eng.asp?id=688288
Keywords:
Learning Search Decision Domains Featuring

Project Abstract

Artificial Intelligence (AI) research has come a long way, creating systems that challenge human supremacy in decision domains such as Chess, Jeopardy!, stock trading, and, more recently, image recognition, Atari 2600 arcade games, and the Asian board game Go. By contrast, AI progress in popular video games, which often feature large action spaces, real-time constraints, multiple players, and hidden information, has been slow, and in many cases human experts can still easily outperform the best AI systems.

The human advantage in these domains can in part be attributed to our abilities to simplify problems while maintaining solutions, to search at different abstraction levels (e.g., looking into details only when high-level solution concepts do not seem to work), to infer intentions from observed actions, and to quickly adjust to opponents and partners. The methods that have been instrumental in creating the strong AI systems listed above, for example training policy networks and using Monte Carlo search to determine good low-level actions, are currently not powerful enough to achieve human expert-level performance in domains featuring large action spaces and long playing episodes consisting of actions with microscopic effects.

To overcome these problems, we propose to investigate how to better integrate heuristic search (which can evaluate the merit of actions by looking ahead) with machine learning to deal with large combinatorial action spaces, uncertainty, and agent cooperation. The main long-term research objectives are: 1) learning hierarchical policies from self-play using deep neural networks, 2) understanding the role of heuristic search vs. learned policies in domains for which forward models are not available, 3) learning strategies in cooperative multi-agent domains, and 4) data-efficient agent modelling for cooperation and exploitation. We approach these long-term goals by starting with simpler tasks involving supervised learning from human training data, studying reinforcement learning in domains with medium-sized action spaces, and integrating human cooperation strategies into existing search-based AI systems.

Making substantial progress in the target domains of this proposal will have a profound impact on technology and society. In a world in which machines can learn to perform well in multi-agent settings and can formulate and execute effective high-level action plans, we may just be a step away from general human-like intelligence.
