Learning and Search in Decision Domains Featuring Large Action Sets and Uncertainty

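Basic Information

Grant number:
RGPIN-2018-06677
Principal investigator:
Buro, Michael
Amount:
$29,900
Host institution:
University of Alberta
Host institution country:
Canada
Program:
Discovery Grants Program - Individual
Fiscal year:
2018
Funding country:
Canada
Start and end dates:
2018-01-01 to 2019-12-31
Project status:
Completed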

Source:
https://www.nserc-crsng.gc.ca/ase-oro/Details-Detailles_eng.asp?id=646576
Keywords:
Learning Search Decision Domains Featuring

Project Abstract

Artificial Intelligence (AI) research has come a long way in creating systems that challenge human supremacy in decision domains such as Chess, Jeopardy, stock trading, and more recently image recognition, Atari 2600 arcade games, and the Asian board game Go. By contrast, AI progress in popular video games, which often feature large action spaces, real-time constraints, multiple players, and hidden information, has been slow, and in many cases human experts can still easily outperform the best AI systems.

The human advantage in these domains can in part be attributed to our abilities to simplify problems while maintaining solutions, to search at different abstraction levels (e.g., looking into details only when high-level solution concepts do not seem to work), to infer intentions from observed actions, and to quickly adjust to opponents and partners. The methods that have been instrumental in creating the strong AI systems listed above, for example training policy networks and using Monte Carlo search to determine good low-level actions, are currently not powerful enough to achieve human expert-level performance in domains featuring large action spaces and long playing episodes consisting of actions with microscopic effects.

To overcome these problems, we propose to investigate how to better integrate heuristic search (which can evaluate the merit of actions by looking ahead) with machine learning to deal with large combinatorial action spaces, uncertainty, and agent cooperation. The main long-term research objectives are: 1) learning hierarchical policies from self-play using deep neural networks, 2) understanding the role of heuristic search vs. learned policies in domains for which forward models are not available, 3) learning strategies in cooperative multi-agent domains, and 4) data-efficient agent modelling for cooperation and exploitation. We approach these long-term goals by starting with simpler tasks involving supervised learning from human training data, studying reinforcement learning in medium-sized action space domains, and integrating human cooperation strategies into existing search-based AI systems.

Making substantial progress in the target domains of this proposal will have a profound impact on technology and society. In a world in which machines can learn to perform well in multi-agent settings and can formulate and execute effective high-level action plans, we may just be a step away from general human-like intelligence.
