Learning and Search in Decision Domains Featuring Large Action Sets and Uncertainty
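Basic information
- Grant number: RGPIN-2018-06677
- Principal investigator:
- Amount: $29,900
- Host institution:
- Host institution country: Canada
- Program: Discovery Grants Program - Individual
- Fiscal year: 2019
- Funding country: Canada
- Duration: 2019-01-01 to 2020-12-31
- Project status: Completed
- Source:
- Keywords: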
Project abstract
Artificial Intelligence (AI) research has come a long way in creating systems that challenge human supremacy in decision domains such as Chess, Jeopardy, stock trading, and, more recently, image recognition, Atari 2600 arcade games, and the Asian board game Go. By contrast, AI progress in popular video games, which often feature large action spaces, real-time constraints, multiple players, and hidden information, has been slow, and in many cases human experts can still easily outperform the best AI systems.
The human advantage in these domains can in part be attributed to our abilities to simplify problems while maintaining solutions, to search at different abstraction levels (e.g., looking into details only when high-level solution concepts do not seem to work), to infer intentions from observed actions, and to quickly adjust to opponents and partners. The methods that have been instrumental in creating the strong AI systems listed above, for example training policy networks and using Monte Carlo search to determine good low-level actions, are currently not powerful enough to achieve human expert-level performance in domains featuring large action spaces and long playing episodes consisting of actions with microscopic effects.
To overcome these problems, we propose to investigate how to better integrate heuristic search (which can evaluate the merit of actions by looking ahead) with machine learning to deal with large combinatorial action spaces, uncertainty, and agent cooperation. The main long-term research objectives are: 1) learning hierarchical policies from self-play using deep neural networks, 2) understanding the role of heuristic search vs. learned policies in domains for which forward-models are not available, 3) learning strategies in cooperative multi-agent domains, and 4) data-efficient agent modelling for cooperation and exploitation. We approach these long-term goals by starting with simpler tasks involving supervised learning from human training data, studying reinforcement learning in medium-sized action space domains, and integrating human cooperation strategies into existing search-based AI systems.
Making substantial progress in the target domains of this proposal will have a profound impact on technology and society. In a world in which machines can learn to perform well in multi-agent settings and can formulate and execute effective high-level action plans, we may just be a step away from general human-like intelligence.
Project outputs
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other publications by Buro, Michael
Real-Time Strategy Game Competitions
- DOI: 10.1609/aimag.v33i3.2419
- Published: 2012-09-01
- Journal:
- Impact factor: 0.9
- Authors: Buro, Michael; Churchill, David
- Corresponding author: Churchill, David
Other grants held by Buro, Michael
Learning and Search in Decision Domains Featuring Large Action Sets and Uncertainty
- Grant number: RGPIN-2018-06677
- Fiscal year: 2022
- Amount: $29,900
- Program: Discovery Grants Program - Individual
Learning and Search in Decision Domains Featuring Large Action Sets and Uncertainty
- Grant number: RGPIN-2018-06677
- Fiscal year: 2021
- Amount: $29,900
- Program: Discovery Grants Program - Individual
Learning and Search in Decision Domains Featuring Large Action Sets and Uncertainty
- Grant number: RGPIN-2018-06677
- Fiscal year: 2020
- Amount: $29,900
- Program: Discovery Grants Program - Individual
Learning and Search in Decision Domains Featuring Large Action Sets and Uncertainty
- Grant number: RGPIN-2018-06677
- Fiscal year: 2018
- Amount: $29,900
- Program: Discovery Grants Program - Individual
Learning and Search in Decision Domains Featuring Large Action Sets and Uncertainty
- Grant number: RGPIN-2017-05271
- Fiscal year: 2017
- Amount: $29,900
- Program: Discovery Grants Program - Individual
"Search, Opponent Modelling, Cooperation, and State Inference in Complex Imperfect Information Domains."
- Grant number: 261531-2012
- Fiscal year: 2016
- Amount: $29,900
- Program: Discovery Grants Program - Individual
"Search, Opponent Modelling, Cooperation, and State Inference in Complex Imperfect Information Domains."
- Grant number: 261531-2012
- Fiscal year: 2015
- Amount: $29,900
- Program: Discovery Grants Program - Individual
"Search, Opponent Modelling, Cooperation, and State Inference in Complex Imperfect Information Domains."
- Grant number: 261531-2012
- Fiscal year: 2014
- Amount: $29,900
- Program: Discovery Grants Program - Individual
"Search, Opponent Modelling, Cooperation, and State Inference in Complex Imperfect Information Domains."
- Grant number: 261531-2012
- Fiscal year: 2013
- Amount: $29,900
- Program: Discovery Grants Program - Individual
"Search, Opponent Modelling, Cooperation, and State Inference in Complex Imperfect Information Domains."
- Grant number: 261531-2012
- Fiscal year: 2012
- Amount: $29,900
- Program: Discovery Grants Program - Individual
Similar overseas grants
Improving Optimization-Based Scheduling and Path Planning Decision Support: An Artificial Intelligence and Operations Research Approach With Applications to Surveillance and Search
- Grant number: RGPIN-2021-03495
- Fiscal year: 2022
- Amount: $29,900
- Program: Discovery Grants Program - Individual
Accurate User Modeling of Search and Decision Making Tasks for Improved Offline Evaluation of Information Retrieval
- Grant number: RGPAS-2020-00080
- Fiscal year: 2022
- Amount: $29,900
- Program: Discovery Grants Program - Accelerator Supplements
Accurate User Modeling of Search and Decision Making Tasks for Improved Offline Evaluation of Information Retrieval
- Grant number: RGPIN-2020-04665
- Fiscal year: 2022
- Amount: $29,900
- Program: Discovery Grants Program - Individual
Learning and Search in Decision Domains Featuring Large Action Sets and Uncertainty
- Grant number: RGPIN-2018-06677
- Fiscal year: 2022
- Amount: $29,900
- Program: Discovery Grants Program - Individual
In Search of Legal Capacity: Law, Guardianship, and Supported Decision-Making in Intellectually Disabled People's Lives in Turkey
- Grant number: ES/W003945/1
- Fiscal year: 2022
- Amount: $29,900
- Program: Research Grant
Accurate User Modeling of Search and Decision Making Tasks for Improved Offline Evaluation of Information Retrieval
- Grant number: RGPIN-2020-04665
- Fiscal year: 2021
- Amount: $29,900
- Program: Discovery Grants Program - Individual
Accurate User Modeling of Search and Decision Making Tasks for Improved Offline Evaluation of Information Retrieval
- Grant number: RGPAS-2020-00080
- Fiscal year: 2021
- Amount: $29,900
- Program: Discovery Grants Program - Accelerator Supplements
Improving Optimization-Based Scheduling and Path Planning Decision Support: An Artificial Intelligence and Operations Research Approach With Applications to Surveillance and Search
- Grant number: RGPIN-2021-03495
- Fiscal year: 2021
- Amount: $29,900
- Program: Discovery Grants Program - Individual
Learning and Search in Decision Domains Featuring Large Action Sets and Uncertainty
- Grant number: RGPIN-2018-06677
- Fiscal year: 2021
- Amount: $29,900
- Program: Discovery Grants Program - Individual
Improving Optimization-Based Scheduling and Path Planning Decision Support: An Artificial Intelligence and Operations Research Approach With Applications to Surveillance and Search
- Grant number: DGECR-2021-00189
- Fiscal year: 2021
- Amount: $29,900
- Program: Discovery Launch Supplement