Learning and Search in Decision Domains Featuring Large Action Sets and Uncertainty

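Basic Information

  • Grant number:
    RGPIN-2018-06677
  • Principal investigator:
  • Amount:
    $29,900
  • Host institution:
  • Host institution country:
    Canada
  • Project category:
    Discovery Grants Program - Individual
  • Fiscal year:
    2018
  • Funding country:
    Canada
  • Project period:
    2018-01-01 to 2019-12-31
  • Project status:
    Completed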

Project Abstract

Artificial Intelligence (AI) research has come a long way in creating systems that challenge human supremacy in decision domains such as Chess, Jeopardy, stock trading, and more recently image recognition, Atari 2600 arcade games, and the Asian board game Go. By contrast, AI progress in popular video games, which often feature large action spaces, real-time constraints, multiple players, and hidden information, has been slow, and in many cases human experts can still easily outperform the best AI systems.

The human advantage in these domains can in part be attributed to our abilities to simplify problems while maintaining solutions, to search at different abstraction levels (e.g., looking into details only when high-level solution concepts do not seem to work), to infer intentions from observed actions, and to quickly adjust to opponents and partners. The methods that have been instrumental in creating the strong AI systems listed above, such as training policy networks and using Monte Carlo search to determine good low-level actions, are currently not powerful enough to achieve human expert-level performance in domains featuring large action spaces and long playing episodes consisting of actions with microscopic effects.

To overcome these problems, we propose to investigate how to better integrate heuristic search (which can evaluate the merit of actions by looking ahead) with machine learning to deal with large combinatorial action spaces, uncertainty, and agent cooperation. The main long-term research objectives are: 1) learning hierarchical policies from self-play using deep neural networks, 2) understanding the role of heuristic search vs. learned policies in domains for which forward models are not available, 3) learning strategies in cooperative multi-agent domains, and 4) data-efficient agent modelling for cooperation and exploitation. We approach these long-term goals by starting with simpler tasks involving supervised learning from human training data, studying reinforcement learning in medium-sized action space domains, and integrating human cooperation strategies into existing search-based AI systems.

Making substantial progress in the target domains of this proposal will have a profound impact on technology and society. In a world in which machines can learn to perform well in multi-agent settings and can formulate and execute effective high-level action plans, we may just be a step away from general human-like intelligence.

Project Outcomes

Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)

Other publications by Buro, Michael

Real-Time Strategy Game Competitions
  • DOI:
    10.1609/aimag.v33i3.2419
  • Publication date:
    2012-09-01
  • Journal:
  • Impact factor:
    0.9
  • Authors:
    Buro, Michael;Churchill, David
  • Corresponding author:
    Churchill, David

Other grants held by Buro, Michael

Learning and Search in Decision Domains Featuring Large Action Sets and Uncertainty
  • Grant number:
    RGPIN-2018-06677
  • Fiscal year:
    2022
  • Funding amount:
    $29,900
  • Project category:
    Discovery Grants Program - Individual
Learning and Search in Decision Domains Featuring Large Action Sets and Uncertainty
  • Grant number:
    RGPIN-2018-06677
  • Fiscal year:
    2021
  • Funding amount:
    $29,900
  • Project category:
    Discovery Grants Program - Individual
Learning and Search in Decision Domains Featuring Large Action Sets and Uncertainty
  • Grant number:
    RGPIN-2018-06677
  • Fiscal year:
    2020
  • Funding amount:
    $29,900
  • Project category:
    Discovery Grants Program - Individual
Learning and Search in Decision Domains Featuring Large Action Sets and Uncertainty
  • Grant number:
    RGPIN-2018-06677
  • Fiscal year:
    2019
  • Funding amount:
    $29,900
  • Project category:
    Discovery Grants Program - Individual
Learning and Search in Decision Domains Featuring Large Action Sets and Uncertainty
  • Grant number:
    RGPIN-2017-05271
  • Fiscal year:
    2017
  • Funding amount:
    $29,900
  • Project category:
    Discovery Grants Program - Individual
"Search, Opponent Modelling, Cooperation, and State Inference in Complex Imperfect Information Domains."
  • Grant number:
    261531-2012
  • Fiscal year:
    2016
  • Funding amount:
    $29,900
  • Project category:
    Discovery Grants Program - Individual
"Search, Opponent Modelling, Cooperation, and State Inference in Complex Imperfect Information Domains."
  • Grant number:
    261531-2012
  • Fiscal year:
    2015
  • Funding amount:
    $29,900
  • Project category:
    Discovery Grants Program - Individual
"Search, Opponent Modelling, Cooperation, and State Inference in Complex Imperfect Information Domains."
  • Grant number:
    261531-2012
  • Fiscal year:
    2014
  • Funding amount:
    $29,900
  • Project category:
    Discovery Grants Program - Individual
"Search, Opponent Modelling, Cooperation, and State Inference in Complex Imperfect Information Domains."
  • Grant number:
    261531-2012
  • Fiscal year:
    2013
  • Funding amount:
    $29,900
  • Project category:
    Discovery Grants Program - Individual
"Search, Opponent Modelling, Cooperation, and State Inference in Complex Imperfect Information Domains."
  • Grant number:
    261531-2012
  • Fiscal year:
    2012
  • Funding amount:
    $29,900
  • Project category:
    Discovery Grants Program - Individual

Similar Overseas Grants

Improving Optimization-Based Scheduling and Path Planning Decision Support: An Artificial Intelligence and Operations Research Approach With Applications to Surveillance and Search
  • Grant number:
    RGPIN-2021-03495
  • Fiscal year:
    2022
  • Funding amount:
    $29,900
  • Project category:
    Discovery Grants Program - Individual
Accurate User Modeling of Search and Decision Making Tasks for Improved Offline Evaluation of Information Retrieval
  • Grant number:
    RGPAS-2020-00080
  • Fiscal year:
    2022
  • Funding amount:
    $29,900
  • Project category:
    Discovery Grants Program - Accelerator Supplements
Accurate User Modeling of Search and Decision Making Tasks for Improved Offline Evaluation of Information Retrieval
  • Grant number:
    RGPIN-2020-04665
  • Fiscal year:
    2022
  • Funding amount:
    $29,900
  • Project category:
    Discovery Grants Program - Individual
Learning and Search in Decision Domains Featuring Large Action Sets and Uncertainty
  • Grant number:
    RGPIN-2018-06677
  • Fiscal year:
    2022
  • Funding amount:
    $29,900
  • Project category:
    Discovery Grants Program - Individual
In Search of Legal Capacity: Law, Guardianship, and Supported Decision-Making in Intellectually Disabled People's Lives in Turkey
  • Grant number:
    ES/W003945/1
  • Fiscal year:
    2022
  • Funding amount:
    $29,900
  • Project category:
    Research Grant
Accurate User Modeling of Search and Decision Making Tasks for Improved Offline Evaluation of Information Retrieval
  • Grant number:
    RGPIN-2020-04665
  • Fiscal year:
    2021
  • Funding amount:
    $29,900
  • Project category:
    Discovery Grants Program - Individual
Accurate User Modeling of Search and Decision Making Tasks for Improved Offline Evaluation of Information Retrieval
  • Grant number:
    RGPAS-2020-00080
  • Fiscal year:
    2021
  • Funding amount:
    $29,900
  • Project category:
    Discovery Grants Program - Accelerator Supplements
Improving Optimization-Based Scheduling and Path Planning Decision Support: An Artificial Intelligence and Operations Research Approach With Applications to Surveillance and Search
  • Grant number:
    RGPIN-2021-03495
  • Fiscal year:
    2021
  • Funding amount:
    $29,900
  • Project category:
    Discovery Grants Program - Individual
Learning and Search in Decision Domains Featuring Large Action Sets and Uncertainty
  • Grant number:
    RGPIN-2018-06677
  • Fiscal year:
    2021
  • Funding amount:
    $29,900
  • Project category:
    Discovery Grants Program - Individual
Improving Optimization-Based Scheduling and Path Planning Decision Support: An Artificial Intelligence and Operations Research Approach With Applications to Surveillance and Search
  • Grant number:
    DGECR-2021-00189
  • Fiscal year:
    2021
  • Funding amount:
    $29,900
  • Project category:
    Discovery Launch Supplement