EAGER: Developing data and evaluation methods to assess the generality and robustness of AI systems for abstraction and analogy-making

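Basic information

  • Award number:
    2139983
  • Principal investigator:
  • Amount:
    approx. $199,700
  • Host institution:
  • Host institution country:
    United States
  • Award type:
    Standard Grant
  • Fiscal year:
    2021
  • Funding country:
    United States
  • Period:
    2021-09-01 to 2024-02-29
  • Project status:
    Completed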

Project abstract

The ability of humans to make conceptual abstractions and analogies is at the root of many of our most important cognitive capabilities, such as learning new concepts from small numbers of examples, flexibly adapting our prior knowledge and experience to new situations, and communicating our knowledge to others. While AI has made dramatic progress over the last decade in areas such as vision, natural language processing, and robotics, current AI systems almost entirely lack the ability to form humanlike abstractions and analogies. The lack of such abilities is in part responsible for the lack of robustness in current AI systems, as well as their difficulties with extrapolating what they have learned to diverse situations. While there have been many efforts in past AI research on this topic, each individual effort has generally focused on a specific problem domain, without careful evaluation of the AI system’s robustness within its domain or its generality across different domains. In this project we will promote progress in AI by creating a web-based platform that offers a diverse set of abstraction and analogy-making challenges for the research community as well as new evaluation methods that test for generality and robustness within and across different challenge domains. We will use our platform to evaluate selected existing AI approaches and to measure human performance on our challenges in order to compare with AI systems’ performance. Our work will contribute to the AI research community by spurring new approaches and evaluation methods for abstraction and analogy-making in machines, and will contribute more broadly via the development of methods for robust and generalizable AI systems. 
Our specific research plan is to (1) curate an initial suite of idealized challenge domains inspired by Hofstadter’s letter-string analogies, Raven’s progressive matrices, Bongard problems, and Chollet’s Abstraction and Reasoning Corpus; (2) develop evaluation methods along dimensions such as robustness to variations on a particular concept, generality across domains, and scalability to more complex instances of a problem; (3) evaluate selected AI methods for abstraction and analogy using our evaluation methods; and (4) measure human benchmarks on our challenge suite using paid participants on the Amazon Mechanical Turk platform. At the end of the project period, we will have demonstrated the utility and promise of our challenge problems and evaluations, and will have gained insight into their limitations. This work will set the stage for future efforts on expanding our challenge suite, improving our evaluation metrics, and developing and evaluating novel AI approaches to abstraction and analogy. This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Project outputs

Journal articles (4)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Evaluating Understanding on Conceptual Abstraction Benchmarks
  • DOI:
    10.48550/arxiv.2206.14187
  • Published:
    2022-06
  • Journal:
  • Impact factor:
    0
  • Authors:
    Victor Vikram Odouard; M. Mitchell
  • Corresponding author:
    Victor Vikram Odouard; M. Mitchell
How do we know how smart AI systems are?
  • DOI:
    10.1126/science.adj5957
  • Published:
    2023
  • Journal:
  • Impact factor:
    56.9
  • Authors:
    Mitchell, Melanie
  • Corresponding author:
    Mitchell, Melanie
Rethink reporting of evaluation results in AI
  • DOI:
    10.1126/science.adf6369
  • Published:
    2023
  • Journal:
  • Impact factor:
    56.9
  • Authors:
    Burnell, Ryan; Schellaert, Wout; Burden, John; Ullman, Tomer D.; Martinez-Plumed, Fernando; Tenenbaum, Joshua B.; Rutar, Danaja; Cheke, Lucy G.; Sohl-Dickstein, Jascha; Mitchell, Melanie
  • Corresponding author:
    Mitchell, Melanie
The ConceptARC Benchmark: Evaluating Understanding and Generalization in the ARC Domain

Other publications by Melanie Mitchell

Evolving Cellular Automata with Genetic Algorithms: Analyzing Asynchronous Updates and Small World Topologies
  • DOI:
  • Published:
    2008
  • Journal:
  • Impact factor:
    0
  • Authors:
    Melanie Mitchell
  • Corresponding author:
    Melanie Mitchell
Using Counterfactual Tasks to Evaluate the Generality of Analogical Reasoning in Large Language Models
  • DOI:
    10.48550/arxiv.2402.08955
  • Published:
    2024
  • Journal:
  • Impact factor:
    0
  • Authors:
    Martha Lewis; Melanie Mitchell
  • Corresponding author:
    Melanie Mitchell
Mitchell, M. (2019). Artificial Intelligence Hits the Barrier of Meaning. Information, 10(2), 51.
  • DOI:
  • Published:
    2019
  • Journal:
  • Impact factor:
    0
  • Authors:
    Melanie Mitchell
  • Corresponding author:
    Melanie Mitchell
Ubiquity symposium: Biological Computation
  • DOI:
  • Published:
    2011
  • Journal:
  • Impact factor:
    0
  • Authors:
    Melanie Mitchell
  • Corresponding author:
    Melanie Mitchell
Prospects for prenatal gene therapy in disorders causing mental retardation.

Other grants held by Melanie Mitchell

AI Institute: Planning: Foundations of Intelligence in Natural and Artificial Systems
  • Award number:
    2020103
  • Fiscal year:
    2020
  • Award amount:
    approx. $199,700
  • Award type:
    Standard Grant
Workshop on Artificial Intelligence and the "Barrier of Meaning"
  • Award number:
    1832717
  • Fiscal year:
    2018
  • Award amount:
    approx. $199,700
  • Award type:
    Standard Grant
RI: Small: Visual Situation Recognition: An Integration of Deep Networks and Analogy-Making
  • Award number:
    1423651
  • Fiscal year:
    2014
  • Award amount:
    approx. $199,700
  • Award type:
    Standard Grant
RI: Small: Collaborative Research: A Scalable Architecture for Image Interpretation
  • Award number:
    1018967
  • Fiscal year:
    2010
  • Award amount:
    approx. $199,700
  • Award type:
    Standard Grant
Evolving Cellular Automata: With Genetic Algorithms
  • Award number:
    9705830
  • Fiscal year:
    1998
  • Award amount:
    approx. $199,700
  • Award type:
    Continuing Grant
Postdoc: Automatic Programming of Decentralized Parallel Architectures
  • Award number:
    9503162
  • Fiscal year:
    1995
  • Award amount:
    approx. $199,700
  • Award type:
    Standard Grant

Similar overseas grants

Collaborative Research: GEO OSE Track 2: Developing CI-enabled collaborative workflows to integrate data for the SZ4D (Subduction Zones in Four Dimensions) community
  • Award number:
    2324714
  • Fiscal year:
    2024
  • Award amount:
    approx. $199,700
  • Award type:
    Standard Grant
RAPID: Developing an Interactive Dashboard for Collecting and Curating Traffic Data after the March 26, 2024 Francis Scott Key Bridge Collapse
  • Award number:
    2426947
  • Fiscal year:
    2024
  • Award amount:
    approx. $199,700
  • Award type:
    Standard Grant
Collaborative Research: GEO OSE Track 2: Developing CI-enabled collaborative workflows to integrate data for the SZ4D (Subduction Zones in Four Dimensions) community
  • Award number:
    2324709
  • Fiscal year:
    2024
  • Award amount:
    approx. $199,700
  • Award type:
    Standard Grant
Collaborative Research: GEO OSE Track 2: Developing CI-enabled collaborative workflows to integrate data for the SZ4D (Subduction Zones in Four Dimensions) community
  • Award number:
    2324713
  • Fiscal year:
    2024
  • Award amount:
    approx. $199,700
  • Award type:
    Standard Grant
Collaborative Research: GEO OSE Track 2: Developing CI-enabled collaborative workflows to integrate data for the SZ4D (Subduction Zones in Four Dimensions) community
  • Award number:
    2324710
  • Fiscal year:
    2024
  • Award amount:
    approx. $199,700
  • Award type:
    Standard Grant
Collaborative Research: GEO OSE Track 2: Developing CI-enabled collaborative workflows to integrate data for the SZ4D (Subduction Zones in Four Dimensions) community
  • Award number:
    2324711
  • Fiscal year:
    2024
  • Award amount:
    approx. $199,700
  • Award type:
    Standard Grant
CAS: Developing Data-Driven, Automated Methodology to Understand and Control Light-Driven Catalytic Processes
  • Award number:
    2350257
  • Fiscal year:
    2024
  • Award amount:
    approx. $199,700
  • Award type:
    Continuing Grant
Collaborative Research: GEO OSE Track 2: Developing CI-enabled collaborative workflows to integrate data for the SZ4D (Subduction Zones in Four Dimensions) community
  • Award number:
    2324712
  • Fiscal year:
    2024
  • Award amount:
    approx. $199,700
  • Award type:
    Standard Grant
Developing statistical methods for structural change analysis using panel data
  • Award number:
    24K16343
  • Fiscal year:
    2024
  • Award amount:
    approx. $199,700
  • Award type:
    Grant-in-Aid for Early-Career Scientists
Developing Real-world Understanding of Medical Music therapy using the Electronic Health Record (DRUMMER)
  • Award number:
    10748859
  • Fiscal year:
    2024
  • Award amount:
    approx. $199,700
  • Award type: