RR: CompCog: A challenge suite for statistical word segmentation

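Basic information

  • Award number:
    1918813
  • Principal investigator:
  • Amount:
    $706,500
  • Host institution:
  • Host institution country:
    United States
  • Award type:
    Standard Grant
  • Fiscal year:
    2019
  • Funding country:
    United States
  • Start and end dates:
    2019-09-01 to 2024-08-31
  • Project status:
    Completed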

Project abstract

A central scientific puzzle is how children manage to acquire language despite limited and inconsistent explicit feedback. Numerous mathematical results seem to suggest that acquiring a language should be impossible; the fact that children do it every day reveals a deep gap in the science of learning. Some research suggests that children make considerable headway by detecting patterns in what they hear even without any explicit teaching or even knowing what is being talked about ("statistical" or "unsupervised" learning). Indeed, much of the recent progress in "teaching" computers to understand language has made use of just this strategy. Even more compelling: numerous experiments have shown that both adults and infants are able to learn at least a little bit about language this way. How much they can learn remains unclear.

A central difficulty is that mathematically, there are many different methods for pattern-detection and it is unclear which one(s) humans use. This is important because some work better than others; and whether unsupervised pattern-detection can help solve the mystery of language learning depends on which method is used. The purpose of this project is to put together a "challenge suite": a dataset that can be used to systematically evaluate and compare the possibilities. Such challenge suites have been instrumental in advancing artificial intelligence. This project also serves as a proof-of-concept to determine whether challenge suites are similarly beneficial for the science of learning, and at the same time provide valuable resources and training to the research community.

To develop the challenge suite, the investigators will first conduct a comprehensive, quantitative literature review (meta-analysis) focusing on the largest body of work on unsupervised pattern-detection: adult statistical word segmentation. Aided by outside experimenters, the meta-analysis will be used to identify 10-15 key experiments. As a group, these experiments will establish a basic set of facts about adult statistical word segmentation that any theory must account for. For these reasons, the project will focus particularly on theoretically-central phenomena that distinguish different theories.

To measure different aspects of linguistic pattern-detection, each experiment will involve large numbers of subjects (approx. 1,200 each) and a subset of 3-5 experiments with an even larger number (approx. 24,000 each). A tool will be developed to enable researchers to compare any mathematical theory of learning against these data, determining how well it matches human performance. In order to determine how the mathematical theory could learn language, a database of transcripts of child-directed speech in 3-5 languages will be developed. Each theory will also be tested/trained on the database to see how much it could learn about those languages.

The challenge suite will be made available to all researchers as a download and also through a website where researchers can submit their models and compare results against those of other models. This work will be publicized to the scientific community through a closing workshop focused on models of unsupervised word segmentation.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Project outcomes

Journal articles (2)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Evaluating unsupervised word segmentation in adults: a meta-analysis

Other grants by Joshua Hartshorne

CAREER: How Many Intuitive Physics Systems are There, and What Do They Mean for Physics Education
  • Award number:
    2238912
  • Fiscal year:
    2023
  • Award amount:
    $706,500
  • Award type:
    Continuing Grant
HNDS-I: Pushkin: Enabling large-scale citizen science data collection for the social, behavioral, and economic sciences
  • Award number:
    2318474
  • Fiscal year:
    2023
  • Award amount:
    $706,500
  • Award type:
    Standard Grant
POSE: Phase I: An open-source ecosystem for massive online experiments and citizen science
  • Award number:
    2229631
  • Fiscal year:
    2022
  • Award amount:
    $706,500
  • Award type:
    Standard Grant
RAPID: Collaborative Research: A "Citizen Science" approach to COVID-19 social distancing effects on children's language development
  • Award number:
    2030106
  • Fiscal year:
    2020
  • Award amount:
    $706,500
  • Award type:
    Standard Grant
Collaborative Research: A virtual workshop on conducting language research online: Enhancing the resilience of the language sciences in a time of social distancing
  • Award number:
    2029637
  • Fiscal year:
    2020
  • Award amount:
    $706,500
  • Award type:
    Standard Grant
Collaborative Research: NSF2026: EAGER: A Playground and Proposal for Growing an AGI.
  • Award number:
    2033938
  • Fiscal year:
    2020
  • Award amount:
    $706,500
  • Award type:
    Standard Grant
Workshop on Events in Language and Cognition 2016
  • Award number:
    1606285
  • Fiscal year:
    2016
  • Award amount:
    $706,500
  • Award type:
    Standard Grant
CompCog: Large-scale, empirically based, publicly accessible database of argument structure to support experimental and computational research
  • Award number:
    1551834
  • Fiscal year:
    2016
  • Award amount:
    $706,500
  • Award type:
    Standard Grant

Similar international grants

CompCog: Deep causal inference grounds the perception of cognitive objects in speech
  • Award number:
    2240349
  • Fiscal year:
    2023
  • Award amount:
    $706,500
  • Award type:
    Standard Grant
Collaborative Research: CompCog: RI: Medium: Understanding human planning through AI-assisted analysis of a massive chess dataset
  • Award number:
    2312374
  • Fiscal year:
    2023
  • Award amount:
    $706,500
  • Award type:
    Standard Grant
Collaborative Research: CompCog: RI: Medium: Understanding human planning through AI-assisted analysis of a massive chess dataset
  • Award number:
    2312373
  • Fiscal year:
    2023
  • Award amount:
    $706,500
  • Award type:
    Standard Grant
CompCog: Towards a unified account of when external cues are beneficial or detrimental during memory search
  • Award number:
    2316716
  • Fiscal year:
    2023
  • Award amount:
    $706,500
  • Award type:
    Standard Grant
CompCog: RI: Small: Human-like semantic grammar induction through knowledge distillation from pre-trained language models
  • Award number:
    2313140
  • Fiscal year:
    2023
  • Award amount:
    $706,500
  • Award type:
    Standard Grant
Collaborative Research: CompCog: Modeling Search within the Mental Lexicon
  • Award number:
    2235362
  • Fiscal year:
    2023
  • Award amount:
    $706,500
  • Award type:
    Standard Grant
Collaborative Research: CompCog: Modeling Search within the Mental Lexicon
  • Award number:
    2235363
  • Fiscal year:
    2023
  • Award amount:
    $706,500
  • Award type:
    Standard Grant
CompCog: HNDS-R: Self-Supervision of Visual Learning From Spatiotemporal Context
  • Award number:
    2216127
  • Fiscal year:
    2022
  • Award amount:
    $706,500
  • Award type:
    Standard Grant
CompCog: Computational Models of Plasticity and Learning in Speech Perception
  • Award number:
    2120834
  • Fiscal year:
    2021
  • Award amount:
    $706,500
  • Award type:
    Standard Grant
Collaborative Research: CompCog: Psychological, Computational, and Neural Adequacy in a Deep Learning Model of Human Speech Recognition
  • Award number:
    2043903
  • Fiscal year:
    2021
  • Award amount:
    $706,500
  • Award type:
    Standard Grant