CI-P: Toward Unified Tool Support for Linguistic Corpus Annotation

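Basic Information

  • Award number:
    1405863
  • Principal investigator:
  • Amount:
    $100,000
  • Host institution:
  • Host institution country:
    United States
  • Award type:
    Standard Grant
  • Fiscal year:
    2014
  • Funding country:
    United States
  • Project period:
    2014-05-01 to 2015-05-31
  • Project status:
    Completed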

Project Abstract

The development of computer processing of language is a key scientific and technological capability funded by the NSF. In support of these efforts, thousands of texts, comprising hundreds of thousands of words, are hand-processed every year to provide data for training computer algorithms. This annotation is time-consuming, expensive, and difficult, and it is hampered by a long-standing problem: the lack of unified, specialized software tools to assist in annotation and annotation management. Researchers at MIT envision creating a new software infrastructure, called a Unified Annotation Workbench (UAW), that is an off-the-shelf solution to this problem. A UAW will significantly increase the effectiveness of every dollar spent on annotation. Importantly, a UAW will be useful not only to the linguistic annotation community: it will also benefit many scientific and engineering fields that depend on people to do annotation-like work. As a small selection, these include human-computer interaction, cognitive science, cognitive psychology, sociology, psychiatry, and any field related to the digital humanities.

Computational linguistics and statistical natural language processing (NLP) are important areas of study, both scientifically and technologically. Advances in these fields are fed by a universal hunger for the analysis of language data for information processing tasks. Large annotated corpora are a key resource that enables these advances. But despite the widely recognized importance of annotated corpora, the field has a major gap: there is no off-the-shelf, general, unified tool for performing text annotation. Faced with this gap, many language researchers create their own tools from scratch, at significant cost. These tools are usually hastily designed, not released for general use, not maintained, and often redundant with capabilities implemented by others. This has three consequences: researchers forgo projects that present too many difficulties in tool design, which means lost opportunities; researchers are less able to build upon and replicate others' work, because a critical part of the infrastructure is not available; and the duplication of effort represents a significant waste of resources.

In this infrastructure planning project, the MIT team will take three steps toward a Unified Annotation Workbench (UAW): a general, unified, off-the-shelf infrastructure to support corpus annotation. First, they will comprehensively review the state of the art in annotation tools. Second, they will identify potential implementation technologies for a UAW and create software mockups. Third, they will organize a workshop to engage the annotation community on the best form for a UAW.

Project Outcomes

Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)

Other Grants by Mark Finlayson

EAGER: SaTC-EDU: Designing and Evaluating Curricular Modules for Inclusive Integration of Artificial Intelligence into Cybersecurity
  • Award number:
    2039606
  • Fiscal year:
    2020
  • Funding amount:
    $100,000
  • Award type:
    Standard Grant
CAREER: Learning Multi-Level Narrative Structure
  • Award number:
    1749917
  • Fiscal year:
    2018
  • Funding amount:
    $100,000
  • Award type:
    Continuing Grant
CI-P: Toward Unified Tool Support for Linguistic Corpus Annotation
  • Award number:
    1536043
  • Fiscal year:
    2014
  • Funding amount:
    $100,000
  • Award type:
    Standard Grant

Similar NSFC Grants

Toward a general theory of intermittent aeolian and fluvial nonsuspended sediment transport
  • Award number:
  • Approval year:
    2022
  • Funding amount:
    CNY 550,000
  • Award type:

Similar Overseas Grants

Implantable Optoelectronic Devices for Unified Early Diagnosis and Treatment: Toward Creation of Optoelectronic Pharmacolog
  • Award number:
    23H05450
  • Fiscal year:
    2023
  • Funding amount:
    $100,000
  • Award type:
    Grant-in-Aid for Scientific Research (S)
A New Avenue toward a Unified Model of Elementary Particles Pioneered by the Mathematical Structure of the Singular Spacetime of Superstrings
  • Award number:
    23K03401
  • Fiscal year:
    2023
  • Funding amount:
    $100,000
  • Award type:
    Grant-in-Aid for Scientific Research (C)
AF: Small: Toward A Unified Model of Parallelism And Locality
  • Award number:
    1911245
  • Fiscal year:
    2019
  • Funding amount:
    $100,000
  • Award type:
    Standard Grant
Construction of Nonideal Theory in Ethics: Toward a Unified Theory of Metaethics, Normative ethics, and Applied Ethics
  • Award number:
    19K00034
  • Fiscal year:
    2019
  • Funding amount:
    $100,000
  • Award type:
    Grant-in-Aid for Scientific Research (C)
Development of strategy toward the unified total synthesis of natural and artificial daphnan/tigrian diterpenoids and discovery of new functional molecules
  • Award number:
    19K15554
  • Fiscal year:
    2019
  • Funding amount:
    $100,000
  • Award type:
    Grant-in-Aid for Early-Career Scientists
Toward a unified description of fusion reactions around the Coulomb barrier
  • Award number:
    18J20565
  • Fiscal year:
    2018
  • Funding amount:
    $100,000
  • Award type:
    Grant-in-Aid for JSPS Fellows
CSR: Small: A Unified Approach Toward User-specific Improvements of Quality of Experience for Video Streaming
  • Award number:
    1618931
  • Fiscal year:
    2016
  • Funding amount:
    $100,000
  • Award type:
    Standard Grant
Granularity in brains and cognition: toward a unified model of ASD
  • Award number:
    15K04078
  • Fiscal year:
    2015
  • Funding amount:
    $100,000
  • Award type:
    Grant-in-Aid for Scientific Research (C)
Toward a unified account of root phenomena: with focus on topicalization
  • Award number:
    15K02488
  • Fiscal year:
    2015
  • Funding amount:
    $100,000
  • Award type:
    Grant-in-Aid for Scientific Research (C)
CI-P: Toward Unified Tool Support for Linguistic Corpus Annotation
  • Award number:
    1536043
  • Fiscal year:
    2014
  • Funding amount:
    $100,000
  • Award type:
    Standard Grant