Testing and improving methods for efficient annotation through the construction of a large parsed corpus

Basic information

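  • Award number:
    1147499
  • Principal investigator:
  • Amount:
    $287,400
  • Host institution:
  • Host institution country:
    United States
  • Project type:
    Standard Grant
  • Fiscal year:
    2012
  • Funding country:
    United States
  • Project period:
    2012-07-15 to 2016-06-30
  • Project status:
    Completed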

Project abstract

Electronic corpora annotated with linguistic information play a crucial role in natural language processing (NLP) and in linguistic research. Treebanks (corpora annotated with syntactic information) are especially important since they mark the grammatical structure necessary for understanding sentence and discourse meaning. For NLP, treebanks provide testbeds for developing language understanding systems. For linguistic research, they provide the basis for precise and replicable studies of the patterns of use of syntactic forms. Unfortunately, accurate annotation is difficult. Automatic parsers have relatively high error rates and the correction of these errors by human annotators is both slow and itself error-prone. Based on recent advances that Dr. Kroch and his collaborators have made in the creation and quality control of three large treebanks for different languages, Dr. Kroch proposes a major effort to improve corpus construction through the creation of a two-million-word English treebank. Along with this useful and substantial result, the project will develop and test hypotheses on speeding up treebank construction. The work will be guided by two complementary strategies. The first aims to reduce the parser's error rate by enhancing the part-of-speech (POS) tagged input to the parser while the second aims to make the correction of residual errors more efficient by shifting some of the burden from human to automatic error detection and correction. Speeding up the construction of accurate, consistent treebanks will improve the size and quality of training data for parsers, leading to improved performance in real-world NLP applications that rely on parsing. The availability of larger treebanks and of better methods for constructing them will also improve linguistic research. 
Moreover, as treebanks grow in size, they will become more useful in literary and historical studies, where the rhetorical structure of texts will become investigable in a more precise way than is currently possible.
In addition to the intellectual merit of the proposed research and its expected impact on text-based research that relies on automated processing techniques, the project will provide valuable training opportunities for graduate and undergraduate students. By contributing to improvements in automated techniques for language processing, this project may also serve analytic needs in industry and government security.

Project outcomes

Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)

Other grants by Anthony Kroch

Diachronic Generative Syntax Conference XIII: Support for meeting and associated workshop
  • Award number:
    1104768
  • Fiscal year:
    2011
  • Award amount:
    $287,400
  • Project type:
    Standard Grant
U.S.-Iceland Linguistics Workshop on Change and Variation in Icelandic Syntax
  • Award number:
    0639066
  • Fiscal year:
    2006
  • Award amount:
    $287,400
  • Project type:
    Standard Grant
SGER: Enriching Parser Output for Treebank Construction
  • Award number:
    0527116
  • Fiscal year:
    2005
  • Award amount:
    $287,400
  • Project type:
    Standard Grant
A Parsed Historical Corpus of Modern English
  • Award number:
    0418061
  • Fiscal year:
    2004
  • Award amount:
    $287,400
  • Project type:
    Standard Grant
The Emergence of Modern English Syntax
  • Award number:
    9905488
  • Fiscal year:
    1999
  • Award amount:
    $287,400
  • Project type:
    Standard Grant
The Historical Syntax of Middle English from a Comparative Perspective
  • Award number:
    9511368
  • Fiscal year:
    1996
  • Award amount:
    $287,400
  • Project type:
    Continuing grant
Head/Complement Order in the History of the West Germanic Clause
  • Award number:
    8919701
  • Fiscal year:
    1990
  • Award amount:
    $287,400
  • Project type:
    Standard Grant

Similar grants from the National Natural Science Foundation of China

Improving modelling of compact binary evolution.
  • Award number:
    10903001
  • Year approved:
    2009
  • Award amount:
    CNY 200,000
  • Project type:
    Young Scientists Fund

Similar overseas grants

Improving rapid phenotypic drug susceptibility testing for drug resistant tuberculosis in high-burden areas
  • Award number:
    10658013
  • Fiscal year:
    2023
  • Award amount:
    $287,400
  • Project type:
Oxytocin sensitivity and postpartum hemorrhage: testing genetic and epigenetic biomarkers for improving maternal morbidity
  • Award number:
    10750619
  • Fiscal year:
    2023
  • Award amount:
    $287,400
  • Project type:
Pilot Testing a Virtual Mindfulness-Based Intervention Aimed at Improving Reintegrating Veterans' Health Outcomes
  • Award number:
    10749770
  • Fiscal year:
    2023
  • Award amount:
    $287,400
  • Project type:
Improving outcomes from cardiac rehabilitation among older adults through exercise testing and individualized exercise intensity prescriptions
  • Award number:
    10672281
  • Fiscal year:
    2022
  • Award amount:
    $287,400
  • Project type:
Improving HIV testing, linkage, and retention in care for men through U=U messaging
  • Award number:
    10626959
  • Fiscal year:
    2022
  • Award amount:
    $287,400
  • Project type:
An Adaptive Intervention Trial of Home Testing with Behavioral Nudges for Improving COVID-19 Testing and Prevention among People Affected by Diabetes
  • Award number:
    10447445
  • Fiscal year:
    2022
  • Award amount:
    $287,400
  • Project type:
Improving HIV testing, linkage, and retention in care for men through U=U messaging
  • Award number:
    10483486
  • Fiscal year:
    2022
  • Award amount:
    $287,400
  • Project type:
An Adaptive Intervention Trial of Home Testing with Behavioral Nudges for Improving COVID-19 Testing and Prevention among People Affected by Diabetes
  • Award number:
    10548235
  • Fiscal year:
    2022
  • Award amount:
    $287,400
  • Project type:
The ADELANTE Trial: Testing a multi-level approach for improving household food insecurity and glycemic control among Latinos with diabetes
  • Award number:
    10461912
  • Fiscal year:
    2021
  • Award amount:
    $287,400
  • Project type:
The ADELANTE Trial: Testing a multi-level approach for improving household food insecurity and glycemic control among Latinos with diabetes
  • Award number:
    10309810
  • Fiscal year:
    2021
  • Award amount:
    $287,400
  • Project type: