Collaborative Research: CCRI: New: Building a Broad Infrastructure for Uniform Meaning Representations

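Basic Information

  • Award number:
    2213804
  • Principal investigator:
  • Amount:
    $999,700
  • Host institution:
  • Host institution country:
    United States
  • Project type:
    Standard Grant
  • Fiscal year:
    2022
  • Funding country:
    United States
  • Period:
    2022-08-01 to 2025-07-31
  • Project status:
    Ongoing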

Project Abstract

When humans attempt to talk with a computer, our language needs to be translated into a meaning representation that can be processed and understood by the computer. Currently, such translation is done on a task-by-task and language-by-language basis. Such a fragmented approach introduces redundancy and repetition, and is thus inefficient. Uniform Meaning Representation (UMR) is designed as a machine-readable language that all languages, from high-resource languages such as English and Chinese to low-resource languages like Arapaho, can be translated into. UMR can also be extended to multi-modal settings to represent the content of videos and images, allowing computers to better process and understand the content of these media forms. This project aims to build the necessary infrastructure for translating languages and other media into UMRs. This infrastructure includes tools used to facilitate the translation of human language to UMRs, metrics that can be used to evaluate the quality of UMRs, and an initial collection of UMRs for five languages that have very different linguistic properties: English, Chinese, Arabic, Arapaho, and Quechua, as well as video content that includes both language and gestures for two of those languages. The project also includes outreach efforts to engage fellow researchers in producing UMRs for additional languages and genres through tutorials, workshops, summer schools, and online training materials. Once enough UMRs have been created for a language, computer models and algorithms can be trained on these UMRs to automatically produce more UMRs for new data in that language. These models can then be used to advance the state of the art for a wide range of downstream human language technologies, ranging from human-robot interaction to dialogue systems, from information extraction to question answering, and from machine translation to text summarization.
The project will also produce UMRs for under-resourced languages and help bring modern language technologies to speakers of those languages, as well as people working on the documentation and/or revitalization of the languages. This project brings together an interdisciplinary team of linguists and computer scientists to jointly build an infrastructure for Uniform Meaning Representation (UMR), a practical, formal, computationally tractable, and cross-linguistically valid document-level meaning representation of natural language that can impact a wide range of downstream applications that require “deep” natural language understanding (NLU). The UMR infrastructure will consist of UMR-annotated data sets for five languages, including multimodal data sets for two of those languages (English and Arapaho); a UMR annotation interface and relevant training materials; baseline UMR parsing models that fellow NLP researchers can use as a point of comparison when developing more advanced UMR parsing models; metrics for evaluating document-level meaning representations; and a platform for disseminating the UMR data sets, tools, and resources to users of the infrastructure. This project also includes a broad range of outreach efforts consisting of workshops, tutorials, summer schools, and a shared task at the end of the project, all intended to involve fellow researchers in the NLP community in producing UMRs for additional languages and to promote the use of the UMR infrastructure in meaning representation parsing research and downstream applications. The UMR infrastructure promotes the development of general-purpose multilingual and multimodal applications in an effort to move away from both language-specific and task-specific models that require repetitive and often conflicting semantic annotation efforts.
The ultimate goal of the project is to build a community of NLP researchers that will contribute to the development of UMR-based data and tools, and adopt UMR in downstream applications to advance the state of the art in Natural Language Processing (NLP) in particular and Artificial Intelligence (AI) in general. In particular, the proposed infrastructure promotes access to information technology in languages for traditionally underrepresented groups by providing the necessary tools and resources to develop AI technologies for these languages. This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Project Outcomes

Journal articles (4)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Mapping AMR to UMR: Resources for Adapting Existing Corpora for Cross-Lingual Compatibility
  • DOI:
  • Published:
    2023
  • Journal:
  • Impact factor:
    0
  • Authors:
    Julia Bonn;Skatje Myers;Jens E. L. Van Gysel;Lukas Denk;Meagan Vigus;Jin Zhao;Andrew Cowell;W. Bruce Croft;Jan Hajic;James H Martin;Alexis Palmer;Martha Palmer;J. Pustejovsky;Zdenka Uresová;Rosa Vallejos;Nianwen Xue
  • Corresponding author:
    Julia Bonn;Skatje Myers;Jens E. L. Van Gysel;Lukas Denk;Meagan Vigus;Jin Zhao;Andrew Cowell;W. Bruce Croft;Jan Hajic;James H Martin;Alexis Palmer;Martha Palmer;J. Pustejovsky;Zdenka Uresová;Rosa Vallejos;Nianwen Xue
UMR annotation of Multiword Expressions
UMR-Writer 2.0: Incorporating a New Keyboard Interface and Workflow into UMR-Writer
UMR annotation of Chinese Verb compounds and related constructions
  • DOI:
  • Published:
    2023
  • Journal:
  • Impact factor:
    0
  • Authors:
    Haibo Sun;Yifan Zhu;Jin Zhao;Nianwen Xue
  • Corresponding author:
    Haibo Sun;Yifan Zhu;Jin Zhao;Nianwen Xue

Other publications by Nianwen Xue

SMART: A Stratified Machine Reading Test
22nd International Conference on Computational Linguistics Proceedings of the Workshop on Cross-Framework and Cross-Domain
  • DOI:
  • Published:
    2008
  • Journal:
  • Impact factor:
    0
  • Authors:
    Johan Bos;E. Briscoe;A. Cahill;John A. Carroll;S. Clark;Ann A. Copestake;D. Flickinger;Josef van Genabith;J. Hockenmaier;A. Joshi;R. Kaplan;Tracy Holloway King;K. Sandra;Dekang Lin;Jan Tore Lønning;Christopher D. Manning;Yusuke Miyao;Joakim Nivre;S. Oepen;Kenji Sagae;Nianwen Xue;Yi Zhang
  • Corresponding author:
    Yi Zhang
Electronic Health Records in Oncology Natural Language Processing and the Oncologic History: Is There a Match?
  • DOI:
  • Published:
    2011
  • Journal:
  • Impact factor:
    0
  • Authors:
    J. Warner;Peter G. Anick;Pengyu Hong;Nianwen Xue
  • Corresponding author:
    Nianwen Xue
Proposition Bank II: Delving Deeper
  • DOI:
  • Published:
    2004
  • Journal:
  • Impact factor:
    0
  • Authors:
    O. Babko;Martha Palmer;Nianwen Xue;A. Joshi;S. Kulick
  • Corresponding author:
    S. Kulick
Towards Overcoming Practical Obstacles to Deploying Deep Active Learning
  • DOI:
  • Published:
    2021
  • Journal:
  • Impact factor:
    0
  • Authors:
    Rong Hu;Brian Mac Namee;Sarah Jane Delany;David Lowell;Zachary Chase Lipton;Byron C. Wal;Xuezhe Ma;Eduard Hovy. 2016;Jeffrey Pennington;R. Socher;Matthew E. Peters;Mohit Iyyer Matt Mark Neumann;Christopher Gardner;Kenton Clark;Lee Luke;Ameya Prabhu;Charles Dognin;Maneesh Singh;Sameer Pradhan;Alessandro Moschitti;Nianwen Xue;Erik F. Tjong;Kim Sang;Fien De
  • Corresponding author:
    Fien De

Other grants by Nianwen Xue

RI:Medium:Collaborative Research:Developing a uniform meaning representation for natural language processing
  • Award number:
    1763926
  • Fiscal year:
    2018
  • Funding amount:
    $999,700
  • Project type:
    Standard Grant
The 2016 NAACL Student Research Workshop
  • Award number:
    1616950
  • Fiscal year:
    2015
  • Funding amount:
    $999,700
  • Project type:
    Standard Grant
CRI CI-P: Building a Community Resource for Temporal Inference in Chinese
  • Award number:
    0855184
  • Fiscal year:
    2009
  • Funding amount:
    $999,700
  • Project type:
    Standard Grant
RI: Large: Collaborative Research: Richer Representations for Machine Translation
  • Award number:
    0910532
  • Fiscal year:
    2009
  • Funding amount:
    $999,700
  • Project type:
    Standard Grant

Similar NSFC (National Natural Science Foundation of China) grants

Research on Quantum Field Theory without a Lagrangian Description
  • Award number:
    24ZR1403900
  • Approval year:
    2024
  • Funding amount:
    CNY 0
  • Project type:
    Provincial/municipal project
Cell Research
  • Award number:
    31224802
  • Approval year:
    2012
  • Funding amount:
    CNY 240,000
  • Project type:
    Special fund project
Cell Research
  • Award number:
    31024804
  • Approval year:
    2010
  • Funding amount:
    CNY 240,000
  • Project type:
    Special fund project
Cell Research
  • Award number:
    30824808
  • Approval year:
    2008
  • Funding amount:
    CNY 240,000
  • Project type:
    Special fund project
Research on the Rapid Growth Mechanism of KDP Crystal
  • Award number:
    10774081
  • Approval year:
    2007
  • Funding amount:
    CNY 450,000
  • Project type:
    General program

Similar overseas grants

Collaborative Research: CISE-MSI: RCBP-ED: CCRI: TechHouse Partnership to Increase the Computer Engineering Research Expansion at Morehouse College
  • Award number:
    2318703
  • Fiscal year:
    2023
  • Funding amount:
    $999,700
  • Project type:
    Standard Grant
Collaborative Research: CCRI: New: A Scalable Hardware and Software Environment Enabling Secure Multi-party Learning
  • Award number:
    2347617
  • Fiscal year:
    2023
  • Funding amount:
    $999,700
  • Project type:
    Standard Grant
Collaborative Research: CCRI: NEW: Building a Batteryless Computing Community through Access to Education, Testbeds, and Tools
  • Award number:
    2235002
  • Fiscal year:
    2023
  • Funding amount:
    $999,700
  • Project type:
    Standard Grant
Collaborative Research: Research Infrastructure: CCRI: ENS: Enhanced Open Networked Airborne Computing Platform
  • Award number:
    2235160
  • Fiscal year:
    2023
  • Funding amount:
    $999,700
  • Project type:
    Standard Grant
Collaborative Research: CCRI: New: Syntactic Differencing Infrastructure for Software Evolution Research
  • Award number:
    2232594
  • Fiscal year:
    2023
  • Funding amount:
    $999,700
  • Project type:
    Standard Grant
Collaborative Research: CCRI: New: CoMIC: A Collaborative Mobile Immersive Computing Research Infrastructure for Multi-user XR
  • Award number:
    2235050
  • Fiscal year:
    2023
  • Funding amount:
    $999,700
  • Project type:
    Standard Grant
Collaborative Research: Research Infrastructure: CCRI: New: Distributed Space and Terrestrial Networking Infrastructure for Multi-Constellation Coexistence
  • Award number:
    2235140
  • Fiscal year:
    2023
  • Funding amount:
    $999,700
  • Project type:
    Standard Grant
Collaborative Research: CISE-MSI: RCBP-ED: CCRI: TechHouse Partnership to Increase the Computer Engineering Research Expansion at Morehouse College
  • Award number:
    2318704
  • Fiscal year:
    2023
  • Funding amount:
    $999,700
  • Project type:
    Standard Grant
Collaborative Research: CCRI: Grand: Quori 2.0: Uniting, Broadening, and Sustaining a Research Community Around a Modular Social Robot Platform
  • Award number:
    2235042
  • Fiscal year:
    2023
  • Funding amount:
    $999,700
  • Project type:
    Continuing Grant
Collaborative Research: CCRI: Planning-C: A Community for Configurability Open Research and Development (ACCORD)
  • Award number:
    2234909
  • Fiscal year:
    2023
  • Funding amount:
    $999,700
  • Project type:
    Standard Grant