Collaborative Research: CCRI: New: Building a Broad Infrastructure for Uniform Meaning Representations

Basic Information

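  • Award number:
    2213804
  • Principal investigator:
  • Amount:
    $999,700
  • Host institution:
  • Host institution country:
    United States
  • Project type:
    Standard Grant
  • Fiscal year:
    2022
  • Funding country:
    United States
  • Period:
    2022-08-01 to 2025-07-31
  • Project status:
    Ongoing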

Project Abstract

When humans attempt to talk with a computer, our language needs to be translated into a meaning representation that can be processed and understood by the computer. Currently, such translation is done on a task-by-task and language-by-language basis. Such a fragmented approach introduces redundancy and repetition, and is thus inefficient. Uniform Meaning Representation (UMR) is designed as a machine-readable language that all languages, from high-resource languages such as English and Chinese to low-resource languages like Arapaho, can be translated into. UMR can also be extended to multi-modal settings to represent the content of videos and images, allowing computers to better process and understand the content of these media forms. This project aims to build the necessary infrastructure for translating languages and other media into UMRs. This infrastructure includes tools used to facilitate the translation of human language to UMRs, metrics that can be used to evaluate the quality of UMRs, and an initial collection of UMRs for five languages that have very different linguistic properties: English, Chinese, Arabic, Arapaho, and Quechua, as well as video content that includes both language and gestures for two of those languages. The project also includes outreach efforts to engage fellow researchers to produce UMRs for additional languages and genres through tutorials, workshops, and summer schools, as well as online training materials. Once a sufficient number of UMRs are created for a language, computer models and algorithms can be trained on these UMRs to automatically produce more UMRs for new data in that language. They can then be used to advance the state of the art for a wide range of downstream human language technologies, ranging from human-robot interaction to dialogue systems, from information extraction to question answering, from machine translation to text summarization.
The project will also produce UMRs for under-resourced languages and help bring modern language technologies to speakers of those languages, as well as people working on the documentation and/or revitalization of the languages. This project brings together an interdisciplinary team of linguists and computer scientists to jointly build an infrastructure for Uniform Meaning Representation (UMR), a practical, formal, computationally tractable, and cross-linguistically valid document-level meaning representation of natural language that can impact a wide range of downstream applications that require “deep” natural language understanding (NLU). The UMR infrastructure will consist of UMR-annotated data sets for five languages, including multimodal data sets for two of those languages, English and Arapaho; a UMR annotation interface and relevant training materials; baseline UMR parsing models that fellow NLP researchers can use as a point of comparison when developing more advanced UMR parsing models; metrics for evaluating document-level meaning representations; and a platform for disseminating the UMR data sets, tools, and resources to users of the infrastructure. This project also includes a broad range of outreach efforts consisting of workshops, tutorials, summer schools, and a shared task at the end of the project to involve fellow researchers in the NLP community to produce UMRs for additional languages and promote the use of the UMR infrastructure in meaning representation parsing research and downstream applications. The UMR infrastructure promotes the development of general-purpose multilingual and multimodal applications in an effort to move away from both language-specific and task-specific models that require repetitive and often conflicting semantic annotation efforts.
The ultimate goal of the project is to build a community of NLP researchers that will contribute to the development of UMR-based data and tools, and adopt UMR in downstream applications to advance the state of the art in Natural Language Processing (NLP) in particular and Artificial Intelligence (AI) in general. In particular, the proposed infrastructure promotes access to information technology in languages for traditionally underrepresented groups by providing the necessary tools and resources to develop AI technologies for these languages.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Project Outputs

Journal articles (4)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Mapping AMR to UMR: Resources for Adapting Existing Corpora for Cross-Lingual Compatibility
  • DOI:
  • Published:
    2023
  • Journal:
  • Impact factor:
    0
  • Authors:
    Julia Bonn;Skatje Myers;Jens E. L. Van Gysel;Lukas Denk;Meagan Vigus;Jin Zhao;Andrew Cowell;W. Bruce Croft;Jan Hajic;James H Martin;Alexis Palmer;Martha Palmer;J. Pustejovsky;Zdenka Uresová;Rosa Vallejos;Nianwen Xue
  • Corresponding authors:
    Julia Bonn;Skatje Myers;Jens E. L. Van Gysel;Lukas Denk;Meagan Vigus;Jin Zhao;Andrew Cowell;W. Bruce Croft;Jan Hajic;James H Martin;Alexis Palmer;Martha Palmer;J. Pustejovsky;Zdenka Uresová;Rosa Vallejos;Nianwen Xue
UMR annotation of Multiword Expressions
UMR-Writer 2.0: Incorporating a New Keyboard Interface and Workflow into UMR-Writer
UMR annotation of Chinese Verb compounds and related constructions
  • DOI:
  • Published:
    2023
  • Journal:
  • Impact factor:
    0
  • Authors:
    Haibo Sun;Yifan Zhu;Jin Zhao;Nianwen Xue
  • Corresponding authors:
    Haibo Sun;Yifan Zhu;Jin Zhao;Nianwen Xue

Other publications by Nianwen Xue

SMART: A Stratified Machine Reading Test
22nd International Conference on Computational Linguistics Proceedings of the Workshop on Cross-Framework and Cross-Domain
  • DOI:
  • Published:
    2008
  • Journal:
  • Impact factor:
    0
  • Authors:
    Johan Bos;E. Briscoe;A. Cahill;John A. Carroll;S. Clark;Ann A. Copestake;D. Flickinger;Josef van Genabith;J. Hockenmaier;A. Joshi;R. Kaplan;Tracy Holloway King;K. Sandra;Dekang Lin;Jan Tore Lønning;Christopher D. Manning;Yusuke Miyao;Joakim Nivre;S. Oepen;Kenji Sagae;Nianwen Xue;Yi Zhang
  • Corresponding author:
    Yi Zhang
Electronic Health Records in Oncology Natural Language Processing and the Oncologic History: Is There a Match?
  • DOI:
  • Published:
    2011
  • Journal:
  • Impact factor:
    0
  • Authors:
    J. Warner;Peter G. Anick;Pengyu Hong;Nianwen Xue
  • Corresponding author:
    Nianwen Xue
Proposition Bank II: Delving Deeper
  • DOI:
  • Published:
    2004
  • Journal:
  • Impact factor:
    0
  • Authors:
    O. Babko;Martha Palmer;Nianwen Xue;A. Joshi;S. Kulick
  • Corresponding author:
    S. Kulick
Towards Overcoming Practical Obstacles to Deploying Deep Active Learning
  • DOI:
  • Published:
    2021
  • Journal:
  • Impact factor:
    0
  • Authors:
    Rong Hu;Brian Mac Namee;Sarah Jane Delany;David Lowell;Zachary Chase Lipton;Byron C. Wal;Xuezhe Ma;Eduard Hovy. 2016;Jeffrey Pennington;R. Socher;Matthew E. Peters;Mohit Iyyer Matt Mark Neumann;Christopher Gardner;Kenton Clark;Lee Luke;Ameya Prabhu;Charles Dognin;Maneesh Singh;Sameer Pradhan;Alessandro Moschitti;Nianwen Xue;Erik F. Tjong;Kim Sang;Fien De
  • Corresponding author:
    Fien De

Other grants by Nianwen Xue

RI: Medium: Collaborative Research: Developing a Uniform Meaning Representation for Natural Language Processing
  • Award number:
    1763926
  • Fiscal year:
    2018
  • Award amount:
    $999,700
  • Project type:
    Standard Grant
The 2016 NAACL Student Research Workshop
  • Award number:
    1616950
  • Fiscal year:
    2015
  • Award amount:
    $999,700
  • Project type:
    Standard Grant
CRI CI-P: Building a Community Resource for Temporal Inference in Chinese
  • Award number:
    0855184
  • Fiscal year:
    2009
  • Award amount:
    $999,700
  • Project type:
    Standard Grant
RI: Large: Collaborative Research: Richer Representations for Machine Translation
  • Award number:
    0910532
  • Fiscal year:
    2009
  • Award amount:
    $999,700
  • Project type:
    Standard Grant

Similar NSFC (National Natural Science Foundation of China) grants

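Research on Quantum Field Theory without a Lagrangian Description
  • Award number:
    24ZR1403900
  • Approval year:
    2024
  • Award amount:
    CNY 0
  • Project type:
    Provincial/municipal project
Cell Research
  • Award number:
    31224802
  • Approval year:
    2012
  • Award amount:
    CNY 240,000
  • Project type:
    Special fund project
Cell Research
  • Award number:
    31024804
  • Approval year:
    2010
  • Award amount:
    CNY 240,000
  • Project type:
    Special fund project
Cell Research
  • Award number:
    30824808
  • Approval year:
    2008
  • Award amount:
    CNY 240,000
  • Project type:
    Special fund project
Research on the Rapid Growth Mechanism of KDP Crystal
  • Award number:
    10774081
  • Approval year:
    2007
  • Award amount:
    CNY 450,000
  • Project type:
    General Program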

Similar overseas grants

Collaborative Research: CISE-MSI: RCBP-ED: CCRI: TechHouse Partnership to Increase the Computer Engineering Research Expansion at Morehouse College
  • Award number:
    2318703
  • Fiscal year:
    2023
  • Award amount:
    $999,700
  • Project type:
    Standard Grant
Collaborative Research: CCRI: New: A Scalable Hardware and Software Environment Enabling Secure Multi-party Learning
  • Award number:
    2347617
  • Fiscal year:
    2023
  • Award amount:
    $999,700
  • Project type:
    Standard Grant
Collaborative Research: CCRI: NEW: Building a Batteryless Computing Community through Access to Education, Testbeds, and Tools
  • Award number:
    2235002
  • Fiscal year:
    2023
  • Award amount:
    $999,700
  • Project type:
    Standard Grant
Collaborative Research: Research Infrastructure: CCRI: ENS: Enhanced Open Networked Airborne Computing Platform
  • Award number:
    2235160
  • Fiscal year:
    2023
  • Award amount:
    $999,700
  • Project type:
    Standard Grant
Collaborative Research: CCRI: New: Syntactic Differencing Infrastructure for Software Evolution Research
  • Award number:
    2232594
  • Fiscal year:
    2023
  • Award amount:
    $999,700
  • Project type:
    Standard Grant
Collaborative Research: CCRI: New: CoMIC: A Collaborative Mobile Immersive Computing Research Infrastructure for Multi-user XR
  • Award number:
    2235050
  • Fiscal year:
    2023
  • Award amount:
    $999,700
  • Project type:
    Standard Grant
Collaborative Research: Research Infrastructure: CCRI: New: Distributed Space and Terrestrial Networking Infrastructure for Multi-Constellation Coexistence
  • Award number:
    2235140
  • Fiscal year:
    2023
  • Award amount:
    $999,700
  • Project type:
    Standard Grant
Collaborative Research: CCRI: Grand: Quori 2.0: Uniting, Broadening, and Sustaining a Research Community Around a Modular Social Robot Platform
  • Award number:
    2235042
  • Fiscal year:
    2023
  • Award amount:
    $999,700
  • Project type:
    Continuing Grant
Collaborative Research: CCRI: Planning-C: A Community for Configurability Open Research and Development (ACCORD)
  • Award number:
    2234909
  • Fiscal year:
    2023
  • Award amount:
    $999,700
  • Project type:
    Standard Grant
Collaborative Research: CCRI: New: A Research News Recommender Infrastructure with Live Users for Algorithm and Interface Experimentation
  • Award number:
    2232554
  • Fiscal year:
    2023
  • Award amount:
    $999,700
  • Project type:
    Standard Grant