Collaborative Research: CCRI: New: Building a Broad Infrastructure for Uniform Meaning Representations

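Basic Information

Award number:
2213804
Principal investigator:
Nianwen Xue
Amount:
$999,700
Host institution:
Brandeis University
Host institution country:
United States
Award type:
Standard Grant
Fiscal year:
2022
Funding country:
United States
Start and end dates:
2022-08-01 to 2025-07-31
Project status:
Ongoing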

Source:
https://www.nsf.gov/awardsearch/showAward?AWD_ID=2213804&HistoricalAwards=false
Keywords:
Collaborative Research CCRI New Building

Project Abstract

When humans attempt to talk with a computer, our language needs to be translated into a meaning representation that can be processed and understood by the computer. Currently, such translation is done on a task-by-task and language-by-language basis. Such a fragmented approach introduces redundancy and repetition, and is thus inefficient. Uniform Meaning Representation (UMR) is designed as a machine-readable language that all languages, from high-resource languages such as English and Chinese, to low-resource languages like Arapaho, can be translated into. UMR can also be extended to multi-modal settings to represent the content of videos and images, allowing computers to better process and understand the content of these media forms. This project aims to build the necessary infrastructure for translating languages and other media into UMRs. This infrastructure includes tools used to facilitate the translation of human language to UMRs, metrics that can be used to evaluate the quality of UMRs, and an initial collection of UMRs for five languages that have very different linguistic properties: English, Chinese, Arabic, Arapaho, and Quechua, as well as video content that includes both language and gestures for two of those languages. The project also includes outreach efforts to engage fellow researchers to produce UMRs for additional languages and genres with tutorials, workshops, summer schools, as well as online training materials. Once a sufficient number of UMRs are created for a language, computer models and algorithms can be trained on these UMRs to automatically produce more UMRs for new data in that language. They can then be used to advance the state of the art for a wide range of downstream human language technologies, ranging from human-robot interaction to dialogue systems, from information extraction to question answering, from machine translation to text summarization.
The project will also produce UMRs for under-resourced languages and help bring modern language technologies to speakers of those languages, as well as people working on the documentation and/or revitalization of the languages. This project brings together an interdisciplinary team of linguists and computer scientists to jointly build an infrastructure for Uniform Meaning Representation (UMR), a practical, formal, computationally tractable, and cross-linguistically valid document-level meaning representation of natural language that can impact a wide range of downstream applications that require “deep” natural language understanding (NLU). The UMR infrastructure will consist of UMR-annotated data sets for five languages, including multimodal data sets for two of those languages, English and Arapaho, a UMR annotation interface and relevant training materials, baseline UMR parsing models that fellow NLP researchers can use as a point of comparison when developing more advanced UMR parsing models, metrics for evaluating document-level meaning representations, and a platform for disseminating the UMR data sets, tools and resources to users of the infrastructure. This project also includes a broad range of outreach efforts consisting of workshops, tutorials, summer schools, and a shared task at the end of the project to involve fellow researchers in the NLP community to produce UMRs for additional languages and promote the use of the UMR infrastructure in meaning representation parsing research and downstream applications. The UMR infrastructure promotes the development of general purpose multilingual and multimodal applications in an effort to move away from both language-specific and task-specific models that require repetitive and often conflicting semantic annotation efforts.
The ultimate goal of the project is to build a community of NLP researchers that will contribute to the development of UMR-based data and tools, and adopt UMR in downstream applications to advance the state of the art in Natural Language Processing (NLP) in particular and Artificial Intelligence (AI) in general. In particular, the proposed infrastructure promotes access to information technology in languages for traditionally underrepresented groups by providing the necessary tools and resources to develop AI technologies for these languages. This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Project Outputs

Journal articles (4)

Monographs (0)

Research awards (0)

Conference papers (0)

Patents (0)

Mapping AMR to UMR: Resources for Adapting Existing Corpora for Cross-Lingual Compatibility

DOI:
Publication year:
2023
Journal:
Impact factor:
0
Authors:
Julia Bonn;Skatje Myers;Jens E. L. Van Gysel;Lukas Denk;Meagan Vigus;Jin Zhao;Andrew Cowell;W. Bruce Croft;Jan Hajic;James H Martin;Alexis Palmer;Martha Palmer;J. Pustejovsky;Zdenka Uresová;Rosa Vallejos;Nianwen Xue
Corresponding authors:
Julia Bonn;Skatje Myers;Jens E. L. Van Gysel;Lukas Denk;Meagan Vigus;Jin Zhao;Andrew Cowell;W. Bruce Croft;Jan Hajic;James H Martin;Alexis Palmer;Martha Palmer;J. Pustejovsky;Zdenka Uresová;Rosa Vallejos;Nianwen Xue

UMR annotation of Multiword Expressions

DOI:
Publication year:
2023
Journal:
The 4th International Workshop on Designing Meaning Representations
Impact factor:
0
Authors:
Bonn, Julia;Cowell, Andrew;Hajic, Jan;Palmer, Alexis;Palmer, Martha;Pustejovsky, James;Sun, Haibo;Uresova Zdenka;Wein, Shira;Xue, Nianwen
Corresponding author:
Xue, Nianwen

UMR annotation of Chinese Verb compounds and related constructions

DOI:
Publication year:
2023
Journal:
Impact factor:
0
Authors:
Haibo Sun;Yifan Zhu;Jin Zhao;Nianwen Xue
Corresponding authors:
Haibo Sun;Yifan Zhu;Jin Zhao;Nianwen Xue

UMR-Writer 2.0: Incorporating a New Keyboard Interface and Workflow into UMR-Writer

DOI:
10.18653/v1/2023.law-1.21
Publication year:
2023
Journal:
Proceedings of the 17th Linguistic Annotation Workshop (LAW-XVII)
Impact factor:
0
Authors:
Sijia Ge;Jin Zhao;Kristin Wright-Bettner;Skatje Myers;Nianwen Xue;Martha Palmer
Corresponding authors:
Sijia Ge;Jin Zhao;Kristin Wright-Bettner;Skatje Myers;Nianwen Xue;Martha Palmer

Other publications by Nianwen Xue

SMART: A Stratified Machine Reading Test

DOI:
10.1007/978-3-030-32233-5_6
Publication year:
2019
Journal:
Natural Language Processing and Chinese Computing
Impact factor:
0
Authors:
Jiarui Yao;Minxuan Feng;Haixia Feng;Zhiguo Wang;Yuchen Zhang;Nianwen Xue
Corresponding author:
Nianwen Xue

22nd International Conference on Computational Linguistics Proceedings of the Workshop on Cross-Framework and Cross-Domain

DOI:
Publication year:
2008
Journal:
Impact factor:
0
Authors:
Johan Bos;E. Briscoe;A. Cahill;John A. Carroll;S. Clark;Ann A. Copestake;D. Flickinger;Josef van Genabith;J. Hockenmaier;A. Joshi;R. Kaplan;Tracy Holloway King;K. Sandra;Dekang Lin;Jan Tore Lønning;Christopher D. Manning;Yusuke Miyao;Joakim Nivre;S. Oepen;Kenji Sagae;Nianwen Xue;Yi Zhang
Corresponding author:
Yi Zhang