Discovering and Demonstrating Linguistic Features for Language Documentation

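Basic Information

Award number:
1761548
Principal investigator:
Graham Neubig
Amount:
$450,000
Awardee institution:
Carnegie Mellon University
Institution country:
United States
Award type:
Standard Grant
Fiscal year:
2018
Funding country:
United States
Period of performance:
2018-08-15 to 2023-01-31
Project status:
Completed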

Source:
https://www.nsf.gov/awardsearch/showAward?AWD_ID=1761548&HistoricalAwards=false
Keywords:
Discovering Demonstrating Linguistic Features Language

Project Abstract

Documenting endangered languages is a matter of great urgency, but it is also a time-consuming process. Annotation and curation of speech and text data, searching for interesting, prototypical, or atypical entries, and marshalling examples for pedagogy or publication are still mostly manual processes. This project aims to speed these processes by (1) creating better tools for the automated analysis of smaller and partially annotated corpora, which have the potential to reduce the amount of time linguists need to annotate these corpora manually, and (2) creating better methods for linguists to browse their collected data and answer questions about the characteristics of the language at hand. The intellectual contribution of this proposal will lie in the development of new computational methods for natural language processing (NLP) for endangered languages, and their evaluation, both in controlled environments and as a tool for linguists in the field. It will also have broader impact in the creation of new tools and standards for linguistic documentation, increased collaboration between linguists and computer scientists, and training of a graduate student in the technologies and practices necessary to move this collaboration forward. The training component will increase the STEM workforce capacity in computational linguistics, which is important given the need for more advanced tools for working on languages that are underdocumented and spoken in countries that are key to national interests. As a specific methodology to realize this vision, this project focuses on the recent development of massively multilingual NLP models based on neural networks. These methods work by creating NLP models using data from a large number of languages, then using the information gleaned from these languages to improve the accuracy of processing on a new language with a paucity of training data.
Within this framework, three major research questions will be examined: (1) How can these techniques be efficiently applied to very-low-resource languages, especially those in the early stages of text collection? (2) What methods can be used to move beyond sentence-by-sentence analyses and synthesize information about the entirety of the language to propose a simple grammatical specification? (3) Is it possible to provide examples that support typological predictions, so that a linguist can read them and learn more about the nuances of the language they are analyzing? All three of these research questions will be examined in a rigorous process of devising methods, testing on existing data sets for well-resourced languages, and finally deploying to field linguists to examine how the methods improve the efficiency or accuracy of the language documentation process. This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Project Outputs

Journal articles (31)

Monographs (0)

Research awards (0)

Conference papers (0)

Patents (0)

Phoneme Recognition through Fine Tuning of Phonetic Representations: a Case Study on Luhya Language Varieties

DOI:
Publication date:
2021
Journal:
22nd Annual Conference of the International Speech Communication Association (InterSpeech 2021)
Impact factor:
0
Authors:
Siminyu, Kathleen;Li, Xinjian;Anastasopoulos, Antonios;Mortensen, David R.;Marlo, Michael;Neubig, Graham
Corresponding author:
Neubig, Graham

Practical Comparable Data Collection for Low-Resource Languages via Images

DOI:
Publication date:
2020-04
Journal:
ArXiv
Impact factor:
0
Authors:
Aman Madaan;Shruti Rijhwani;Antonios Anastasopoulos;Yiming Yang;Graham Neubig
Corresponding author:
Aman Madaan;Shruti Rijhwani;Antonios Anastasopoulos;Yiming Yang;Graham Neubig

An Analysis of Source-Side Grammatical Errors in NMT

DOI:
10.18653/v1/w19-4822
Publication date:
2019-05
Journal:
Impact factor:
0
Authors:
Antonios Anastasopoulos
Corresponding author:
Antonios Anastasopoulos

It’s Easier to Translate out of English than into it: Measuring Neural Translation Difficulty by Cross-Mutual Information

DOI:
10.18653/v1/2020.acl-main.149
Publication date:
2020-05
Journal:
ArXiv
Impact factor:
0
Authors:
Emanuele Bugliarello;Sabrina J. Mielke;Antonios Anastasopoulos;Ryan Cotterell;Naoaki Okazaki
Corresponding author:
Emanuele Bugliarello;Sabrina J. Mielke;Antonios Anastasopoulos;Ryan Cotterell;Naoaki Okazaki

Transliteration for Cross-Lingual Morphological Inflection

DOI:
Publication date:
2020
Journal:
and Morphology
Impact factor:
0
Authors:
Murikinati, Nikitha;Anastasopoulos, Antonios;Neubig, Graham
Corresponding author:
Neubig, Graham

Other publications by Graham Neubig

Attentive Interaction Model: Modeling Changes in View in Argumentation

DOI:
10.18653/v1/n18-1010
Publication date:
2018
Journal:
Proceedings of the Web Conference 2021
Impact factor:
0
Authors:
Yohan Jo;Shivani Poddar;Byungsoo Jeon;Qinlan Shen;C. Rosé;Graham Neubig
Corresponding author:
Graham Neubig

Simple, Correct Parallelization for Blocked Gibbs Sampling

DOI:
Publication date:
2014
Journal:
Impact factor:
0
Authors:
Graham Neubig
Corresponding author:
Graham Neubig

Discriminative Language Models as a Tool for Machine Translation Error Analysis

DOI:
Publication date:
2014
Journal:
Proceedings of the 25th International Conference on Computational Linguistics (COLING)
Impact factor:
0
Authors:
Koichi Akabe;Graham Neubig;Sakriani Sakti;Tomoki Toda;Satoshi Nakamura
Corresponding author:
Satoshi Nakamura

関連尺度に基づいた負の相関ルール抽出手法の高機能化

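Enhancing a Method for Extracting Negative Association Rules Based on Relevance Measures

DOI:
Publication date:
2014
Journal:
Impact factor:
0
Authors:
Koichi Akabe;Graham Neubig;Sakriani Sakti;Tomoki Toda;Satoshi Nakamura;宮城智輝，山本泰生，岩沼宏治;Graham Neubig;黒岩健歩，岩沼宏治，山本泰生
Corresponding author:
黒岩健歩，岩沼宏治，山本泰生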