Discovering and Demonstrating Linguistic Features for Language Documentation
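Basic Information
- Award number: 1761548
- Principal investigator:
- Amount: $450,000
- Host institution:
- Host institution country: United States
- Project type: Standard Grant
- Fiscal year: 2018
- Funding country: United States
- Project period: 2018-08-15 to 2023-01-31
- Project status: Completed
- Source:
- Keywords:
Project Abstract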
Documenting endangered languages is a matter of great urgency, but it is also a time-consuming process. Annotating and curating speech and text data, searching for interesting, prototypical, or atypical entries, and marshalling examples for pedagogy or publication are still mostly manual processes. This project aims to speed up these processes by (1) creating better tools for the automated analysis of smaller and partially annotated corpora, which have the potential to reduce the time linguists need to annotate these corpora manually, and (2) creating better methods for linguists to browse their collected data and answer questions about the characteristics of the language at hand. The intellectual contribution of this proposal lies in the development of new computational methods for natural language processing (NLP) for endangered languages and in their evaluation, both in controlled environments and as tools for linguists in the field. The project will also have broader impact through the creation of new tools and standards for linguistic documentation, increased collaboration between linguists and computer scientists, and the training of a graduate student in the technologies and practices necessary to move this collaboration forward. The training component will increase STEM workforce capacity in computational linguistics, which is important given the need for more advanced tools for working on languages that are underdocumented and spoken in countries that are key to national interests. As a specific methodology to realize this vision, this project focuses on recent developments in massively multilingual NLP models based on neural networks. These methods work by building NLP models using data from a large number of languages, then using the information gleaned from these languages to improve the accuracy of processing on a new language with a paucity of training data.
Within this framework, three major research questions will be examined: (1) How can these techniques be efficiently applied to very-low-resource languages, especially those in the early stages of text collection? (2) What methods can be used to move beyond sentence-by-sentence analyses and synthesize information about the entirety of the language to propose a simple grammatical specification? (3) Is it possible to provide examples that support typological predictions, so that a linguist can read them and learn more about the nuances of the language they are analyzing? All three research questions will be examined through a rigorous process of devising methods, testing them on existing data sets for well-resourced languages, and finally deploying them to field linguists to examine how they improve the efficiency or accuracy of the language documentation process. This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
Project Outcomes
Journal articles (31)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Phoneme Recognition through Fine Tuning of Phonetic Representations: a Case Study on Luhya Language Varieties
- DOI:
- Published: 2021
- Journal:
- Impact factor: 0
- Authors: Siminyu, Kathleen; Li, Xinjian; Anastasopoulos, Antonios; Mortensen, David R.; Marlo, Michael; Neubig, Graham
- Corresponding author: Neubig, Graham
Practical Comparable Data Collection for Low-Resource Languages via Images
- DOI:
- Published: 2020-04
- Journal:
- Impact factor: 0
- Authors: Aman Madaan; Shruti Rijhwani; Antonios Anastasopoulos; Yiming Yang; Graham Neubig
- Corresponding authors: Aman Madaan; Shruti Rijhwani; Antonios Anastasopoulos; Yiming Yang; Graham Neubig
An Analysis of Source-Side Grammatical Errors in NMT
- DOI: 10.18653/v1/w19-4822
- Published: 2019-05
- Journal:
- Impact factor: 0
- Authors: Antonios Anastasopoulos
- Corresponding author: Antonios Anastasopoulos
It’s Easier to Translate out of English than into it: Measuring Neural Translation Difficulty by Cross-Mutual Information
- DOI: 10.18653/v1/2020.acl-main.149
- Published: 2020-05
- Journal:
- Impact factor: 0
- Authors: Emanuele Bugliarello; Sabrina J. Mielke; Antonios Anastasopoulos; Ryan Cotterell; Naoaki Okazaki
- Corresponding authors: Emanuele Bugliarello; Sabrina J. Mielke; Antonios Anastasopoulos; Ryan Cotterell; Naoaki Okazaki
Transliteration for Cross-Lingual Morphological Inflection
- DOI:
- Published: 2020
- Journal:
- Impact factor: 0
- Authors: Murikinati, Nikitha; Anastasopoulos, Antonios; Neubig, Graham
- Corresponding author: Neubig, Graham
Other publications by Graham Neubig
Attentive Interaction Model: Modeling Changes in View in Argumentation
- DOI: 10.18653/v1/n18-1010
- Published: 2018
- Journal:
- Impact factor: 0
- Authors: Yohan Jo; Shivani Poddar; Byungsoo Jeon; Qinlan Shen; C. Rosé; Graham Neubig
- Corresponding author: Graham Neubig
Simple, Correct Parallelization for Blocked Gibbs Sampling
- DOI:
- Published: 2014
- Journal:
- Impact factor: 0
- Authors: Graham Neubig
- Corresponding author: Graham Neubig
Discriminative Language Models as a Tool for Machine Translation Error Analysis
- DOI:
- Published: 2014
- Journal:
- Impact factor: 0
- Authors: Koichi Akabe; Graham Neubig; Sakriani Sakti; Tomoki Toda; Satoshi Nakamura
- Corresponding author: Satoshi Nakamura
関連尺度に基づいた負の相関ルール抽出手法の高機能化
Enhancing Negative Association Rule Extraction Methods Based on Relevance Measures
- DOI:
- Published: 2014
- Journal:
- Impact factor: 0
- Authors: Koichi Akabe; Graham Neubig; Sakriani Sakti; Tomoki Toda; Satoshi Nakamura; 宮城 智輝,山本 泰生,岩沼 宏治; Graham Neubig; 黒岩 健歩,岩沼 宏治,山本 泰生
- Corresponding authors: 黒岩 健歩,岩沼 宏治,山本 泰生
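フーリエ変換を用いた命題論理式の充足可能性に関する考察
A Study on the Satisfiability of Propositional Logic Formulas Using the Fourier Transform
- DOI:
- Published: 2013
- Journal:
- Impact factor: 0
- Authors: 赤部 晃一; Graham Neubig; 工藤 拓; John Richardson; 中澤 敏明; 星野 翔; 宮城 智輝,山本 泰生,岩沼 宏治
- Corresponding authors: 宮城 智輝,山本 泰生,岩沼 宏治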
Other grants by Graham Neubig
FAI: Quantifying and Mitigating Disparities in Language Technologies
- Award number: 2040926
- Fiscal year: 2021
- Funding amount: $450,000
- Project type: Standard Grant
SHF: Small: Open-domain, Data-driven Code Synthesis from Natural Language
- Award number: 1815287
- Fiscal year: 2018
- Funding amount: $450,000
- Project type: Standard Grant
RI: EAGER: Collaborative Research: Adaptive Heads-up Displays for Simultaneous Interpretation
- Award number: 1748642
- Fiscal year: 2017
- Funding amount: $450,000
- Project type: Standard Grant
Similar overseas grants
Recycling of platinum electrodes demonstrating particulate electrochemical printing - PEP 3d Pt
- Award number: 2905755
- Fiscal year: 2024
- Funding amount: $450,000
- Project type: Studentship
Net Zero Ports of the Future: Demonstrating the Integration of Green Hydrogen Shore Power with Water Reuse
- Award number: 10098442
- Fiscal year: 2024
- Funding amount: $450,000
- Project type: Collaborative R&D
Demonstrating the potential for portable detection of bird flu.
- Award number: 10090901
- Fiscal year: 2024
- Funding amount: $450,000
- Project type: Collaborative R&D
Demonstrating ocean acidification-driven changes in the ecological role of benthic macroherbivores in controlling algal habitats
- Award number: 23K26924
- Fiscal year: 2024
- Funding amount: $450,000
- Project type: Grant-in-Aid for Scientific Research (B)
Project Zephattan: Demonstrating three wind-generator technologies to power e-mobility charging in West Africa and the Pacific
- Award number: 10107747
- Fiscal year: 2024
- Funding amount: $450,000
- Project type: Demonstrator
Low Carbon Acoustic Barriers - Demonstrating Innovation in Railway Construction
- Award number: 10062090
- Fiscal year: 2023
- Funding amount: $450,000
- Project type: Collaborative R&D
Demonstrating ocean acidification-driven changes in the ecological role of benthic macroherbivores in controlling algal habitats
- Award number: 23H02231
- Fiscal year: 2023
- Funding amount: $450,000
- Project type: Grant-in-Aid for Scientific Research (B)
Open Clasp, Open Archive: Preserving the company's legacy, demonstrating its impact and value, and opening access to its unique archive of feminist th
- Award number: 2870460
- Fiscal year: 2023
- Funding amount: $450,000
- Project type: Studentship
Feasibility of delivering and demonstrating a human-in-the-loop digital twin in the construction and maintenance of GCRE (Athena)
- Award number: 10063263
- Fiscal year: 2023
- Funding amount: $450,000
- Project type: Collaborative R&D
Demonstrating the feasibility of applying machine learning models to railway condition data: Engine condition monitoring and failure prediction
- Award number: 10080979
- Fiscal year: 2023
- Funding amount: $450,000
- Project type: Collaborative R&D