Collaborative Research: RI: Medium: From Acoustic Signal to Morphosyntactic Analysis in One End-to-End Neural System
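Basic information
- Award number: 2211951
- Principal investigator:
- Amount: $898,000
- Host institution:
- Host institution country: United States
- Project type: Standard Grant
- Fiscal year: 2022
- Funding country: United States
- Duration: 2022-08-01 to 2026-07-31
- Project status: Ongoing
- Source:
- Keywords:
Project abstract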
There are approximately 7,000 languages in the world today, but this number is declining precipitously. Even many languages that currently have thousands upon thousands of speakers are likely to fall out of use within a generation. For the speakers of these languages, this represents a tragic loss of cultural and linguistic heritage, which are important anchors of their social identity. Each language also carries irreplaceable data about language as a phenomenon of human behavior: the limits of its variation and the patterns in its structure and development. Linguists and language activists are currently working to quickly and comprehensively document as many languages as possible. In the unfortunate event that a language fades from use, documentation ensures that its data will remain available for future cultural or scientific analysis. This project partially automates the process of language documentation using tools from Natural Language Processing and Machine Learning. It differs from similar projects in using one integrated system to process the sounds of speech and the structure of words, instead of using two or more separate components. With the collaboration of native speaker scholars, the researchers are applying their methodology to four languages: Highland Puebla Nahuatl, Yoloxóchitl Mixtec, San Pedro Amuzgos Amuzgo, and North Slope Iñupiaq.
The proposed research will dramatically transform the landscape of automatic morphosyntactic and morphophonological analysis by introducing an end-to-end system that consumes speech as an input and produces interlinear annotations as an output. The research team proposes to build an end-to-end system, a single neural net that, with small amounts of labeled data produced by native speaker linguists, can directly convert recorded speech to analyzed text, producing four outputs: (1) surface transcription, (2) morphological segmentation of surface forms, (3) an underlying or canonical form for each morpheme, and (4) a gloss or standardized label for each morpheme. The proposed single end-to-end neural network represents the first attempt to integrate the four aforementioned tasks into a single neural network, avoiding the error-propagation problems that have plagued earlier attempts at creating a pipeline and mitigating the complexity of the technology for end-users. The researchers also propose innovative ways to incorporate linguistic knowledge into neural networks, including the use of differentiable weighted finite-state transducers, which are independently motivated by an iterative self-training architecture. This approach to iterative self-training, in its own right, will represent an advance in machine learning: a new algorithm for upweighting words and morphemes. The research also makes significant contributions to computational morphology. It includes a simple but expressive modification to existing schemes for segmentation and glossing, specifically for the representation of discontinuous morphemes. Furthermore, the proposal extends popular approaches to morphological analysis (e.g., UniMorph) by systematically addressing derivation as well as inflection. This proposal addresses glossing of reduplication and noun-incorporation, which earlier work has not.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
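To make the four outputs named in the abstract concrete, the sketch below shows one way a single analyzed word could be represented. It is only an illustration: the class name `InterlinearWord`, its field names, and the `render` method are hypothetical, and the example word is a toy English form in plain orthography, not data from the project or from any of its four languages.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InterlinearWord:
    """One word carrying the four output tiers listed in the abstract.

    Hypothetical container for illustration; not the project's data format.
    """
    surface: str                 # (1) surface transcription
    segments: tuple[str, ...]    # (2) morphological segmentation of the surface form
    underlying: tuple[str, ...]  # (3) underlying or canonical form of each morpheme
    glosses: tuple[str, ...]     # (4) gloss or standardized label of each morpheme

    def __post_init__(self) -> None:
        # Tiers 2-4 are per-morpheme, so they must all have the same length.
        if not (len(self.segments) == len(self.underlying) == len(self.glosses)):
            raise ValueError("segmentation, underlying forms, and glosses must align")

    def render(self) -> str:
        """Return the three per-morpheme tiers as hyphen-joined lines."""
        tiers = (self.segments, self.underlying, self.glosses)
        return "\n".join("-".join(tier) for tier in tiers)


# Toy English example, written orthographically (illustrative only).
word = InterlinearWord(
    surface="wolves",
    segments=("wolv", "es"),
    underlying=("wolf", "s"),
    glosses=("wolf", "PL"),
)
print(word.render())
```

A pipeline would fill these tiers with separate components, whereas the abstract describes a single network that predicts all four from recorded speech; the alignment check here only captures the constraint that segmentation, canonical forms, and glosses refer to the same morphemes.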
Project outcomes
Journal articles (3)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Syntax and Semantics Meet in the “Middle”: Probing the Syntax-Semantics Interface of LMs Through Agentivity
- DOI: 10.18653/v1/2023.starsem-1.14
- Published: 2023
- Journal:
- Impact factor: 0
- Authors: Tjuatja, Lindia; Liu, Emmy; Levin, Lori; Neubig, Graham
- Corresponding author: Neubig, Graham
SigMoreFun Submission to the SIGMORPHON Shared Task on Interlinear Glossing
- DOI: 10.18653/v1/2023.sigmorphon-1.22
- Published: 2023
- Journal:
- Impact factor: 0
- Authors: Taiqi He; Lindia Tjuatja; Nathaniel R. Robinson; Shinji Watanabe; David R. Mortensen; Graham Neubig; Lori Levin
- Corresponding authors: Taiqi He; Lindia Tjuatja; Nathaniel R. Robinson; Shinji Watanabe; David R. Mortensen; Graham Neubig; Lori Levin
Generalized Glossing Guidelines: An Explicit, Human- and Machine-Readable, Item-and-Process Convention for Morphological Annotation
- DOI: 10.18653/v1/2023.sigmorphon-1.7
- Published: 2023
- Journal:
- Impact factor: 0
- Authors: Mortensen, David R.; Gulsen, Ela; He, Taiqi; Robinson, Nathaniel; Amith, Jonathan; Tjuatja, Lindia; Levin, Lori
- Corresponding author: Levin, Lori
Other grants by Lorraine Levin
Conference: Training the US Computational Linguistics Team
- Award number: 2329963
- Fiscal year: 2023
- Funding amount: $898,000
- Project type: Standard Grant
Conference: International Linguistics Olympiad (2022)
- Award number: 2141334
- Fiscal year: 2022
- Funding amount: $898,000
- Project type: Standard Grant
North American Computational Linguistics Olympiad (NACLO) 2020
- Award number: 1946109
- Fiscal year: 2020
- Funding amount: $898,000
- Project type: Standard Grant
Workshop: International Linguistics Olympiad (ILO) July 2019; Yongin, South Korea
- Award number: 1851142
- Fiscal year: 2019
- Funding amount: $898,000
- Project type: Standard Grant
International Linguistics Olympiad (ILO) 2018: Prague, CZ, July 26 - August 1, 2018
- Award number: 1757042
- Fiscal year: 2018
- Funding amount: $898,000
- Project type: Standard Grant
Workshop: International Computational Linguistics Olympiad 2017
- Award number: 1654253
- Fiscal year: 2017
- Funding amount: $898,000
- Project type: Standard Grant
The International Linguistics Olympiad: Preparing High School Students for the Study of Human Language and Computation
- Award number: 1137828
- Fiscal year: 2011
- Funding amount: $898,000
- Project type: Standard Grant
SGER: Collaborative Research: New Problem Genres for the North American Computational Linguistics Olympiad
- Award number: 0838848
- Fiscal year: 2008
- Funding amount: $898,000
- Project type: Standard Grant
Active Selection of Data for Machine Translation
- Award number: 0713292
- Fiscal year: 2007
- Funding amount: $898,000
- Project type: Standard Grant
Planning Workshop for a Computational Linguistics Olympiad
- Award number: 0633871
- Fiscal year: 2006
- Funding amount: $898,000
- Project type: Standard Grant
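Similar NSFC grants
Research on Quantum Field Theory without a Lagrangian Description
- Award number: 24ZR1403900
- Approval year: 2024
- Funding amount: CNY 0
- Project type: Provincial/municipal project
Cell Research
- Award number: 31224802
- Approval year: 2012
- Funding amount: CNY 240,000
- Project type: Special fund project
Cell Research
- Award number: 31024804
- Approval year: 2010
- Funding amount: CNY 240,000
- Project type: Special fund project
Cell Research
- Award number: 30824808
- Approval year: 2008
- Funding amount: CNY 240,000
- Project type: Special fund project
Research on the Rapid Growth Mechanism of KDP Crystal
- Award number: 10774081
- Approval year: 2007
- Funding amount: CNY 450,000
- Project type: General Program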
Similar overseas grants
Collaborative Research: RI: Medium: Principles for Optimization, Generalization, and Transferability via Deep Neural Collapse
- Award number: 2312841
- Fiscal year: 2023
- Funding amount: $898,000
- Project type: Standard Grant
Collaborative Research: RI: Medium: Principles for Optimization, Generalization, and Transferability via Deep Neural Collapse
- Award number: 2312842
- Fiscal year: 2023
- Funding amount: $898,000
- Project type: Standard Grant
Collaborative Research: RI: Small: Foundations of Few-Round Active Learning
- Award number: 2313131
- Fiscal year: 2023
- Funding amount: $898,000
- Project type: Standard Grant
Collaborative Research: RI: Medium: Lie group representation learning for vision
- Award number: 2313151
- Fiscal year: 2023
- Funding amount: $898,000
- Project type: Continuing Grant
Collaborative Research: RI: Medium: Principles for Optimization, Generalization, and Transferability via Deep Neural Collapse
- Award number: 2312840
- Fiscal year: 2023
- Funding amount: $898,000
- Project type: Standard Grant
Collaborative Research: RI: Small: Deep Constrained Learning for Power Systems
- Award number: 2345528
- Fiscal year: 2023
- Funding amount: $898,000
- Project type: Standard Grant
Collaborative Research: RI: Small: Motion Fields Understanding for Enhanced Long-Range Imaging
- Award number: 2232298
- Fiscal year: 2023
- Funding amount: $898,000
- Project type: Standard Grant
Collaborative Research: RI: Small: End-to-end Learning of Fair and Explainable Schedules for Court Systems
- Award number: 2232055
- Fiscal year: 2023
- Funding amount: $898,000
- Project type: Standard Grant
Collaborative Research: RI: Medium: Lie group representation learning for vision
- Award number: 2313149
- Fiscal year: 2023
- Funding amount: $898,000
- Project type: Continuing Grant
Collaborative Research: CompCog: RI: Medium: Understanding human planning through AI-assisted analysis of a massive chess dataset
- Award number: 2312374
- Fiscal year: 2023
- Funding amount: $898,000
- Project type: Standard Grant