Collaborative Research: RI: Medium: From Acoustic Signal to Morphosyntactic Analysis in One End-to-End Neural System
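Basic information
- Award number: 2211951
- Principal investigator:
- Amount: $898,000
- Host institution:
- Host institution country: United States
- Award type: Standard Grant
- Fiscal year: 2022
- Funding country: United States
- Project period: 2022-08-01 to 2026-07-31
- Project status: Not yet concluded
- Source:
- Keywords:
Project abstract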
There are approximately 7,000 languages in the world today, but this number is declining precipitously. Even many languages that currently have thousands upon thousands of speakers are likely to fall out of use within a generation. For the speakers of these languages, this represents a tragic loss of cultural and linguistic heritage, which are important anchors of their social identity. Each language also carries irreplaceable data about language as a phenomenon of human behavior: the limits of its variation and the patterns in its structure and development. Linguists and language activists are currently working to quickly and comprehensively document as many languages as possible. In the unfortunate event that a language fades from use, documentation ensures that its data will remain available for future cultural or scientific analysis. This project partially automates the process of language documentation using tools from Natural Language Processing and Machine Learning. It differs from similar projects in using one integrated system to process the sounds of speech and the structure of words, instead of using two or more separate components. With the collaboration of native speaker scholars, the researchers are applying their methodology to four languages: Highland Puebla Nahuatl, Yoloxóchitl Mixtec, San Pedro Amuzgos Amuzgo, and North Slope Iñupiaq.
The proposed research will dramatically transform the landscape of automatic morphosyntactic and morphophonological analysis by introducing an end-to-end system that consumes speech as an input and produces interlinear annotations as an output. The research team proposes to build an end-to-end system, a single neural net that, with small amounts of labeled data produced by native speaker linguists, can directly convert recorded speech to analyzed text, producing four outputs: (1) surface transcription, (2) morphological segmentation of surface forms, (3) an underlying or canonical form for each morpheme, and (4) a gloss or standardized label for each morpheme. The proposed single end-to-end neural network represents the first attempt to integrate the four aforementioned tasks into a single neural network, avoiding the error-propagation problems that have plagued earlier attempts at creating a pipeline and mitigating the complexity of the technology for end-users. The researchers also propose innovative ways to incorporate linguistic knowledge into neural networks, including the use of differentiable weighted finite-state transducers, which are independently motivated by an iterative self-training architecture. This approach to iterative self-training, in its own right, will represent an advance in machine learning: a new algorithm for upweighting words and morphemes. The research also makes significant contributions to computational morphology. It includes a simple but expressive modification to existing schemes for segmentation and glossing, specifically for the representation of discontinuous morphemes. Furthermore, the proposal extends popular approaches to morphological analysis (e.g., UniMorph) by systematically addressing derivation as well as inflection. This proposal addresses glossing of reduplication and noun-incorporation, which earlier work has not.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
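As a rough illustration of the four outputs listed in the abstract, the sketch below models one analyzed word as a record with four aligned tiers. The class name `AnalyzedWord`, its field names, and the English example are all hypothetical choices made for illustration; none of them come from the project, and the sketch says nothing about how the neural system itself is built.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class AnalyzedWord:
    """One word with the four annotation tiers named in the abstract (hypothetical layout)."""
    surface: str            # (1) surface transcription
    segments: List[str]     # (2) morphological segmentation of the surface form
    underlying: List[str]   # (3) underlying or canonical form of each morpheme
    glosses: List[str]      # (4) gloss or standardized label for each morpheme

    def is_aligned(self) -> bool:
        """True when every segment has exactly one canonical form and one gloss."""
        n = len(self.segments)
        return n > 0 and len(self.underlying) == n and len(self.glosses) == n

    def interlinear(self) -> str:
        """Render the tiers as a four-line interlinear block, morphemes joined by hyphens."""
        return "\n".join([
            self.surface,
            "-".join(self.segments),
            "-".join(self.underlying),
            "-".join(self.glosses),
        ])


# Illustrative English example only; this is not data from the project.
word = AnalyzedWord("walked", ["walk", "ed"], ["walk", "ed"], ["walk", "PST"])
print(word.interlinear())
```

Keeping the tiers as parallel lists makes the one-to-one alignment between segments, canonical forms, and glosses easy to check; the abstract notes that discontinuous morphemes need a richer representation than this simple layout provides.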
Project outcomes
Journal articles (3)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Syntax and Semantics Meet in the “Middle”: Probing the Syntax-Semantics Interface of LMs Through Agentivity
- DOI: 10.18653/v1/2023.starsem-1.14
- Publication year: 2023
- Journal:
- Impact factor: 0
- Authors: Tjuatja, Lindia; Liu, Emmy; Levin, Lori; Neubig, Graham
- Corresponding author: Neubig, Graham
SigMoreFun Submission to the SIGMORPHON Shared Task on Interlinear Glossing
- DOI: 10.18653/v1/2023.sigmorphon-1.22
- Publication year: 2023
- Journal:
- Impact factor: 0
- Authors: Taiqi He; Lindia Tjuatja; Nathaniel R. Robinson; Shinji Watanabe; David R. Mortensen; Graham Neubig; Lori Levin
- Corresponding authors: Taiqi He; Lindia Tjuatja; Nathaniel R. Robinson; Shinji Watanabe; David R. Mortensen; Graham Neubig; Lori Levin
Generalized Glossing Guidelines: An Explicit, Human- and Machine-Readable, Item-and-Process Convention for Morphological Annotation
- DOI: 10.18653/v1/2023.sigmorphon-1.7
- Publication year: 2023
- Journal:
- Impact factor: 0
- Authors: Mortensen, David R.; Gulsen, Ela; He, Taiqi; Robinson, Nathaniel; Amith, Jonathan; Tjuatja, Lindia; Levin, Lori
- Corresponding author: Levin, Lori
Other grants by Lorraine Levin
Conference: Training the US Computational Linguistics Team
- Award number: 2329963
- Fiscal year: 2023
- Amount: $898,000
- Award type: Standard Grant
Conference: International Linguistics Olympiad (2022)
- Award number: 2141334
- Fiscal year: 2022
- Amount: $898,000
- Award type: Standard Grant
North American Computational Linguistics Olympiad (NACLO) 2020
- Award number: 1946109
- Fiscal year: 2020
- Amount: $898,000
- Award type: Standard Grant
Workshop: International Linguistics Olympiad (ILO) July 2019; Yongin, South Korea
- Award number: 1851142
- Fiscal year: 2019
- Amount: $898,000
- Award type: Standard Grant
International Linguistics Olympiad (ILO) 2018: Prague, CZ, July 26 - August 1, 2018
- Award number: 1757042
- Fiscal year: 2018
- Amount: $898,000
- Award type: Standard Grant
Workshop: International Computational Linguistics Olympiad 2017
- Award number: 1654253
- Fiscal year: 2017
- Amount: $898,000
- Award type: Standard Grant
The International Linguistics Olympiad: Preparing High School Students for the Study of Human Language and Computation
- Award number: 1137828
- Fiscal year: 2011
- Amount: $898,000
- Award type: Standard Grant
SGER: Collaborative Research: New Problem Genres for the North American Computational Linguistics Olympiad
- Award number: 0838848
- Fiscal year: 2008
- Amount: $898,000
- Award type: Standard Grant
Active Selection of Data for Machine Translation
- Award number: 0713292
- Fiscal year: 2007
- Amount: $898,000
- Award type: Standard Grant
Planning Workshop for a Computational Linguistics Olympiad
- Award number: 0633871
- Fiscal year: 2006
- Amount: $898,000
- Award type: Standard Grant
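Similar NSFC (National Natural Science Foundation of China) grants
Study of how the extracellular domain of the transmembrane protein LRP5 regulates the membrane receptor TβRI to promote homing and differentiation of BMSCs on titanium surfaces
- Award number: 82301120
- Approval year: 2023
- Amount: CNY 300,000
- Award type: Young Scientists Fund
Mechanism by which Dectin-2 exacerbates asthma attacks by promoting FcεRI aggregation and mast cell activation
- Award number: 82300022
- Approval year: 2023
- Amount: CNY 300,000
- Award type: Young Scientists Fund
Role and mechanism of UFMylation of TβRI in regulating the TGF-β signaling pathway and breast cancer metastasis
- Award number: 32200568
- Approval year: 2022
- Amount: CNY 300,000
- Award type: Young Scientists Fund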
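Discovery of β-carboline alkaloid TβRI inhibitors from the Tibetan medicine Arenaria kansuensis and the mechanism of their anti-pulmonary-fibrosis action
- Award number: 82204762
- Approval year: 2022
- Amount: CNY 300,000
- Award type: Young Scientists Fund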
Similar overseas grants
Collaborative Research: RI: Medium: Principles for Optimization, Generalization, and Transferability via Deep Neural Collapse
- Award number: 2312841
- Fiscal year: 2023
- Amount: $898,000
- Award type: Standard Grant
Collaborative Research: RI: Medium: Principles for Optimization, Generalization, and Transferability via Deep Neural Collapse
- Award number: 2312842
- Fiscal year: 2023
- Amount: $898,000
- Award type: Standard Grant
Collaborative Research: RI: Small: Foundations of Few-Round Active Learning
- Award number: 2313131
- Fiscal year: 2023
- Amount: $898,000
- Award type: Standard Grant
Collaborative Research: RI: Medium: Lie group representation learning for vision
- Award number: 2313151
- Fiscal year: 2023
- Amount: $898,000
- Award type: Continuing Grant
Collaborative Research: RI: Medium: Principles for Optimization, Generalization, and Transferability via Deep Neural Collapse
- Award number: 2312840
- Fiscal year: 2023
- Amount: $898,000
- Award type: Standard Grant