Collaborative Research: Estimating Articulatory Constriction Place and Timing from Speech Acoustics

Basic Information

Award number:
2141275
Principal investigator:
Mark Tiede
Amount:
approx. $154,000
Host institution:
Haskins Laboratories, Inc.
Host institution country:
United States
Award type:
Standard Grant
Fiscal year:
2022
Funding country:
United States
Period:
2022-06-15 to 2023-10-31
Status:
Completed

Source:
https://www.nsf.gov/awardsearch/showAward?AWD_ID=2141275&HistoricalAwards=false
Keywords:
Collaborative Research Estimating Articulatory Constriction

Project Abstract

This collaborative project focuses on a new approach for using speech recordings to study speaker pronunciation habits--that is, the way speakers systematically coordinate the articulatory movements of their lips, jaw, tongue, glottis and soft palate to produce words and sentences. These articulatory habits differ between individuals, and across languages and dialects of the same language, accounting for many aspects of foreign accent, speech disorders and speaking style. Whereas previous studies of these habits have required specialized equipment for the direct observation of articulator movements, the aim of this project is to develop and improve a tool for "speech inversion"--that is, a tool that can accurately recover articulatory movements directly from the acoustic speech signal using machine learning methods. To date, the tool developed by the project team has successfully recovered movements of the tongue and lips; the current project extends the tool’s functionality to encompass nasality (soft palate) and voicing (glottis). Training and validation of the extended system will proceed using a newly collected corpus of acoustic and articulatory data drawn from speakers of American English. This corpus, comprising co-collected audio, nasal, voicing, and articulatory movement data, will serve as 'ground truth' for training and assessing the capabilities of the fully trained speech inversion system. As a further test, the system will be evaluated against ground truth data from speakers of languages with patterns of articulatory habits known to differ from English.

The goal of this project is to develop and refine a Speech Inversion Tool that 'reads' acoustic recordings of speech and 'recovers' details of the magnitude and timing of articulatory movements. The project aims to accomplish this goal by training specialized Neural Network models to relate features of the acoustic signal to separately acquired ground-truth nasal vs. oral outflow signals and concurrent electroglottography. Training data derives from native speakers of English; validation and tests for generalization include productions of speakers of Canadian French and Russian. When successfully validated, the resulting speech inversion tool will be useful for identifying medical issues that affect speech movement organization, such as the well-known disruption of oral/laryngeal timing in speakers with dysarthria. In addition, incorporating estimates of articulation may aid in the tracking of changes resulting from medical conditions such as depression and schizophrenia. More generally, the ability to rapidly and easily analyze articulatory movements obtained from audio recordings alone has the potential to substantially improve Automated Speech Recognition (ASR) systems, to assist scholars, forensic scientists, and clinical professionals studying the speech of communities under field conditions in rural or under-resourced areas, and to help in the documentation of endangered languages.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
