EAGER: Linguistic Event Extraction and Integration (LEXI): A New Approach to Speech Analysis

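Basic information

Award number:
1651190
Principal investigator:
Stefanie Shattuck-Hufnagel
Amount:
$214,400
Institution:
Massachusetts Institute of Technology
Institution country:
United States
Project type:
Standard Grant
Fiscal year:
2016
Funding country:
United States
Project period:
2016-09-01 to 2019-08-31
Project status:
Completed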

Source:
https://www.nsf.gov/awardsearch/showAward?AWD_ID=1651190&HistoricalAwards=false
Keywords:
EAGER Linguistic Event Extraction Integration

Project abstract

This exploratory project develops a new system for speech signal analysis that can be used to improve automatic speech recognition (ASR) systems, and provide a testable model of human speech perception. The system is based on finding important events in the speech signal, i.e. 'acoustic edges' where the signal changes suddenly because the mouth closes or opens during the formation of a consonant (like /p/ or /s/), or a vowel (like /a/ or /u/). These abrupt changes, called Landmarks, are especially informative, because they (and the parts of the signal near them) are richly informative about the speaker's intended words and their sounds. Focusing on these events results in greater computational efficiency, by identifying the linguistically relevant information in the speech signal, rather than measuring every part of the signal. This focus on individual cues to speech sounds also means that the system can deal with non-typical speech produced by children, older people, speakers with foreign accents, or those with clinical speech disabilities. As a result, this system will bring the benefits of ASR to speakers who are not well served by current recognition systems, making it possible for more people to use cell phones, tablets and laptops. While existing systems work well for typical speakers by using statistical analysis of large samples of typical speech, they leave many people underserved. The Landmark-based system will also provide a tool for testing whether human speech recognition depends on finding the individual cues to the sounds of words, even when those cues are very different in different contexts, and so can lead to the development of a new model of human speech perception.
The system works by extracting speech-related measurements from the signal, such as fundamental frequency, formant frequencies, spectral band energies and their derivatives, and interpreting these measures as acoustic cues for distinctive features. Innovative aspects of the system include the use of Landmarks, which are the most robust of the acoustic feature cues and are related to articulatory manner features. Once the landmark acoustic cues are found, other acoustic cues related to place and voicing features, and to prosodic structure, can also be found. The extraction of distinctive features and prosodic structure provides the first abstract linguistic units that can be extracted from the physical continuous signal, and this information is used to identify words, and to construct a representation of the entire utterance. To develop and evaluate the performance of this innovative system, speech databases consisting of isolated vowel-consonant-vowel sequences, read continuous speech, read radio-style speech, and spontaneous speech will be hand-labeled with Landmarks and other acoustic cues. Results of this basic speech research project will support the development of new approaches to ASR, will provide a testable computational model of human speech production, and will produce material suitable for development of a tutorial to train students in engineering, linguistics and cognitive science to label acoustic feature cues.
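
The idea of locating 'acoustic edges' from energy measurements and their derivatives can be illustrated with a small sketch. This is not the LEXI system or the project's algorithm: it substitutes broadband short-time energy for the spectral band energies named in the abstract, and the function names, frame size, and 9 dB threshold are all assumptions chosen for illustration. It marks frames where the energy rises or falls abruptly as landmark candidates.

```python
import math

def frame_energies_db(samples, frame_len, hop):
    """Short-time energy in dB for each frame of a mono signal."""
    energies = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = samples[start:start + frame_len]
        power = sum(x * x for x in frame) / frame_len
        energies.append(10.0 * math.log10(power + 1e-10))  # floor avoids log(0)
    return energies

def find_abrupt_changes(energies_db, threshold_db=9.0):
    """Return (frame_index, sign) where |frame-to-frame energy change| peaks
    above threshold_db. The threshold is an illustrative assumption."""
    deltas = [b - a for a, b in zip(energies_db, energies_db[1:])]
    events = []
    for i, d in enumerate(deltas):
        if abs(d) < threshold_db:
            continue
        left = abs(deltas[i - 1]) if i > 0 else 0.0
        right = abs(deltas[i + 1]) if i + 1 < len(deltas) else 0.0
        if abs(d) >= left and abs(d) > right:  # keep only local peaks
            events.append((i + 1, "+" if d > 0 else "-"))
    return events

def synthetic_signal(rate=8000):
    """Silence, a 200 Hz tone, then silence: one onset and one offset."""
    silence = [0.0] * (rate // 10)
    tone = [0.5 * math.sin(2 * math.pi * 200 * n / rate) for n in range(rate // 5)]
    return silence + tone + silence

signal = synthetic_signal()
energies = frame_energies_db(signal, frame_len=80, hop=80)  # 10 ms frames at 8 kHz
events = find_abrupt_changes(energies)
print(events)  # [(10, '+'), (30, '-')]
```

On this synthetic input the two detected events fall at frames 10 and 30, i.e. at 0.1 s and 0.3 s, which are the onset and offset of the tone. Real speech would need per-band energies, smoothing, and further cues near each candidate, as the abstract describes.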

Project outcomes

Journal articles (18)

Monographs (0)

Research awards (0)

Conference papers (0)

Patents (0)

A framework for labeling speech with acoustic cues to linguistic distinctive features

DOI:
10.1121/1.5121717
Published:
2019
Journal:
The Journal of the Acoustical Society of America
Impact factor:
0
Authors:
Huilgol, Shreya; Baik, Jinwoo; Shattuck-Hufnagel, Stefanie
Corresponding author:
Shattuck-Hufnagel, Stefanie

Irregular pitch periods as a feature cue in the developing speech of English-learning children

DOI:
10.1121/1.4988262
Published:
2017
Journal:
The Journal of the Acoustical Society of America
Impact factor:
0
Authors:
Hanson, Helen; Shattuck-Hufnagel, Stefanie; Pereira, John
Corresponding author:
Pereira, John

Acoustic cues to distinctive features are modified in the speech of typically-developing versus atypically developing children

DOI:
10.1121/1.4988535
Published:
2017
Journal:
The Journal of the Acoustical Society of America
Impact factor:
0
Authors:
Talkar, Tanya; Zuk, Jennifer; Guerrero, Maria X.; Choi, Jeung-Yoon; Shattuck-Hufnagel, Stefanie
Corresponding author:
Shattuck-Hufnagel, Stefanie

Detecting glides and their place of articulation using speech-related measurements in a feature-cue-based model