权益分类	功能权益	普通用户	{{item.name}}会员
{{category.name}}	{{benefitItem.name}}

RI: Medium: Collaborative Research: Multilingual Gestural Models for Robust Language-Independent Speech Recognition

RI：媒介：协作研究：用于鲁棒语言无关语音识别的多语言手势模型

基本信息

批准号：
1162046
负责人：
Vikramjit Mitra
金额：
$ 10.87万
依托单位：
SRI International
依托单位国家：
美国
项目类别：
Standard Grant
财政年份：
2012
资助国家：
美国
起止时间：
2012-10-01 至 2015-09-30
项目状态：
已结题

来源：
https://www.nsf.gov/awardsearch/showAward?AWD_ID=1162046&HistoricalAwards=false
关键词：
RI Medium Collaborative Research Multilingual

项目摘要

Current state-of-the-art automatic speech recognition (ASR) systems typically model speech as a string of acoustically-defined phones and use contextualized phone units, such as tri-phones or quin-phones to model contextual influences due to coarticulation. Such acoustic models may suffer from data sparsity and may fail to capture coarticulation appropriately because the span of a tri- or quin-phone's contextual influence is not flexible. In a small vocabulary context, however, research has shown that ASR systems which estimate articulatory gestures from the acoustics and incorporate these gestures in the ASR process can better model coarticulation and are more robust to noise. The current project investigates the use of estimated articulatory gestures in large vocabulary automatic speech recognition. Gestural representations of the speech signal are initially created from the acoustic waveform using the Task Dynamic model of speech production. These data are then used to train automatic models for articulatory gesture recognition where the articulatory gestures serve as subword units in the gesture-based ASR system. The main goal of the proposed work is to evaluate the performance of a large-vocabulary gesture-based ASR system using American English (AE). The gesture-based system will be compared to a set of competitive state-of-the-art recognition systems in term of word and phone recognition accuracies, both under clean and noisy acoustic background conditions.The broad impact of this research is threefold: (1) the creation of a large vocabulary American English (AE) speech database containing acoustic waveforms and their articulatory representations, (2) the introduction of novel machine learning techniques to model articulatory representations from acoustic waveforms, and (3) the development of a large vocabulary ASR system that uses articulatory representation as subword units. The robust and accurate ASR system for AE resulting from the proposed project will deal effectively with speech variability, thereby significantly enhancing communication and collaboration between people and machines in AE, and with the promise to generalize the method to multiple languages. The knowledge gained and the systems developed will contribute to the broad application of articulatory features in speech processing, and will have the potential to transform the fields of ASR, speech-mediated person-machine interaction, and automatic translation among languages. The interdisciplinary collaboration will facilitate a cross-disciplinary learning environment for the participating faculty, researchers, graduate students and undergraduate students Thus, this collaboration will result in the broader impact of enhanced training in speech modeling and algorithm development. Finally, the proposed work will result in a set of databases and tools that will be disseminated to serve the research and education community at large.

当前最先进的自动语音识别（ASR）系统通常将语音建模为一串声学定义的音素，并使用上下文化的音素单元（诸如三音素或五音素）来建模由于协同发音而引起的上下文影响。这样的声学模型可能遭受数据稀疏性，并且可能无法适当地捕获协同发音，因为三音素或五音素的上下文影响的跨度不灵活。然而，在小词汇量的情况下，研究表明，从声学中估计发音手势并将这些手势纳入ASR过程的ASR系统可以更好地对协同发音进行建模，并且对噪声更具鲁棒性。目前的项目研究使用估计发音手势在大词汇量的自动语音识别。语音信号的手势表示最初使用语音产生的任务动态模型从声学波形创建。这些数据然后被用来训练用于发音姿势识别的自动模型，其中发音姿势在基于姿势的ASR系统中充当子字单元。所提出的工作的主要目标是使用美国英语（AE）的大词汇量的基于手势的ASR系统的性能进行评估。基于手势的系统将与一组具有竞争力的最先进的识别系统在单词和电话识别精度方面进行比较，无论是在干净的还是嘈杂的声学背景条件下。这项研究的广泛影响是三方面的：（1）创建包含声学波形及其发音表示的大词汇量美式英语（AE）语音数据库，（2）引入新的机器学习技术来根据声波波形对发音表示进行建模，以及（3）开发使用发音表示作为子字单元的大词汇量ASR系统。该项目产生的强大而准确的AE ASR系统将有效地处理语音变异，从而显着增强AE中人与机器之间的沟通和协作，并有望将该方法推广到多种语言。所获得的知识和开发的系统将有助于发音特征在语音处理中的广泛应用，并将有可能改变ASR，语音介导的人机交互和语言之间的自动翻译领域。跨学科合作将促进参与教师，研究人员，研究生和本科生的跨学科学习环境，因此，这种合作将导致语音建模和算法开发的增强培训的更广泛的影响。最后，拟议的工作将产生一套数据库和工具，将分发给广大研究和教育界。

项目成果

期刊论文数量（0）

专著数量（0）

科研奖励数量（0）

会议论文数量（0）

专利数量（0）

数据更新时间：{{ journalArticles.updateTime }}

DOI：
{{ item.doi }}
发表时间：
{{ item.publish_year }}
期刊：
{{ item.journal_name }}
影响因子：
{{ item.factor }}
作者：
{{ item.authors }}
通讯作者：
{{ item.author }}

数据更新时间：{{ journalArticles.updateTime }}

作者：
{{ item.author }}

数据更新时间：{{ monograph.updateTime }}

作者：
{{ item.author }}

数据更新时间：{{ sciAawards.updateTime }}

作者：
{{ item.author }}

数据更新时间：{{ conferencePapers.updateTime }}

作者：
{{ item.author }}

数据更新时间：{{ patent.updateTime }}

Vikramjit Mitra其他文献

Yield of endoscopic ultrasound (EUS)in patients with dilated common bile duct (CBD)and or pancreatic duct (PD) with normal liver function tests (LFTS)and cross- sectional imaging

DOI：
10.1016/j.pan.2012.12.035
发表时间：
2013-01-01
期刊：
Conference abstract
影响因子：
作者：
Vikramjit Mitra;Manu Nayar;Stuart Bonnington;John Scott;Kirsty Anderson;Richard Charnley;Bryon Jaques;Gourab Sen;Steve White;Derek Manas;Jeremy French;Kofi Oppong
通讯作者：
Kofi Oppong

PTH-013 Development of the upper GI recorded image quality index (UGI-RIQI) score and quality assurance tool

PTH-013 上GI记录图像质量指数（UGI-RIQI）评分和质量保证工具的开发

DOI：
发表时间：
2019
期刊：
影响因子：
0
作者：
T. Dove;E. Hawkes;J. Berrill;B. Lee;P. Neville;M. Elzubier;Debasis Majumdar;Vikramjit Mitra
通讯作者：
Vikramjit Mitra

Diagnostic yield of EUS-FNA in pancreatic neuro-endocrine tumours (PNET) – solid versus cystic PNETs – 9 year experience from a tertiary centre

DOI：
10.1016/j.pan.2012.12.051
发表时间：
2013-01-01
期刊：
Conference abstract
影响因子：
作者：
Vikramjit Mitra;Manu Nayar;Beate Haugk;Viney Wadehra;Richard Charnley;Bryon Jaques;Steve White;Derek Manas;Jeremy French;Kofi Oppong
通讯作者：
Kofi Oppong