
Collaborative Research: Landmark-based Robust Speech Recognition using Prosody-guided Models of Speech Variability

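Basic Information

Award number: 0703805
Principal investigator: Abeer Alwan
Amount: --
Institution: University of California-Los Angeles
Institution country: United States
Award type: Continuing Grant
Fiscal year: 2007
Funding country: United States
Period: 2007-06-01 to 2011-05-31
Status: Completed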

Source: https://www.nsf.gov/awardsearch/showAward?AWD_ID=0703805&HistoricalAwards=false
Keywords: Collaborative Research Landmark based Robust

Project Abstract

Proposal ID: 0703859   Date: 04/11/2007

Despite great strides in automatic speech recognition technology, no existing system matches human performance at automatically transcribing unrestricted conversational speech from many speakers and dialects in adverse acoustic environments. This project applies new high-dimensional machine learning techniques, constrained by empirical and theoretical studies of speech production and perception, to learn from data the information structures that human listeners extract from speech. To do this, the team will develop large-vocabulary, psychologically realistic models of speech acoustics, pronunciation variability, prosody, and syntax. These models will use knowledge representations that reflect those proposed for human speech production and perception, and machine learning techniques will adjust the parameters of all knowledge representations simultaneously to minimize the structural risk of the recognizer. The team will develop nonlinear acoustic landmark detectors and pattern classifiers that integrate auditory-based signal processing with acoustic phonetic processing, are invariant to noise, changes in speaker characteristics, and reverberation, and can be learned in a semi-supervised fashion from labeled and unlabeled data. In addition, they will use variable frame rate analysis, which allows multi-resolution analysis, and will implement gesture-based lexical access using a variety of training data. The work will improve communication and collaboration between people and machines and deepen understanding of how humans produce and perceive speech. It brings together a team of experts in speech processing, acoustic phonetics, prosody, gestural phonology, statistical pattern matching, language modeling, and speech perception, with faculty across engineering, computer science, and linguistics.
Students and postdoctoral fellows will be supported by the project and engaged in speech modeling and algorithm development. Finally, the proposed work will produce a set of databases and tools to be disseminated to the broader research and education community.
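The abstract names variable frame rate (VFR) analysis without describing it. As a hedged illustration only, the sketch below assumes one common formulation: a frame is retained when its acoustic feature vector differs from the most recently retained frame by more than a Euclidean-distance threshold, so steady-state regions are sampled sparsely and rapid spectral transitions densely. The function name `variable_frame_rate`, the distance measure, and the threshold are assumptions for illustration, not the project's actual method.

```python
import math
from typing import List, Sequence


def variable_frame_rate(frames: Sequence[Sequence[float]], threshold: float) -> List[int]:
    """Hypothetical VFR frame selector (illustrative sketch only).

    Keeps the first frame, then keeps any frame whose Euclidean distance
    from the most recently kept frame exceeds `threshold`. Returns the
    indices of the kept frames.
    """
    if not frames:
        return []
    kept = [0]  # always keep the first frame as the initial reference
    for i in range(1, len(frames)):
        last = frames[kept[-1]]
        # Euclidean distance between the current frame and the last kept frame
        dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(frames[i], last)))
        if dist > threshold:
            kept.append(i)  # enough change: treat this as a new analysis frame
    return kept


# Toy 2-D "feature vectors": a slow drift, a jump, a small wobble, another jump.
toy = [[0.0, 0.0], [0.1, 0.0], [0.2, 0.0], [1.0, 0.0], [1.05, 0.0], [2.0, 1.0]]
print(variable_frame_rate(toy, 0.5))  # [0, 3, 5]
```

Raising the threshold yields a coarser frame sequence and lowering it a finer one, which is one way a single selection rule can support analysis at several temporal resolutions.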

Project Outcomes

Journal articles: 0
Monographs: 0
Research awards: 0
Conference papers: 0
Patents: 0

Other publications by Abeer Alwan

Modeling auditory perception to improve robust speech recognition

DOI:
Published: 1997
Journal: Conference Record of the Thirty-First Asilomar Conference on Signals, Systems and Computers (Cat. No.97CB36136)
Impact factor: 0
Authors: B. Strope; Abeer Alwan
Corresponding author: Abeer Alwan

Unraveling the associations between voice pitch and major depressive disorder: a multisite genetic study

DOI: 10.1038/s41380-024-02877-y
Published: 2024-12-31
Journal: MOLECULAR PSYCHIATRY
Impact factor: 10.100
Authors: Yazheng Di; Elior Rahmani; Joel Mefford; Jinhan Wang; Vijay Ravi; Aditya Gorla; Abeer Alwan; Kenneth S. Kendler; Tingshao Zhu; Jonathan Flint
Corresponding author: Jonathan Flint