RI: Small: A New Voice Source Model: From Glottal Areas to Better Speech Synthesis

Basic Information

Award number:
1018863
Principal investigator:
Abeer Alwan
Amount:
$450,000
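Host institution:
University of California-Los Angeles
Host institution country:
United States
Award type:
Continuing Grant
Fiscal year:
2010
Funding country:
United States
Project period:
2010-09-01 to 2015-07-31
Project status:
Completed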

Source:
https://www.nsf.gov/awardsearch/showAward?AWD_ID=1018863&HistoricalAwards=false
Keywords:
RI Small New Voice Source

Project Abstract

The goal of the proposed research is to develop and evaluate a new voice source model based on physiological observations of the vocal folds of 30 adult speakers. Shortcomings of existing source models can be in part attributed to the way in which they were developed: based on limited data from a few speakers, without direct physiological observations, and without perceptual validation. A larger dataset would help in not only developing a source model that could account for a range of voice qualities within and across speakers, but also result in an understanding of how and which model parameter(s) are speaker and/or gender specific. Model development will consider the perceptual effects of the model's parameters from the earliest stages. A better source model might also improve the performance of speech processing algorithms such as text-to-speech synthesis (TTS). Typically in the development of such algorithms, the emphasis has been on acoustic features related to the speech spectral envelope. The acoustics of the voice source, on the other hand, have received less attention. The proposed work involves: 1) recording high-speed images of vocal fold vibrations with simultaneous audio recordings from 15 male and 15 female speakers, 2) extracting glottal area functions from the images to parameterize a new voice source model, 3) performing perception experiments to uncover which model parameters are perceptually salient, and 4) using the new voice source model in TTS. The project's interdisciplinary team (with expertise in modeling, synthesis, recognition, phonetics, and psycholinguistics) is uniquely qualified to conduct this transformative research.

Project Outputs

Journal articles (0)

Monographs (0)

Research awards (0)

Conference papers (0)

Patents (0)

Other publications by Abeer Alwan

Modeling auditory perception to improve robust speech recognition

DOI:
(not listed)
Published:
1997
Journal:
Conference Record of the Thirty-First Asilomar Conference on Signals, Systems and Computers (Cat. No.97CB36136)
Impact factor:
0
Authors:
B. Strope; Abeer Alwan
Corresponding author:
Abeer Alwan

Unraveling the associations between voice pitch and major depressive disorder: a multisite genetic study

DOI:
10.1038/s41380-024-02877-y
Published:
2024-12-31
Journal:
MOLECULAR PSYCHIATRY
Impact factor:
10.100
Authors:
Yazheng Di; Elior Rahmani; Joel Mefford; Jinhan Wang; Vijay Ravi; Aditya Gorla; Abeer Alwan; Kenneth S. Kendler; Tingshao Zhu; Jonathan Flint
Corresponding author:
Jonathan Flint