RI: Medium: Collaborative Research: Variance and Invariance in Voice Quality: Implications for Machine and Human Speaker Identification
Basic Information
- Award number: 1704167
- Principal investigator:
- Amount: $851,600
- Host institution:
- Host institution country: United States
- Project type: Continuing Grant
- Fiscal year: 2017
- Funding country: United States
- Project period: 2017-09-01 to 2023-08-31
- Status: Completed
- Source:
- Keywords:
Project Abstract
A talker's voice quality conveys many kinds of information, including word and utterance prosody, emotional state, and personal identity. Variations in both the voice source and the vocal tract affect voice quality, and there can be significant inter- and intra-talker variability. Understanding which aspects of a voice are talker-specific should aid both in understanding the human limits in perceiving speaker differences and in developing better speaker identification (SID) algorithms. Despite technological advances, the performance of current SID systems remains far from perfect, and it degrades significantly when training and testing conditions are mismatched, especially in terms of speaking style (for example, conversational versus read speech) or the speaker's emotional state, when utterances are short, and when the task is text-independent. The key questions the project aims to answer are: under normal daily-life variability, how often does a talker sound less like him- or herself and more like someone else? Which acoustic properties account for speaker similarity? Can automatic SID algorithms be improved by knowledge of which properties are important for human perception of speaker similarity? The project is transformative, helping to better understand and model variance and invariance in voice quality. It will inform several important issues in human speech perception, especially in the area of talker similarity. Understanding which aspects of the source signal, if any, are talker-specific should aid in developing better speaker identification and verification algorithms that can handle short utterances and are robust to varying affect and speaking styles. A model of voice-quality variations could also improve the naturalness of text-to-speech (TTS) systems.
If it were known how much a person could change his or her voice quality without compromising vocal identity, this knowledge could also inform medical rehabilitation applications and forensics. A better understanding of voice quality will thus have significant impact scientifically and for engineering, forensic, and medical applications. The project has strong outreach and dissemination programs and fosters interdisciplinary activities in Electrical Engineering, Linguistics, and Speech and Hearing Science at UCLA and the Center of Excellence at JHU. It trains undergraduate and graduate students in cross-disciplinary activities of technological and scientific significance. The results will be published in high-quality journals and presented at relevant international conferences, and the research products - a set of databases, software tools, and publications - will be disseminated freely. The project analyzes how the speech signal varies within and across talkers under circumstances that introduce variability in everyday life. Specifically, it investigates whether an individual talker's speech varies significantly across recording sessions and speech tasks. Most importantly, it examines how intra-talker variability from all these sources compares with inter-talker variability. Understanding these issues requires a high-quality speech database with multiple voice samples from many talkers (in this case 200), which is collected, annotated, and distributed to other researchers. Acoustic analyses reveal inter- and intra-talker variability in the speech signal across different situations by generating a multi-dimensional acoustic profile of each talker that specifies the range of parameter values typical for that talker in the corpus, and the likelihood of deviations from that usual profile.
Perceptual studies determine the extent to which parameter profiles predict perceived similarity, and how much variability in each parameter can be tolerated before talkers cease to sound like themselves. Insights from the acoustic and perceptual studies guide the development of text-dependent and text-independent SID algorithms that are anticipated to be robust to variations in affect and style and to perform well on short utterances.
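As a minimal illustration of the kind of "multi-dimensional acoustic profile" the abstract describes, one could model each talker's acoustic parameters as a multivariate Gaussian and score how far a new utterance deviates from that talker's usual range. This sketch is an assumption for illustration only: the parameter set (F0, H1-H2, CPP), the Gaussian model, and the Mahalanobis-distance deviation score are not claimed to be the project's actual methods.

```python
# Hypothetical sketch: a per-talker acoustic profile as a multivariate
# Gaussian over acoustic parameters (columns: e.g., F0, H1-H2, CPP),
# with deviation from the profile scored by Mahalanobis distance.
import numpy as np

def build_profile(samples: np.ndarray):
    """samples: (n_utterances, n_params) acoustic measurements for one talker."""
    mean = samples.mean(axis=0)
    cov = np.cov(samples, rowvar=False)
    # Small ridge keeps the covariance invertible for few/collinear samples.
    cov_inv = np.linalg.inv(cov + 1e-6 * np.eye(cov.shape[0]))
    return mean, cov_inv

def deviation(profile, x: np.ndarray) -> float:
    """Mahalanobis distance of a new utterance x from the talker's profile."""
    mean, cov_inv = profile
    d = x - mean
    return float(np.sqrt(d @ cov_inv @ d))

# Synthetic data standing in for one talker's measured utterances.
rng = np.random.default_rng(0)
talker = rng.normal([200.0, 5.0, 15.0], [20.0, 2.0, 3.0], size=(50, 3))
prof = build_profile(talker)

typical = deviation(prof, talker.mean(axis=0))          # at the profile center
atypical = deviation(prof, np.array([300.0, 12.0, 25.0]))  # far outside the range
```

Under this toy model, "sounding like oneself" corresponds to a small deviation score, and the question of how much variability listeners tolerate becomes a threshold on that score.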
Project Outcomes
- Journal articles: 19
- Monographs: 0
- Research awards: 0
- Conference papers: 0
- Patents: 0
Exploring the Use of an Unsupervised Autoregressive Model as a Shared Encoder for Text-Dependent Speaker Verification
- DOI: 10.21437/interspeech.2020-2957
- Publication date: 2020-08
- Journal:
- Impact factor: 0
- Authors: Vijay Ravi; Ruchao Fan; Amber Afshan; Huanhua Lu; A. Alwan
- Corresponding author: Vijay Ravi; Ruchao Fan; Amber Afshan; Huanhua Lu; A. Alwan
Variable frame rate-based data augmentation to handle speaking-style variability for automatic speaker verification
- DOI: 10.21437/interspeech.2020-3006
- Publication date: 2020-08
- Journal:
- Impact factor: 0
- Authors: Amber Afshan; Jinxi Guo; S. Park; Vijay Ravi; A. McCree; A. Alwan
- Corresponding author: Amber Afshan; Jinxi Guo; S. Park; Vijay Ravi; A. McCree; A. Alwan
Target and Non-target Speaker Discrimination by Humans and Machines
- DOI: 10.1109/icassp.2019.8683362
- Publication date: 2019
- Journal:
- Impact factor: 0
- Authors: Park, Soo Jin; Afshan, Amber; Kreiman, Jody; Yeung, Gary; Alwan, Abeer
- Corresponding author: Alwan, Abeer
Acoustic voice variation in spontaneous speech
- DOI: 10.1121/10.0011471
- Publication date: 2022
- Journal:
- Impact factor: 0
- Authors: Lee, Yoonjeong; Kreiman, Jody
- Corresponding author: Kreiman, Jody
Acoustic voice variation within and between speakers
- DOI: 10.1121/1.5125134
- Publication date: 2019
- Journal:
- Impact factor: 0
- Authors: Lee, Yoonjeong; Keating, Patricia; Kreiman, Jody
- Corresponding author: Kreiman, Jody
Other Publications by Abeer Alwan
Modeling auditory perception to improve robust speech recognition
- DOI:
- Publication date: 1997
- Journal:
- Impact factor: 0
- Authors: B. Strope; Abeer Alwan
- Corresponding author: Abeer Alwan
Unraveling the associations between voice pitch and major depressive disorder: a multisite genetic study
- DOI: 10.1038/s41380-024-02877-y
- Publication date: 2024-12-31
- Journal:
- Impact factor: 10.100
- Authors: Yazheng Di; Elior Rahmani; Joel Mefford; Jinhan Wang; Vijay Ravi; Aditya Gorla; Abeer Alwan; Kenneth S. Kendler; Tingshao Zhu; Jonathan Flint
- Corresponding author: Jonathan Flint
Optical Phonetics and Visual Perception of Stress in English
- DOI:
- Publication date: 2003
- Journal:
- Impact factor: 0
- Authors: P. Keating; Marco Baroni; Sven Matty; E. T. Auer; Rebecca Scarborough; Abeer Alwan; E. Bernstein
- Corresponding author: E. Bernstein
Towards Automatically Assessing Children's Picture Description Tasks
- DOI:
- Publication date:
- Journal:
- Impact factor: 0
- Authors: Hariram Veeramani; Natarajan Balaji Shankar; Alexander Johnson; Abeer Alwan
- Corresponding author: Abeer Alwan
An Analysis of Large Language Models for African American English Speaking Children's Oral Language Assessment
- DOI:
- Publication date:
- Journal:
- Impact factor: 0
- Authors: Alexander Johnson; Christina Chance; Kaycee Stiemke; Hariram Veeramani; Natarajan Balaji Shankar; Abeer Alwan
- Corresponding author: Abeer Alwan
Other Grants to Abeer Alwan
Collaborative Research: Improving speech technology for better learning outcomes: the case of AAE child speakers
- Award number: 2202585
- Fiscal year: 2022
- Amount: $851,600
- Project type: Standard Grant

Collaborative Research: RI: Small: From Ultrasound and MRI to articulatory and acoustic models of child speech development
- Award number: 2006979
- Fiscal year: 2020
- Amount: $851,600
- Project type: Standard Grant

Workshop for Undergraduate and MS Female Students in Speech Science and Technology
- Award number: 1745166
- Fiscal year: 2017
- Amount: $851,600
- Project type: Standard Grant

NRI: INT: COLLAB: Development, Deployment and Evaluation of Personalized Learning Companion Robots for Early Literacy and Language Learning
- Award number: 1734380
- Fiscal year: 2017
- Amount: $851,600
- Project type: Standard Grant

A Workshop for Junior Female Researchers in Speech Science and Technology
- Award number: 1637240
- Fiscal year: 2016
- Amount: $851,600
- Project type: Standard Grant

The Role of Speech Science in Developing Robust Speech Technology Applications
- Award number: 1543522
- Fiscal year: 2015
- Amount: $851,600
- Project type: Standard Grant

EAGER: Collaborative Research: Models of Child Speech
- Award number: 1551113
- Fiscal year: 2015
- Amount: $851,600
- Project type: Standard Grant

EAGER: Variance and Invariance in Voice Quality
- Award number: 1450992
- Fiscal year: 2014
- Amount: $851,600
- Project type: Standard Grant

EAGER: Collaborative Research: Towards Modeling Human Speech Confusions in Noise
- Award number: 1247809
- Fiscal year: 2012
- Amount: $851,600
- Project type: Standard Grant

RI: Small: A New Voice Source Model: From Glottal Areas to Better Speech Synthesis
- Award number: 1018863
- Fiscal year: 2010
- Amount: $851,600
- Project type: Continuing Grant
Similar Overseas Grants
Collaborative Research: RI: Medium: Principles for Optimization, Generalization, and Transferability via Deep Neural Collapse
- Award number: 2312841
- Fiscal year: 2023
- Amount: $851,600
- Project type: Standard Grant

Collaborative Research: RI: Medium: Principles for Optimization, Generalization, and Transferability via Deep Neural Collapse
- Award number: 2312842
- Fiscal year: 2023
- Amount: $851,600
- Project type: Standard Grant

Collaborative Research: RI: Medium: Lie group representation learning for vision
- Award number: 2313151
- Fiscal year: 2023
- Amount: $851,600
- Project type: Continuing Grant

Collaborative Research: RI: Medium: Principles for Optimization, Generalization, and Transferability via Deep Neural Collapse
- Award number: 2312840
- Fiscal year: 2023
- Amount: $851,600
- Project type: Standard Grant

Collaborative Research: RI: Medium: Lie group representation learning for vision
- Award number: 2313149
- Fiscal year: 2023
- Amount: $851,600
- Project type: Continuing Grant

Collaborative Research: CompCog: RI: Medium: Understanding human planning through AI-assisted analysis of a massive chess dataset
- Award number: 2312374
- Fiscal year: 2023
- Amount: $851,600
- Project type: Standard Grant

Collaborative Research: CompCog: RI: Medium: Understanding human planning through AI-assisted analysis of a massive chess dataset
- Award number: 2312373
- Fiscal year: 2023
- Amount: $851,600
- Project type: Standard Grant

Collaborative Research: RI: Medium: Superhuman Imitation Learning from Heterogeneous Demonstrations
- Award number: 2312955
- Fiscal year: 2023
- Amount: $851,600
- Project type: Standard Grant

Collaborative Research: RI: Medium: Informed, Fair, Efficient, and Incentive-Aware Group Decision Making
- Award number: 2313137
- Fiscal year: 2023
- Amount: $851,600
- Project type: Standard Grant

Collaborative Research: RI: Medium: Lie group representation learning for vision
- Award number: 2313150
- Fiscal year: 2023
- Amount: $851,600
- Project type: Continuing Grant