RI: Medium: Collaborative Research: Variance and Invariance in Voice Quality: Implications for Machine and Human Speaker Identification
Basic Information
- Award number: 1704167
- Principal investigator:
- Amount: $851,600
- Host institution:
- Host institution country: United States
- Project type: Continuing Grant
- Fiscal year: 2017
- Funding country: United States
- Project period: 2017-09-01 to 2023-08-31
- Status: Completed
- Source:
- Keywords:
Project Abstract
A talker's voice quality conveys many kinds of information, including word and utterance prosody, emotional state, and personal identity. Variations in both the voice source and the vocal tract affect voice quality, and there can be significant inter- and intra-talker variability. Understanding which aspects of a voice are talker-specific should aid both in understanding the human limits in perceiving speaker differences and in developing better speaker identification (SID) algorithms. Despite technological advances, the performance of current SID systems remains far from perfect, and it degrades significantly when training and testing conditions are mismatched, especially in terms of speaking style (for example, conversational versus read speech) or the speaker's emotional state, when utterances are short, and when the task is text-independent. The key questions the project aims to answer are: under normal daily-life variability, how often does a talker sound less like him- or herself and more like someone else? Which acoustic properties account for speaker similarity? Can automatic SID algorithms be improved by knowledge of which properties are important for human perception of speaker similarity? The project is transformative, helping to better understand and model variance and invariance in voice quality. It will inform several important issues in human speech perception, especially in the area of talker similarity. Understanding which aspects of the source signal, if any, are talker-specific should aid in developing better speaker identification and verification algorithms that can handle short utterances and are robust to varying affect and speaking styles. A model of voice-quality variations could also improve the naturalness of text-to-speech (TTS) systems.
If it were known how much a person could change his or her voice quality without compromising vocal identity, this knowledge could also inform medical rehabilitation applications and forensics. A better understanding of voice quality will thus have significant impact scientifically and for engineering, forensic, and medical applications. The project has strong outreach and dissemination programs and fosters interdisciplinary activities in Electrical Engineering, Linguistics, and Speech and Hearing Science at UCLA and the Center of Excellence at JHU. It trains undergraduate and graduate students in cross-disciplinary activities of technological and scientific significance. The results will be published in high-quality journals and presented at relevant international conferences, and the research products - a set of databases, software tools, and publications - will be disseminated freely. The project analyzes how the speech signal varies within and across talkers under circumstances that introduce variability in everyday life. Specifically, it investigates whether an individual talker's speech varies significantly across recording sessions and speech tasks. Most importantly, it examines how intra-talker variability from all these sources compares with inter-talker variability. Understanding these issues requires a high-quality speech database with multiple voice samples from many talkers (in this case 200), which is collected, annotated, and distributed to other researchers. Acoustic analyses reveal inter- and intra-talker variability in the speech signal across different situations by generating a multi-dimensional acoustic profile of each talker that specifies the range of parameter values typical for that talker in the corpus, and the likelihood of deviations from that usual profile.
Perceptual studies determine the extent to which parameter profiles predict perceived similarity, and how much variability in each parameter can be tolerated before talkers cease to sound like themselves. Insights from the acoustic and perceptual studies guide the development of text-dependent and text-independent SID algorithms that are anticipated to be robust to variations in affect and style and to perform well on short utterances.
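As a minimal illustration of the kind of "multi-dimensional acoustic profile" the abstract describes, one could model each talker's acoustic parameters as a multivariate Gaussian and score how far a new utterance deviates from that talker's usual range. This sketch is an assumption for illustration only: the parameter set (F0, H1-H2, CPP), the Gaussian model, and the Mahalanobis-distance deviation score are not claimed to be the project's actual methods.

```python
# Hypothetical sketch: a per-talker acoustic profile as a multivariate
# Gaussian over acoustic parameters (columns: e.g., F0, H1-H2, CPP),
# with deviation from the profile scored by Mahalanobis distance.
import numpy as np

def build_profile(samples: np.ndarray):
    """samples: (n_utterances, n_params) acoustic measurements for one talker."""
    mean = samples.mean(axis=0)
    cov = np.cov(samples, rowvar=False)
    # Small ridge keeps the covariance invertible for few/collinear samples.
    cov_inv = np.linalg.inv(cov + 1e-6 * np.eye(cov.shape[0]))
    return mean, cov_inv

def deviation(profile, x: np.ndarray) -> float:
    """Mahalanobis distance of a new utterance x from the talker's profile."""
    mean, cov_inv = profile
    d = x - mean
    return float(np.sqrt(d @ cov_inv @ d))

# Synthetic data standing in for one talker's measured utterances.
rng = np.random.default_rng(0)
talker = rng.normal([200.0, 5.0, 15.0], [20.0, 2.0, 3.0], size=(50, 3))
prof = build_profile(talker)

typical = deviation(prof, talker.mean(axis=0))          # at the profile center
atypical = deviation(prof, np.array([300.0, 12.0, 25.0]))  # far outside the range
```

Under this toy model, "sounding like oneself" corresponds to a small deviation score, and the question of how much variability listeners tolerate becomes a threshold on that score.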
Project Outcomes
- Journal articles: 19
- Monographs: 0
- Research awards: 0
- Conference papers: 0
- Patents: 0
Exploring the Use of an Unsupervised Autoregressive Model as a Shared Encoder for Text-Dependent Speaker Verification
- DOI: 10.21437/interspeech.2020-2957
- Publication date: 2020-08
- Journal:
- Impact factor: 0
- Authors: Vijay Ravi; Ruchao Fan; Amber Afshan; Huanhua Lu; A. Alwan
- Corresponding author: Vijay Ravi; Ruchao Fan; Amber Afshan; Huanhua Lu; A. Alwan
Variable frame rate-based data augmentation to handle speaking-style variability for automatic speaker verification
- DOI: 10.21437/interspeech.2020-3006
- Publication date: 2020-08
- Journal:
- Impact factor: 0
- Authors: Amber Afshan; Jinxi Guo; S. Park; Vijay Ravi; A. McCree; A. Alwan
- Corresponding author: Amber Afshan; Jinxi Guo; S. Park; Vijay Ravi; A. McCree; A. Alwan
Target and Non-target Speaker Discrimination by Humans and Machines
- DOI: 10.1109/icassp.2019.8683362
- Publication date: 2019
- Journal:
- Impact factor: 0
- Authors: Park, Soo Jin; Afshan, Amber; Kreiman, Jody; Yeung, Gary; Alwan, Abeer
- Corresponding author: Alwan, Abeer
Acoustic voice variation in spontaneous speech
- DOI: 10.1121/10.0011471
- Publication date: 2022
- Journal:
- Impact factor: 0
- Authors: Lee, Yoonjeong; Kreiman, Jody
- Corresponding author: Kreiman, Jody
Acoustic voice variation within and between speakers
- DOI: 10.1121/1.5125134
- Publication date: 2019
- Journal:
- Impact factor: 0
- Authors: Lee, Yoonjeong; Keating, Patricia; Kreiman, Jody
- Corresponding author: Kreiman, Jody
Other Publications by Abeer Alwan
Modeling auditory perception to improve robust speech recognition
- DOI:
- Publication date: 1997
- Journal:
- Impact factor: 0
- Authors: B. Strope; Abeer Alwan
- Corresponding author: Abeer Alwan
Unraveling the associations between voice pitch and major depressive disorder: a multisite genetic study
- DOI: 10.1038/s41380-024-02877-y
- Publication date: 2024-12-31
- Journal:
- Impact factor: 10.100
- Authors: Yazheng Di; Elior Rahmani; Joel Mefford; Jinhan Wang; Vijay Ravi; Aditya Gorla; Abeer Alwan; Kenneth S. Kendler; Tingshao Zhu; Jonathan Flint
- Corresponding author: Jonathan Flint
Optical Phonetics and Visual Perception of Stress in English
- DOI:
- Publication date: 2003
- Journal:
- Impact factor: 0
- Authors: P. Keating; Marco Baroni; Sven Matty; E. T. Auer; Rebecca Scarborough; Abeer Alwan; E. Bernstein
- Corresponding author: E. Bernstein
Towards Automatically Assessing Children's Picture Description Tasks
- DOI:
- Publication date:
- Journal:
- Impact factor: 0
- Authors: Hariram Veeramani; Natarajan Balaji Shankar; Alexander Johnson; Abeer Alwan
- Corresponding author: Abeer Alwan
An Analysis of Large Language Models for African American English Speaking Children's Oral Language Assessment
- DOI:
- Publication date:
- Journal:
- Impact factor: 0
- Authors: Alexander Johnson; Christina Chance; Kaycee Stiemke; Hariram Veeramani; Natarajan Balaji Shankar; Abeer Alwan
- Corresponding author: Abeer Alwan
Other Grants to Abeer Alwan
Collaborative Research: Improving speech technology for better learning outcomes: the case of AAE child speakers
- Award number: 2202585
- Fiscal year: 2022
- Amount: $851,600
- Project type: Standard Grant

Collaborative Research: RI: Small: From Ultrasound and MRI to articulatory and acoustic models of child speech development
- Award number: 2006979
- Fiscal year: 2020
- Amount: $851,600
- Project type: Standard Grant

Workshop for Undergraduate and MS Female Students in Speech Science and Technology
- Award number: 1745166
- Fiscal year: 2017
- Amount: $851,600
- Project type: Standard Grant

NRI: INT: COLLAB: Development, Deployment and Evaluation of Personalized Learning Companion Robots for Early Literacy and Language Learning
- Award number: 1734380
- Fiscal year: 2017
- Amount: $851,600
- Project type: Standard Grant

A Workshop for Junior Female Researchers in Speech Science and Technology
- Award number: 1637240
- Fiscal year: 2016
- Amount: $851,600
- Project type: Standard Grant

The Role of Speech Science in Developing Robust Speech Technology Applications
- Award number: 1543522
- Fiscal year: 2015
- Amount: $851,600
- Project type: Standard Grant

EAGER: Collaborative Research: Models of Child Speech
- Award number: 1551113
- Fiscal year: 2015
- Amount: $851,600
- Project type: Standard Grant

EAGER: Variance and Invariance in Voice Quality
- Award number: 1450992
- Fiscal year: 2014
- Amount: $851,600
- Project type: Standard Grant

EAGER: Collaborative Research: Towards Modeling Human Speech Confusions in Noise
- Award number: 1247809
- Fiscal year: 2012
- Amount: $851,600
- Project type: Standard Grant

RI: Small: A New Voice Source Model: From Glottal Areas to Better Speech Synthesis
- Award number: 1018863
- Fiscal year: 2010
- Amount: $851,600
- Project type: Continuing Grant
Similar Overseas Grants
Collaborative Research: RI: Medium: Principles for Optimization, Generalization, and Transferability via Deep Neural Collapse
- Award number: 2312841
- Fiscal year: 2023
- Amount: $851,600
- Project type: Standard Grant

Collaborative Research: RI: Medium: Principles for Optimization, Generalization, and Transferability via Deep Neural Collapse
- Award number: 2312842
- Fiscal year: 2023
- Amount: $851,600
- Project type: Standard Grant

Collaborative Research: RI: Medium: Lie group representation learning for vision
- Award number: 2313151
- Fiscal year: 2023
- Amount: $851,600
- Project type: Continuing Grant

Collaborative Research: RI: Medium: Principles for Optimization, Generalization, and Transferability via Deep Neural Collapse
- Award number: 2312840
- Fiscal year: 2023
- Amount: $851,600
- Project type: Standard Grant

Collaborative Research: RI: Medium: Lie group representation learning for vision
- Award number: 2313149
- Fiscal year: 2023
- Amount: $851,600
- Project type: Continuing Grant

Collaborative Research: CompCog: RI: Medium: Understanding human planning through AI-assisted analysis of a massive chess dataset
- Award number: 2312374
- Fiscal year: 2023
- Amount: $851,600
- Project type: Standard Grant

Collaborative Research: CompCog: RI: Medium: Understanding human planning through AI-assisted analysis of a massive chess dataset
- Award number: 2312373
- Fiscal year: 2023
- Amount: $851,600
- Project type: Standard Grant

Collaborative Research: RI: Medium: Superhuman Imitation Learning from Heterogeneous Demonstrations
- Award number: 2312955
- Fiscal year: 2023
- Amount: $851,600
- Project type: Standard Grant

Collaborative Research: RI: Medium: Informed, Fair, Efficient, and Incentive-Aware Group Decision Making
- Award number: 2313137
- Fiscal year: 2023
- Amount: $851,600
- Project type: Standard Grant

Collaborative Research: RI: Medium: Lie group representation learning for vision
- Award number: 2313150
- Fiscal year: 2023
- Amount: $851,600
- Project type: Continuing Grant