RI: Medium: Collaborative Research: Variance and Invariance in Voice Quality: Implications for Machine and Human Speaker Identification

Basic Information

  • Award Number:
    1704170
  • Principal Investigator:
  • Amount:
    $250,900
  • Host Institution:
  • Host Institution Country:
    United States
  • Project Type:
    Continuing Grant
  • Fiscal Year:
    2017
  • Funding Country:
    United States
  • Project Period:
    2017-09-01 to 2021-08-31
  • Project Status:
    Completed

Project Abstract

A talker's voice quality conveys many kinds of information, including word and utterance prosody, emotional state, and personal identity. Variations in both the voice source and the vocal tract affect voice quality, and there can be significant inter- and intra-talker variability. Understanding which aspects of a voice are talker-specific should aid in understanding the human limits in perceiving speaker differences and in developing better speaker identification (SID) algorithms. Despite technological advances, the performance of current SID systems remains far from perfect and degrades significantly when training and testing conditions are mismatched, especially in speech style (for example, conversational versus read speech) or the speaker's emotional state, when utterances are short, and when the task is text-independent. The key questions the project aims to answer are: under normal daily-life variability, how often does a talker sound less like him- or herself and more like someone else? Which acoustic properties account for speaker similarity? Can automatic SID algorithms be improved by knowledge of which properties are important for human perception of speaker similarity?

The project is transformative in that it helps to better understand and model variance and invariance in voice quality, and it will inform several important issues in human speech perception, especially in the area of talker similarity. Understanding which aspects of the source signal, if any, are talker-specific should aid in developing better speaker identification and verification algorithms that can handle short utterances and are robust to varying affect and speaking styles. A model of voice-quality variations could also improve the naturalness of text-to-speech (TTS) systems. If it were known how much a person could change his or her voice quality without compromising vocal identity, this knowledge could also inform medical rehabilitation applications and forensics. A better understanding of voice quality will thus have significant scientific impact, as well as impact on engineering, forensic, and medical applications. The project has strong outreach and dissemination programs and fosters interdisciplinary activities in Electrical Engineering, Linguistics, and Speech and Hearing Science at UCLA and the Center of Excellence at JHU. It trains undergraduate and graduate students in important cross-disciplinary activities of technological and scientific significance. The results will be published in high-quality journals and presented at relevant international conferences, and the research products (a set of databases, software tools, and publications) will be disseminated freely.

The project analyzes how the speech signal varies within and across talkers under circumstances that introduce variability in everyday-life situations. Specifically, it investigates whether an individual talker's speech varies significantly across recording sessions and speech tasks. Most importantly, it examines how intra-talker variability from all these sources compares with inter-talker variability. Answering these questions requires a high-quality speech database with multiple voice samples from many talkers (in this case, 200), which is collected, annotated, and distributed to other researchers.

Acoustic analyses reveal inter- and intra-talker variability in the speech signal across different situations by generating a multi-dimensional acoustic profile of each talker that specifies the range of parameter values typical for that talker in the corpus, and the likelihood of deviations from that usual profile. Perceptual studies determine the extent to which these parameter profiles predict perceived similarity, and how much variability in each parameter can be tolerated before talkers cease to sound like themselves. Insights from the acoustic and perceptual studies guide the development of text-dependent and text-independent SID algorithms that are anticipated to be robust to variations in affect and style and to short utterances.
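To make the profiling idea concrete, below is a minimal Python sketch of a per-talker acoustic profile, assuming voice-quality parameters (e.g., F0, CPP, H1-H2) have already been extracted per utterance. The Gaussian model and all names here are illustrative assumptions for exposition, not the project's actual toolchain.

# Minimal sketch (hypothetical): model a talker's typical parameter values
# with a Gaussian "profile", score deviations from it, and compare
# intra- vs. inter-talker deviation scores.
import numpy as np
from scipy.stats import multivariate_normal

def fit_profile(features):
    """features: (n_utterances, n_params) array, e.g. columns = [F0, CPP, H1-H2].
    Returns the mean and covariance describing the talker's typical range."""
    return features.mean(axis=0), np.cov(features, rowvar=False)

def deviation_score(sample, mean, cov):
    """Negative log-likelihood of one utterance's parameters under the
    talker's profile; higher means less like the talker's usual voice."""
    return -multivariate_normal.logpdf(sample, mean=mean, cov=cov)

# Toy data standing in for extracted parameters of two talkers.
rng = np.random.default_rng(0)
talker_a = rng.normal([200.0, 15.0, 3.0], [20.0, 2.0, 1.0], size=(50, 3))
talker_b = rng.normal([120.0, 12.0, 1.0], [15.0, 2.0, 1.0], size=(50, 3))

mean_a, cov_a = fit_profile(talker_a)
intra = deviation_score(talker_a[0], mean_a, cov_a)  # talker A vs. own profile
inter = deviation_score(talker_b[0], mean_a, cov_a)  # talker B vs. A's profile
print(f"intra-talker score: {intra:.2f}  inter-talker score: {inter:.2f}")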

Project Outcomes

Journal Articles (2)
Monographs (0)
Research Awards (0)
Conference Papers (0)
Patents (0)
Variable frame rate-based data augmentation to handle speaking-style variability for automatic speaker verification
  • DOI:
    10.21437/interspeech.2020-3006
  • Publication Date:
    2020-08
  • Journal:
  • Impact Factor:
    0
  • Authors:
    Amber Afshan;Jinxi Guo;S. Park;Vijay Ravi;A. McCree;A. Alwan
  • Corresponding Author:
    Amber Afshan;Jinxi Guo;S. Park;Vijay Ravi;A. McCree;A. Alwan
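As context for the paper above, here is a rough sketch of the general idea behind variable-frame-rate processing: retain a feature frame only when it differs enough from the last retained frame, so that different thresholds yield versions of an utterance at different effective speaking rates, usable as augmented training data. The Euclidean distance criterion and threshold values are illustrative assumptions, not the paper's exact algorithm.

# Rough sketch (hypothetical criterion): subsample feature frames by
# frame-to-frame change to mimic different speaking rates for augmentation.
import numpy as np

def vfr_subsample(frames, threshold):
    """frames: (n_frames, n_dims) feature matrix, e.g. MFCCs.
    Keep a frame only when it has drifted at least `threshold` (Euclidean
    distance) from the last retained frame."""
    kept = [frames[0]]
    for frame in frames[1:]:
        if np.linalg.norm(frame - kept[-1]) >= threshold:
            kept.append(frame)
    return np.stack(kept)

# Usage: several thresholds give several augmented renditions of one utterance.
mfcc = np.random.default_rng(1).normal(size=(300, 13))  # toy feature matrix
augmented = [vfr_subsample(mfcc, t) for t in (3.0, 5.0, 7.0)]
print([a.shape[0] for a in augmented])  # frame count shrinks as threshold grows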
A Practical Two-Stage Training Strategy for Multi-Stream End-to-End Speech Recognition

Similar Overseas Grants

Collaborative Research: RI: Medium: Principles for Optimization, Generalization, and Transferability via Deep Neural Collapse
  • Award Number:
    2312841
  • Fiscal Year:
    2023
  • Funding Amount:
    $250,900
  • Project Type:
    Standard Grant
Collaborative Research: RI: Medium: Principles for Optimization, Generalization, and Transferability via Deep Neural Collapse
  • Award Number:
    2312842
  • Fiscal Year:
    2023
  • Funding Amount:
    $250,900
  • Project Type:
    Standard Grant
Collaborative Research: RI: Medium: Lie group representation learning for vision
  • Award Number:
    2313151
  • Fiscal Year:
    2023
  • Funding Amount:
    $250,900
  • Project Type:
    Continuing Grant
Collaborative Research: RI: Medium: Principles for Optimization, Generalization, and Transferability via Deep Neural Collapse
  • Award Number:
    2312840
  • Fiscal Year:
    2023
  • Funding Amount:
    $250,900
  • Project Type:
    Standard Grant
Collaborative Research: RI: Medium: Lie group representation learning for vision
  • Award Number:
    2313149
  • Fiscal Year:
    2023
  • Funding Amount:
    $250,900
  • Project Type:
    Continuing Grant
Collaborative Research: CompCog: RI: Medium: Understanding human planning through AI-assisted analysis of a massive chess dataset
  • Award Number:
    2312374
  • Fiscal Year:
    2023
  • Funding Amount:
    $250,900
  • Project Type:
    Standard Grant
Collaborative Research: CompCog: RI: Medium: Understanding human planning through AI-assisted analysis of a massive chess dataset
  • Award Number:
    2312373
  • Fiscal Year:
    2023
  • Funding Amount:
    $250,900
  • Project Type:
    Standard Grant
Collaborative Research: RI: Medium: Superhuman Imitation Learning from Heterogeneous Demonstrations
  • Award Number:
    2312955
  • Fiscal Year:
    2023
  • Funding Amount:
    $250,900
  • Project Type:
    Standard Grant
Collaborative Research: RI: Medium: Informed, Fair, Efficient, and Incentive-Aware Group Decision Making
  • Award Number:
    2313137
  • Fiscal Year:
    2023
  • Funding Amount:
    $250,900
  • Project Type:
    Standard Grant
Collaborative Research: RI: Medium: Lie group representation learning for vision
  • Award Number:
    2313150
  • Fiscal Year:
    2023
  • Funding Amount:
    $250,900
  • Project Type:
    Continuing Grant