CCRI: Medium: MSP-Podcast: Creating The Largest Speech Emotional Database By Leveraging Existing Naturalistic Recordings
Basic Information
- Award number: 2016719
- Principal investigator:
- Amount: $1,075,400
- Host institution:
- Host institution country: United States
- Project type: Standard Grant
- Fiscal year: 2020
- Funding country: United States
- Duration: 2020-09-01 to 2025-08-31
- Project status: Ongoing
- Source:
- Keywords:
Project Abstract
This award develops the MSP-Podcast corpus, intended to be the largest publicly available naturalistic speech emotional database. Affective computing is an important research area aiming to understand, analyze, recognize, and synthesize human emotions. Providing emotion capabilities to current interfaces can enable transformative applications in human-computer interaction, healthcare, security and defense, education, and entertainment. Speech is an accessible modality for current interfaces, carrying important information beyond the verbal message. However, automatic emotion recognition from speech in realistic domains is a challenging task, given the subtle expressive behaviors that occur during human interactions. Current speech emotional databases suffer from limited size, few speakers, inadequate or inconsistent emotional descriptors, a lack of naturalistic behaviors, and unbalanced emotional content. This CISE community research infrastructure addresses these key barriers, opening new opportunities to explore novel and powerful machine learning systems. The size, naturalness, and speaker and recording variety of the MSP-Podcast corpus allow the research community to create complex but powerful models with millions of parameters that generalize across environments. The MSP-Podcast corpus will also play a key role in other speech processing and human language understanding tasks. For the first time, the community will have the infrastructure to make automatic speech recognition and speaker verification solutions robust against variations due to emotional content. These improvements will facilitate the transition of emotionally aware algorithms into practical applications with clear societal benefits.
The proposed infrastructure relies on a novel approach based on cross-corpus emotion classification, combined with crowdsourced annotations, to build a large, naturalistic emotional database with balanced emotional content at reduced cost and manual labor. It draws on existing naturalistic recordings available on audio-sharing websites. The first task is selecting audio recordings that convey balanced and rich emotional content. The selected recordings contain natural conversations between many different people on various topics, both positive and negative. The second task is segmenting the audio recordings into clean, single-speaker segments, removing silence, background music, noisy segments, and overlapped speech. This process is automated with algorithms for voice activity detection, speaker diarization, background music detection, and noise level estimation. The third task is identifying segments that convey balanced and rich emotional content. This task relies on machine learning models trained on existing corpora to retrieve samples with target emotional behaviors (e.g., detectors of "happy" sentences). This step is important because most turns are emotionally neutral, so randomly selecting turns would lead to a corpus with unbalanced emotional content. The community also plays an important role in the selection of target sentences to be emotionally annotated, with novel grand challenges and outreach activities to support the collection of similar corpora in different languages. The final task is annotating the emotional content of the retrieved segments through perceptual evaluations conducted on a crowdsourcing platform, using a novel evaluation that tracks the performance of the workers in real time.
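The emotion-based retrieval step described above can be sketched in a few lines. The following is a minimal, self-contained illustration, not the project's actual implementation: the per-emotion scorers are stubbed with random numbers where the real pipeline would use classifiers trained on existing emotional corpora, and all names here (`score_segment`, `retrieve_balanced`, the emotion set) are hypothetical.

```python
import random

# Hypothetical emotion inventory; the actual corpus uses richer descriptors.
EMOTIONS = ["happy", "sad", "angry", "neutral"]

def score_segment(segment, rng):
    """Stub scorer: returns a pseudo confidence per emotion for a segment.
    In the real pipeline this would be a trained per-emotion detector."""
    return {emo: rng.random() for emo in EMOTIONS}

def retrieve_balanced(segments, per_emotion, rng=None):
    """Greedy retrieval: keep the top-scoring segments for each target
    emotion, so the pool sent to annotators is not dominated by the
    emotionally neutral turns that make up most of the recordings."""
    rng = rng or random.Random(0)
    scored = [(seg, score_segment(seg, rng)) for seg in segments]
    selected = {}
    used = set()  # a segment is assigned to at most one emotion bucket
    for emo in EMOTIONS:
        ranked = sorted(scored, key=lambda item: item[1][emo], reverse=True)
        picks = []
        for seg, _ in ranked:
            if seg not in used and len(picks) < per_emotion:
                picks.append(seg)
                used.add(seg)
        selected[emo] = picks
    return selected

# Usage: select 5 candidate segments per emotion from 1000 candidates.
pool = retrieve_balanced([f"seg_{i:03d}" for i in range(1000)], per_emotion=5)
```

The key design point is the greedy per-emotion ranking: each bucket is filled with the segments most likely to carry that emotion, which yields a balanced annotation pool even when the underlying distribution is heavily skewed toward neutral speech.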
This scalable approach provides control over the emotional content, increases speaker diversity, and maintains the spontaneous nature of the recordings. This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
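The real-time worker tracking mentioned for the crowdsourced annotation stage is commonly implemented with hidden gold-standard items. The sketch below is an assumption-laden illustration of that general technique; the `WorkerTracker` class, the sliding window, and the 0.7 accuracy threshold are all invented for this example and are not the evaluation actually used in the project.

```python
from collections import deque

class WorkerTracker:
    """Track each annotator's accuracy on hidden gold-standard clips over
    a sliding window; workers falling below a threshold can be excluded
    from further evaluations. (Illustrative only: the project's actual
    quality metric is not specified in this abstract.)"""

    def __init__(self, window=20, min_accuracy=0.7):
        self.window = window
        self.min_accuracy = min_accuracy
        self.history = {}  # worker_id -> deque of 0/1 gold outcomes

    def record_gold(self, worker_id, correct):
        """Record whether the worker matched the expected label on a
        gold-standard clip; only the last `window` outcomes are kept."""
        h = self.history.setdefault(worker_id, deque(maxlen=self.window))
        h.append(1 if correct else 0)

    def is_reliable(self, worker_id, min_golds=5):
        """Keep workers with too little evidence; otherwise require the
        rolling gold accuracy to stay at or above the threshold."""
        h = self.history.get(worker_id)
        if h is None or len(h) < min_golds:
            return True
        return sum(h) / len(h) >= self.min_accuracy
```

Because the window is bounded, a worker who improves after early mistakes is eventually readmitted, which is the usual motivation for tracking quality in real time rather than with a one-time qualification test.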
Project Outcomes
Journal articles (13)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Monologue versus Conversation: Differences in Emotion Perception and Acoustic Expressivity
- DOI: 10.1109/acii55700.2022.9953814
- Publication date: 2022-10
- Journal:
- Impact factor: 0
- Authors: Woan-Shiuan Chien; Shreya G. Upadhyay; Wei-Cheng Lin; Ya-Tse Wu; Bo-Hao Su; C. Busso; Chi-Chun Lee
- Corresponding author: Woan-Shiuan Chien; Shreya G. Upadhyay; Wei-Cheng Lin; Ya-Tse Wu; Bo-Hao Su; C. Busso; Chi-Chun Lee
The MSP-Conversation Corpus
- DOI: 10.21437/interspeech.2020-2444
- Publication date: 2020
- Journal:
- Impact factor: 0
- Authors: Martinez-Lucas, Luz; Abdelwahab, Mohammed; Busso, Carlos
- Corresponding author: Busso, Carlos
Deep Representation Learning for Affective Speech Signal Analysis and Processing: Preventing unwanted signal disparities
- DOI: 10.1109/msp.2021.3105939
- Publication date: 2021-11
- Journal:
- Impact factor: 14.9
- Authors: Chi-Chun Lee; K. Sridhar; Jeng-Lin Li; Wei-Cheng Lin; Bo-Hao Su; C. Busso
- Corresponding author: Chi-Chun Lee; K. Sridhar; Jeng-Lin Li; Wei-Cheng Lin; Bo-Hao Su; C. Busso
Role of Lexical Boundary Information in Chunk-Level Segmentation for Speech Emotion Recognition
- DOI: 10.1109/icassp49357.2023.10096861
- Publication date: 2023-06
- Journal:
- Impact factor: 0
- Authors: Wei-Cheng Lin; C. Busso
- Corresponding author: Wei-Cheng Lin; C. Busso
Sequential Modeling by Leveraging Non-Uniform Distribution of Speech Emotion
- DOI: 10.1109/taslp.2023.3244527
- Publication date: 2023
- Journal:
- Impact factor: 0
- Authors: Wei-Cheng Lin; C. Busso
- Corresponding author: Wei-Cheng Lin; C. Busso
Other Publications by Carlos Busso
Enhanced Facial Landmarks Detection for Patients with Repaired Cleft Lip and Palate
- DOI:
- Publication date:
- Journal:
- Impact factor: 0
- Authors: Karen Rosero; Ali N. Salman; Berrak Sisman; R. Hallac; Carlos Busso
- Corresponding author: Carlos Busso
Speech Emotion Recognition in Real Static and Dynamic Human-Robot Interaction Scenarios
- DOI: 10.1016/j.csl.2024.101666
- Publication date: 2024
- Journal:
- Impact factor: 0
- Authors: Nicolás Grágeda; Carlos Busso; Eduardo Alvarado; Ricardo García; R. Mahú; F. Huenupán; N. B. Yoma
- Corresponding author: N. B. Yoma
Mixed Emotion Modelling for Emotional Voice Conversion
- DOI:
- Publication date: 2022
- Journal:
- Impact factor: 0
- Authors: Kun Zhou; Berrak Sisman; Carlos Busso; Haizhou Li
- Corresponding author: Haizhou Li
Richness and Density of Birds in Timber Nothofagus pumilio Forests and their Unproductive Associated Environments
- DOI: 10.1007/s10531-004-1665-0
- Publication date: 2005-09-01
- Journal:
- Impact factor: 3.100
- Authors: María Vanessa Lencinas; Guillermo Martínez Pastur; Marlin Medina; Carlos Busso
- Corresponding author: Carlos Busso
Towards Naturalistic Voice Conversion: NaturalVoices Dataset with an Automatic Processing Pipeline
- DOI:
- Publication date: 2024
- Journal:
- Impact factor: 0
- Authors: Ali N. Salman; Zongyang Du; Shreeram Suresh Chandra; Ismail Rasim Ulgen; Carlos Busso; Berrak Sisman
- Corresponding author: Berrak Sisman
Other Grants by Carlos Busso
CRI: CI-P: Creating the Largest Speech Emotional Database by Leveraging Existing Naturalistic Recordings
- Award number: 1823166
- Fiscal year: 2018
- Amount: $1,075,400
- Project type: Standard Grant
RI: Small: Integrative, Semantic-Aware, Speech-Driven Models for Believable Conversational Agents with Meaningful Behaviors
- Award number: 1718944
- Fiscal year: 2017
- Amount: $1,075,400
- Project type: Standard Grant
FG 2015 Doctoral Consortium: Travel Support for Graduate Students
- Award number: 1540944
- Fiscal year: 2015
- Amount: $1,075,400
- Project type: Standard Grant
CAREER: Advanced Knowledge Extraction of Affective Behaviors During Natural Human Interaction
- Award number: 1453781
- Fiscal year: 2015
- Amount: $1,075,400
- Project type: Continuing Grant
EAGER: Exploring the Use of Synthetic Speech as Reference Model to Detect Salient Emotional Segments in Speech
- Award number: 1329659
- Fiscal year: 2013
- Amount: $1,075,400
- Project type: Standard Grant
WORKSHOP: Doctoral Consortium for the International Conference on Multimodal Interaction (ICMI 2013)
- Award number: 1346655
- Fiscal year: 2013
- Amount: $1,075,400
- Project type: Standard Grant
RI: Small: Collaborative Research: Exploring Audiovisual Emotion Perception using Data-Driven Computational Modeling
- Award number: 1217104
- Fiscal year: 2012
- Amount: $1,075,400
- Project type: Continuing Grant
Workshop: Doctoral Consortium at the 14th International Conference on Multimodal Interaction
- Award number: 1249319
- Fiscal year: 2012
- Amount: $1,075,400
- Project type: Standard Grant
Similar Overseas Grants
Collaborative Research: CyberTraining: Implementation: Medium: Training Users, Developers, and Instructors at the Chemistry/Physics/Materials Science Interface
- Award number: 2321102
- Fiscal year: 2024
- Amount: $1,075,400
- Project type: Standard Grant
RII Track-4:@NASA: Bluer and Hotter: From Ultraviolet to X-ray Diagnostics of the Circumgalactic Medium
- Award number: 2327438
- Fiscal year: 2024
- Amount: $1,075,400
- Project type: Standard Grant
Collaborative Research: Topological Defects and Dynamic Motion of Symmetry-breaking Tadpole Particles in Liquid Crystal Medium
- Award number: 2344489
- Fiscal year: 2024
- Amount: $1,075,400
- Project type: Standard Grant
Collaborative Research: AF: Medium: The Communication Cost of Distributed Computation
- Award number: 2402836
- Fiscal year: 2024
- Amount: $1,075,400
- Project type: Continuing Grant
Collaborative Research: AF: Medium: Foundations of Oblivious Reconfigurable Networks
- Award number: 2402851
- Fiscal year: 2024
- Amount: $1,075,400
- Project type: Continuing Grant
Collaborative Research: CIF: Medium: Snapshot Computational Imaging with Metaoptics
- Award number: 2403122
- Fiscal year: 2024
- Amount: $1,075,400
- Project type: Standard Grant
Collaborative Research: SHF: Medium: Differentiable Hardware Synthesis
- Award number: 2403134
- Fiscal year: 2024
- Amount: $1,075,400
- Project type: Standard Grant
Collaborative Research: SHF: Medium: Enabling Graphics Processing Unit Performance Simulation for Large-Scale Workloads with Lightweight Simulation Methods
- Award number: 2402804
- Fiscal year: 2024
- Amount: $1,075,400
- Project type: Standard Grant
Collaborative Research: CIF-Medium: Privacy-preserving Machine Learning on Graphs
- Award number: 2402815
- Fiscal year: 2024
- Amount: $1,075,400
- Project type: Standard Grant
Collaborative Research: SHF: Medium: Tiny Chiplets for Big AI: A Reconfigurable-On-Package System
- Award number: 2403408
- Fiscal year: 2024
- Amount: $1,075,400
- Project type: Standard Grant