CCRI: Medium: MSP-Podcast: Creating The Largest Speech Emotional Database By Leveraging Existing Naturalistic Recordings
Basic Information
- Award number: 2016719
- Principal investigator:
- Amount: $1,075,400
- Host institution:
- Host institution country: United States
- Project type: Standard Grant
- Fiscal year: 2020
- Funding country: United States
- Duration: 2020-09-01 to 2025-08-31
- Project status: Ongoing
- Source:
- Keywords:
Project Abstract
This award develops the MSP-Podcast corpus, intended to be the largest publicly available naturalistic speech emotional database. Affective computing is an important research area aiming to understand, analyze, recognize, and synthesize human emotions. Providing emotion capabilities to current interfaces can enable transformative applications in human-computer interaction, healthcare, security and defense, education, and entertainment. Speech is an accessible modality for current interfaces, carrying important information beyond the verbal message. However, automatic emotion recognition from speech in realistic domains is a challenging task, given the subtle expressive behaviors that occur during human interactions. Current speech emotional databases suffer from limited size, few speakers, inadequate or inconsistent emotional descriptors, a lack of naturalistic behaviors, and unbalanced emotional content. This CISE community research infrastructure addresses these key barriers, opening new opportunities to explore novel and powerful machine learning systems. The size, naturalness, and speaker and recording variety of the MSP-Podcast corpus allow the research community to create complex but powerful models with millions of parameters that generalize across environments. The MSP-Podcast corpus will also play a key role in other speech processing and human language understanding tasks. For the first time, the community will have the infrastructure to make automatic speech recognition and speaker verification solutions robust against variations due to emotional content. These improvements will facilitate the transition of emotionally aware algorithms into practical applications with clear societal benefits.
The proposed infrastructure relies on a novel approach based on cross-corpus emotion classification, combined with crowdsourced annotations, to build a large, naturalistic emotional database with balanced emotional content at reduced cost and manual labor. It draws on existing naturalistic recordings available on audio-sharing websites. The first task is selecting audio recordings that convey balanced and rich emotional content. The selected recordings contain natural conversations between many different people on various topics, both positive and negative. The second task is segmenting the audio recordings into clean, single-speaker segments, removing silence, background music, noisy segments, and overlapped speech. This process is automated with algorithms for voice activity detection, speaker diarization, background music detection, and noise level estimation. The third task is identifying segments that convey balanced and rich emotional content. This task relies on machine learning models trained on existing corpora to retrieve samples with target emotional behaviors (e.g., detectors of "happy" sentences). This step is important because most turns are emotionally neutral, so randomly selecting turns would lead to a corpus with unbalanced emotional content. The community also plays an important role in the selection of target sentences to be emotionally annotated, with novel grand challenges and outreach activities to support the collection of similar corpora in different languages. The final task is annotating the emotional content of the retrieved segments through perceptual evaluations conducted on a crowdsourcing platform, using a novel evaluation that tracks the performance of the workers in real time.
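The emotion-based retrieval step described above can be sketched in a few lines. The following is a minimal, self-contained illustration, not the project's actual implementation: the per-emotion scorers are stubbed with random numbers where the real pipeline would use classifiers trained on existing emotional corpora, and all names here (`score_segment`, `retrieve_balanced`, the emotion set) are hypothetical.

```python
import random

# Hypothetical emotion inventory; the actual corpus uses richer descriptors.
EMOTIONS = ["happy", "sad", "angry", "neutral"]

def score_segment(segment, rng):
    """Stub scorer: returns a pseudo confidence per emotion for a segment.
    In the real pipeline this would be a trained per-emotion detector."""
    return {emo: rng.random() for emo in EMOTIONS}

def retrieve_balanced(segments, per_emotion, rng=None):
    """Greedy retrieval: keep the top-scoring segments for each target
    emotion, so the pool sent to annotators is not dominated by the
    emotionally neutral turns that make up most of the recordings."""
    rng = rng or random.Random(0)
    scored = [(seg, score_segment(seg, rng)) for seg in segments]
    selected = {}
    used = set()  # a segment is assigned to at most one emotion bucket
    for emo in EMOTIONS:
        ranked = sorted(scored, key=lambda item: item[1][emo], reverse=True)
        picks = []
        for seg, _ in ranked:
            if seg not in used and len(picks) < per_emotion:
                picks.append(seg)
                used.add(seg)
        selected[emo] = picks
    return selected

# Usage: select 5 candidate segments per emotion from 1000 candidates.
pool = retrieve_balanced([f"seg_{i:03d}" for i in range(1000)], per_emotion=5)
```

The key design point is the greedy per-emotion ranking: each bucket is filled with the segments most likely to carry that emotion, which yields a balanced annotation pool even when the underlying distribution is heavily skewed toward neutral speech.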
This scalable approach provides control over the emotional content, increases speaker diversity, and maintains the spontaneous nature of the recordings. This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
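The real-time worker tracking mentioned for the crowdsourced annotation stage is commonly implemented with hidden gold-standard items. The sketch below is an assumption-laden illustration of that general technique; the `WorkerTracker` class, the sliding window, and the 0.7 accuracy threshold are all invented for this example and are not the evaluation actually used in the project.

```python
from collections import deque

class WorkerTracker:
    """Track each annotator's accuracy on hidden gold-standard clips over
    a sliding window; workers falling below a threshold can be excluded
    from further evaluations. (Illustrative only: the project's actual
    quality metric is not specified in this abstract.)"""

    def __init__(self, window=20, min_accuracy=0.7):
        self.window = window
        self.min_accuracy = min_accuracy
        self.history = {}  # worker_id -> deque of 0/1 gold outcomes

    def record_gold(self, worker_id, correct):
        """Record whether the worker matched the expected label on a
        gold-standard clip; only the last `window` outcomes are kept."""
        h = self.history.setdefault(worker_id, deque(maxlen=self.window))
        h.append(1 if correct else 0)

    def is_reliable(self, worker_id, min_golds=5):
        """Keep workers with too little evidence; otherwise require the
        rolling gold accuracy to stay at or above the threshold."""
        h = self.history.get(worker_id)
        if h is None or len(h) < min_golds:
            return True
        return sum(h) / len(h) >= self.min_accuracy
```

Because the window is bounded, a worker who improves after early mistakes is eventually readmitted, which is the usual motivation for tracking quality in real time rather than with a one-time qualification test.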
Project Outcomes
Journal articles (13)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Monologue versus Conversation: Differences in Emotion Perception and Acoustic Expressivity
- DOI: 10.1109/acii55700.2022.9953814
- Publication date: 2022-10
- Journal:
- Impact factor: 0
- Authors: Woan-Shiuan Chien; Shreya G. Upadhyay; Wei-Cheng Lin; Ya-Tse Wu; Bo-Hao Su; C. Busso; Chi-Chun Lee
- Corresponding author: Woan-Shiuan Chien; Shreya G. Upadhyay; Wei-Cheng Lin; Ya-Tse Wu; Bo-Hao Su; C. Busso; Chi-Chun Lee
The MSP-Conversation Corpus
- DOI: 10.21437/interspeech.2020-2444
- Publication date: 2020
- Journal:
- Impact factor: 0
- Authors: Martinez-Lucas, Luz; Abdelwahab, Mohammed; Busso, Carlos
- Corresponding author: Busso, Carlos
Deep Representation Learning for Affective Speech Signal Analysis and Processing: Preventing unwanted signal disparities
- DOI: 10.1109/msp.2021.3105939
- Publication date: 2021-11
- Journal:
- Impact factor: 14.9
- Authors: Chi-Chun Lee; K. Sridhar; Jeng-Lin Li; Wei-Cheng Lin; Bo-Hao Su; C. Busso
- Corresponding author: Chi-Chun Lee; K. Sridhar; Jeng-Lin Li; Wei-Cheng Lin; Bo-Hao Su; C. Busso
Role of Lexical Boundary Information in Chunk-Level Segmentation for Speech Emotion Recognition
- DOI: 10.1109/icassp49357.2023.10096861
- Publication date: 2023-06
- Journal:
- Impact factor: 0
- Authors: Wei-Cheng Lin; C. Busso
- Corresponding author: Wei-Cheng Lin; C. Busso
Sequential Modeling by Leveraging Non-Uniform Distribution of Speech Emotion
- DOI: 10.1109/taslp.2023.3244527
- Publication date: 2023
- Journal:
- Impact factor: 0
- Authors: Wei-Cheng Lin; C. Busso
- Corresponding author: Wei-Cheng Lin; C. Busso
Other Publications by Carlos Busso
Enhanced Facial Landmarks Detection for Patients with Repaired Cleft Lip and Palate
- DOI:
- Publication date:
- Journal:
- Impact factor: 0
- Authors: Karen Rosero; Ali N. Salman; Berrak Sisman; R. Hallac; Carlos Busso
- Corresponding author: Carlos Busso
Speech Emotion Recognition in Real Static and Dynamic Human-Robot Interaction Scenarios
- DOI: 10.1016/j.csl.2024.101666
- Publication date: 2024
- Journal:
- Impact factor: 0
- Authors: Nicolás Grágeda; Carlos Busso; Eduardo Alvarado; Ricardo García; R. Mahú; F. Huenupán; N. B. Yoma
- Corresponding author: N. B. Yoma
Mixed Emotion Modelling for Emotional Voice Conversion
- DOI:
- Publication date: 2022
- Journal:
- Impact factor: 0
- Authors: Kun Zhou; Berrak Sisman; Carlos Busso; Haizhou Li
- Corresponding author: Haizhou Li
Richness and Density of Birds in Timber Nothofagus pumilio Forests and their Unproductive Associated Environments
- DOI: 10.1007/s10531-004-1665-0
- Publication date: 2005-09-01
- Journal:
- Impact factor: 3.100
- Authors: María Vanessa Lencinas; Guillermo Martínez Pastur; Marlin Medina; Carlos Busso
- Corresponding author: Carlos Busso
Towards Naturalistic Voice Conversion: NaturalVoices Dataset with an Automatic Processing Pipeline
- DOI:
- Publication date: 2024
- Journal:
- Impact factor: 0
- Authors: Ali N. Salman; Zongyang Du; Shreeram Suresh Chandra; Ismail Rasim Ulgen; Carlos Busso; Berrak Sisman
- Corresponding author: Berrak Sisman
Other Grants by Carlos Busso
CRI: CI-P: Creating the Largest Speech Emotional Database by Leveraging Existing Naturalistic Recordings
- Award number: 1823166
- Fiscal year: 2018
- Amount: $1,075,400
- Project type: Standard Grant
RI: Small: Integrative, Semantic-Aware, Speech-Driven Models for Believable Conversational Agents with Meaningful Behaviors
- Award number: 1718944
- Fiscal year: 2017
- Amount: $1,075,400
- Project type: Standard Grant
FG 2015 Doctoral Consortium: Travel Support for Graduate Students
- Award number: 1540944
- Fiscal year: 2015
- Amount: $1,075,400
- Project type: Standard Grant
CAREER: Advanced Knowledge Extraction of Affective Behaviors During Natural Human Interaction
- Award number: 1453781
- Fiscal year: 2015
- Amount: $1,075,400
- Project type: Continuing Grant
EAGER: Exploring the Use of Synthetic Speech as Reference Model to Detect Salient Emotional Segments in Speech
- Award number: 1329659
- Fiscal year: 2013
- Amount: $1,075,400
- Project type: Standard Grant
WORKSHOP: Doctoral Consortium for the International Conference on Multimodal Interaction (ICMI 2013)
- Award number: 1346655
- Fiscal year: 2013
- Amount: $1,075,400
- Project type: Standard Grant
RI: Small: Collaborative Research: Exploring Audiovisual Emotion Perception using Data-Driven Computational Modeling
- Award number: 1217104
- Fiscal year: 2012
- Amount: $1,075,400
- Project type: Continuing Grant
Workshop: Doctoral Consortium at the 14th International Conference on Multimodal Interaction
- Award number: 1249319
- Fiscal year: 2012
- Amount: $1,075,400
- Project type: Standard Grant
Similar Overseas Grants
Collaborative Research: CyberTraining: Implementation: Medium: Training Users, Developers, and Instructors at the Chemistry/Physics/Materials Science Interface
- Award number: 2321102
- Fiscal year: 2024
- Amount: $1,075,400
- Project type: Standard Grant
RII Track-4:@NASA: Bluer and Hotter: From Ultraviolet to X-ray Diagnostics of the Circumgalactic Medium
- Award number: 2327438
- Fiscal year: 2024
- Amount: $1,075,400
- Project type: Standard Grant
Collaborative Research: Topological Defects and Dynamic Motion of Symmetry-breaking Tadpole Particles in Liquid Crystal Medium
- Award number: 2344489
- Fiscal year: 2024
- Amount: $1,075,400
- Project type: Standard Grant
Collaborative Research: AF: Medium: The Communication Cost of Distributed Computation
- Award number: 2402836
- Fiscal year: 2024
- Amount: $1,075,400
- Project type: Continuing Grant
Collaborative Research: AF: Medium: Foundations of Oblivious Reconfigurable Networks
- Award number: 2402851
- Fiscal year: 2024
- Amount: $1,075,400
- Project type: Continuing Grant
Collaborative Research: CIF: Medium: Snapshot Computational Imaging with Metaoptics
- Award number: 2403122
- Fiscal year: 2024
- Amount: $1,075,400
- Project type: Standard Grant
Collaborative Research: SHF: Medium: Differentiable Hardware Synthesis
- Award number: 2403134
- Fiscal year: 2024
- Amount: $1,075,400
- Project type: Standard Grant
Collaborative Research: SHF: Medium: Enabling Graphics Processing Unit Performance Simulation for Large-Scale Workloads with Lightweight Simulation Methods
- Award number: 2402804
- Fiscal year: 2024
- Amount: $1,075,400
- Project type: Standard Grant
Collaborative Research: CIF-Medium: Privacy-preserving Machine Learning on Graphs
- Award number: 2402815
- Fiscal year: 2024
- Amount: $1,075,400
- Project type: Standard Grant
Collaborative Research: SHF: Medium: Tiny Chiplets for Big AI: A Reconfigurable-On-Package System
- Award number: 2403408
- Fiscal year: 2024
- Amount: $1,075,400
- Project type: Standard Grant