Deep Learning Based Complex Spectral Mapping for Multi-Channel Speaker Separation and Speech Enhancement
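Basic information
- Grant number: 2125074
- Principal investigator:
- Amount: $390,600
- Host institution:
- Host institution country: United States
- Project category: Standard Grant
- Fiscal year: 2021
- Funding country: United States
- Project period: 2021-08-01 to 2024-07-31
- Project status: Completed
- Source:
- Keywords:
Project abstract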
Despite tremendous advances in deep learning based speech separation and automatic speech recognition, a major challenge remains how to separate concurrent speakers and recognize their speech in the presence of room reverberation and background noise. This project will develop a multi-channel complex spectral mapping approach to multi-talker speaker separation and speech enhancement in order to improve speech recognition performance in such conditions. The proposed approach trains deep neural networks to predict the real and imaginary parts of individual talkers from the multi-channel input in the complex domain. After overlapped speakers are separated into simultaneous streams, sequential grouping will be performed for speaker diarization, which is the task of grouping the speech utterances of the same talker over intervals with the utterances of other speakers and pauses. The proposed speaker diarization will integrate spatial and spectral speaker features, which will be extracted through multi-channel speaker localization and single-channel speaker embedding. Recurrent neural networks will be trained to perform classification for the purpose of speaker diarization, which can handle an arbitrary number of speakers in a meeting. The proposed separation system will be evaluated using open, multi-channel speaker separation datasets that contain both room reverberation and background noise. The results from this project are expected to substantially elevate the performance of continuous speaker separation, as well as speaker diarization, in adverse acoustic environments, helping to close the performance gap between recognizing single-talker speech and recognizing multi-talker speech. The overall goal of this project is to develop a deep learning system that can continuously separate individual speakers in a conversational or meeting setting and accurately recognize the utterances of these speakers.
Building on recent advances in simultaneous grouping to separate and enhance overlapped speakers in a talker-independent fashion, the project is mainly focused on speaker diarization, which aims to group the speech utterances of the same speaker across time. To achieve speaker diarization, deep learning based sequential grouping will be performed, integrating spatial and spectral speaker characteristics. Through sequential organization, simultaneous streams will be grouped with earlier-separated speaker streams to form sequential streams, each of which corresponds to all the utterances of the same speaker up to the current time. Speaker localization and classification will be investigated to make sequential grouping capable of creating new sequential streams and handling an arbitrary number of speakers in a meeting scenario. With the added spatial dimension, the proposed diarization approach provides a solution to the question of who spoke when and where, significantly expanding the traditional scope of who spoke when. The proposed separation system will be evaluated using multi-channel speaker separation datasets that contain highly overlapped speech in recorded conversations, as well as room reverberation and background noise present in real environments. The main evaluation metric will be word error rate in automatic speech recognition. The performance of speaker diarization will be measured using diarization error rate.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
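For readers unfamiliar with the main evaluation metric named in the abstract, word error rate is conventionally computed as the word-level edit distance between a reference transcript and the recognizer output, divided by the number of reference words. The sketch below illustrates that standard definition only; the function name `word_error_rate` and the whitespace tokenization are assumptions made for illustration, and nothing here is taken from the project's own scoring pipeline.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Hypothetical helper: tokenization is plain whitespace splitting,
    with no text normalization (an assumption for this sketch).
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One deleted word out of five reference words.
    print(word_error_rate("who spoke when and where", "who spoke when where"))  # 0.2
```

Because insertions count as errors, the value can exceed 1.0 when the hypothesis is much longer than the reference.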
Project outcomes
Journal articles (2)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Multi-Channel Talker-Independent Speaker Separation Through Location-Based Training
- DOI: 10.1109/taslp.2022.3202129
- Year published: 2022
- Journal:
- Impact factor: 0
- Authors: H. Taherian; Ke Tan; Deliang Wang
- Corresponding authors: H. Taherian; Ke Tan; Deliang Wang
Multi-Resolution Location-Based Training for Multi-Channel Continuous Speech Separation
- DOI:
- Year published: 2023
- Journal:
- Impact factor: 0
- Authors: Hassan Taherian; DeLiang Wang
- Corresponding author: DeLiang Wang
Other grants by Eric Fosler-Lussier
RI: Small: Early Elementary Reading Verification in Challenging Acoustic Environments
- Grant number: 2008043
- Fiscal year: 2020
- Funding amount: $390,600
- Project category: Standard Grant
RI: Medium: Deep Neural Networks for Robust Speech Recognition through Integrated Acoustic Modeling and Separation
- Grant number: 1409431
- Fiscal year: 2014
- Funding amount: $390,600
- Project category: Continuing Grant
CI-ADDO-NEW: Collaborative Research: The Speech Recognition Virtual Kitchen
- Grant number: 1305319
- Fiscal year: 2013
- Funding amount: $390,600
- Project category: Standard Grant
CI-P: Collaborative Research: The Speech Recognition Virtual Kitchen
- Grant number: 1205424
- Fiscal year: 2012
- Funding amount: $390,600
- Project category: Standard Grant
RI: Medium: Collaborative Research: Explicit Articulatory Models of Spoken Language, with Application to Automatic Speech Recognition
- Grant number: 0905420
- Fiscal year: 2009
- Funding amount: $390,600
- Project category: Standard Grant
CAREER: Breaking the phonetic code: novel acoustic-lexical modeling techniques for robust automatic speech recognition
- Grant number: 0643901
- Fiscal year: 2006
- Funding amount: $390,600
- Project category: Continuing Grant
Workshop: Student Research in Computational Linguistics, at the HLT/NAACL 2004 Conference
- Grant number: 0422841
- Fiscal year: 2004
- Funding amount: $390,600
- Project category: Standard Grant
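Similar National Natural Science Foundation of China (NSFC) grants
Scalable Learning and Optimization: High-dimensional Models and Online Decision-Making Strategies for Big Data Analysis
- Grant number:
- Approval year: 2024
- Funding amount:
- Project category: Collaborative Innovation Research Team
Understanding structural evolution of galaxies with machine learning
- Grant number: n/a
- Approval year: 2022
- Funding amount: CNY 100,000
- Project category: Provincial/municipal project
Constrained dynamic multi-objective Q-learning evolutionary allocation of human-machine hybrid crowdsensing tasks for coal mine safety
- Grant number:
- Approval year: 2022
- Funding amount: CNY 300,000
- Project category: Young Scientists Fund
Short-term online Q-learning cooperative control mechanism for intelligent munition formations, accounting for leader failure
- Grant number: 62003314
- Approval year: 2020
- Funding amount: CNY 240,000
- Project category: Young Scientists Fund
Research on e-learning resource recommendation methods integrating contextual tensor decomposition
- Grant number: 61902016
- Approval year: 2019
- Funding amount: CNY 240,000
- Project category: Young Scientists Fund
Research on Spiking-Transfer learning methods with temporal transfer capability
- Grant number: 61806040
- Approval year: 2018
- Funding amount: CNY 200,000
- Project category: Young Scientists Fund
Research on deep-learning-based dynamic recognition technology for glacier monitoring in the Sanjiangyuan region
- Grant number: 51769027
- Approval year: 2017
- Funding amount: CNY 380,000
- Project category: Regional Science Fund
Research on Spiking-Deep Learning methods with temporal processing capability
- Grant number: 61573081
- Approval year: 2015
- Funding amount: CNY 640,000
- Project category: General Program
Automatic generation and optimization of large-scale personalized e-learning process models based on directed hypergraphs
- Grant number: 61572533
- Approval year: 2015
- Funding amount: CNY 660,000
- Project category: General Program
Research on learner emotion compensation methods in E-Learning
- Grant number: 61402392
- Approval year: 2014
- Funding amount: CNY 260,000
- Project category: Young Scientists Fund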
Similar overseas grants
CRII: OAC: A Compressor-Assisted Collective Communication Framework for GPU-Based Large-Scale Deep Learning
- Grant number: 2348465
- Fiscal year: 2024
- Funding amount: $390,600
- Project category: Standard Grant
SHF: Small: Hardware-Software Co-design for Privacy Protection on Deep Learning-based Recommendation Systems
- Grant number: 2334628
- Fiscal year: 2024
- Funding amount: $390,600
- Project category: Standard Grant
DeepMARA - Deep Reinforcement Learning based Massive Random Access Toward Massive Machine-to-Machine Communications
- Grant number: EP/Y028252/1
- Fiscal year: 2024
- Funding amount: $390,600
- Project category: Fellowship
Co-creation between content-generating AI and humans based on deep learning
- Grant number: 23K04201
- Fiscal year: 2023
- Funding amount: $390,600
- Project category: Grant-in-Aid for Scientific Research (C)
Security Evaluation Method Against Deep-Learning-Based Side-Channel Attacks Exploiting Physical Behavior of Cryptographic Hardware
- Grant number: 23K11102
- Fiscal year: 2023
- Funding amount: $390,600
- Project category: Grant-in-Aid for Scientific Research (C)
TruDetect: Trustworthy Deep-Learning based Hardware Trojan Detection
- Grant number: EP/X036960/1
- Fiscal year: 2023
- Funding amount: $390,600
- Project category: Research Grant
Optimization-based Implicit Deep Learning, Theory and Applications
- Grant number: 2309810
- Fiscal year: 2023
- Funding amount: $390,600
- Project category: Continuing Grant
Spatial Calibration of Head-Mounted Displays Based on Implicit Function Representation of Light Fields Using Deep Learning
- Grant number: 23K16920
- Fiscal year: 2023
- Funding amount: $390,600
- Project category: Grant-in-Aid for Early-Career Scientists
Deep learning-based prediction model for intraoperative neuromuscular blockade
- Grant number: 23K14406
- Fiscal year: 2023
- Funding amount: $390,600
- Project category: Grant-in-Aid for Early-Career Scientists
Collaborative Research: A Physics-Informed Flood Early Warning System for Agricultural Watersheds with Explainable Deep Learning and Process-Based Modeling
- Grant number: 2243776
- Fiscal year: 2023
- Funding amount: $390,600
- Project category: Standard Grant













