Deep neural networks for multi-channel speaker localization and speech separation


Basic information

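  • Award number:
    1808932
  • Principal investigator:
  • Amount:
    $300,000
  • Host institution:
  • Host institution country:
    United States
  • Project type:
    Standard Grant
  • Fiscal year:
    2018
  • Funding country:
    United States
  • Duration:
    2018-12-01 to 2022-11-30
  • Project status:
    Completed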

Project abstract

In recent years, there has been a dramatic increase in the deployment of voice-based interfaces for human-machine communication. Such devices typically have multiple microphones (or channels), and as they are used in homes, cars, and so on, a major technical challenge is how to reliably localize a target speaker and recognize his/her speech in everyday environments with multiple sound sources and room reverberation. The performance of traditional approaches to localization and separation degrades significantly in the presence of interfering sounds and room reverberation. This project investigates multi-channel speaker localization and speech separation from a deep learning perspective. The innovative approach in this project is to train deep neural networks to perform single-channel speech separation in order to identify the time-frequency regions dominated by the target speaker. Such regions across microphone pairs provide the basis for robust speaker localization and separation. Building on this novel perspective, the proposed research seeks to achieve robust speaker localization and speech separation. For robust speaker localization, time-frequency (T-F) masks will be generated by deep neural networks (DNN) from single-channel noisy speech signals. Across each pair of microphones, an integrated mask will be calculated from the two corresponding single-channel masks and then used to weight a generalized cross-correlation function, from which the direction of the target speaker will be estimated. An alternative method for localization will be based on mask-weighted steered responses. For robust speech separation, masking-based beamforming will be initially performed, where T-F masking and accurate speaker localization are expected to enhance beamforming results substantially.
To overcome the limitation of spatial filtering in multi-source reverberant conditions, spectral (monaural) and spatial information will be integrated as DNN input features in order to separate only the target signal with speech characteristics and originating from a specific direction. The proposed approach will be evaluated using automatic speech recognition rate, as well as localization and separation accuracy, on multi-channel noisy and reverberant datasets recorded in real-world environments. This will ensure a broader impact not only in advancing speech processing technology but also in facilitating the design of next-generation hearing aids in the long run. This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
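The mask-weighted localization step described in the abstract can be illustrated with a small NumPy sketch. It is a sketch under stated assumptions, not the project's implementation: the abstract does not say how the two single-channel masks are integrated, so the product of the masks is assumed; the PHAT variant of generalized cross-correlation is assumed; and an oracle ratio mask stands in for the DNN output. All function names (`stft`, `mask_weighted_gcc_phat`, `ideal_ratio_mask`, `delayed_pair`, `demo_lag`) are hypothetical.

```python
import numpy as np


def stft(x, n_fft=512, hop=256):
    """Hann-windowed STFT, shape (frames, n_fft // 2 + 1)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] * window for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)


def mask_weighted_gcc_phat(x1, x2, mask1, mask2, fs, max_lag, n_fft=512, hop=256):
    """Return (lag, scores), where lag is the number of samples x2 lags x1.

    mask1 and mask2 are T-F masks in [0, 1] with the same shape as the STFTs.
    """
    X1, X2 = stft(x1, n_fft, hop), stft(x2, n_fft, hop)
    cross = X1 * np.conj(X2)
    phat = cross / (np.abs(cross) + 1e-8)      # keep only the phase (PHAT weighting)
    pair_mask = mask1 * mask2                  # ASSUMPTION: integrated mask = product
    pooled = (pair_mask * phat).sum(axis=0)    # pool over frames -> (freq,)
    lags = np.arange(-max_lag, max_lag + 1)
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / fs)
    steer = np.exp(-2j * np.pi * np.outer(lags / fs, freqs))
    scores = np.real(steer @ pooled)           # one score per candidate lag
    return int(lags[np.argmax(scores)]), scores


def ideal_ratio_mask(target, interferer, n_fft=512, hop=256):
    """Oracle mask used here as a stand-in for a DNN-estimated mask."""
    t = np.abs(stft(target, n_fft, hop)) ** 2
    v = np.abs(stft(interferer, n_fft, hop)) ** 2
    return t / (t + v + 1e-8)


def delayed_pair(source, delay, length, pad=16):
    """Two-channel copy of source; channel 2 lags channel 1 by `delay` samples."""
    return source[pad:pad + length], source[pad - delay:pad - delay + length]


def demo_lag(seed=0, fs=16000, n=16000):
    """Synthetic target (lag +3) plus a louder interferer (lag -4)."""
    rng = np.random.default_rng(seed)
    s1, s2 = delayed_pair(rng.standard_normal(n + 32), 3, n)
    v1, v2 = delayed_pair(1.5 * rng.standard_normal(n + 32), -4, n)
    m1, m2 = ideal_ratio_mask(s1, v1), ideal_ratio_mask(s2, v2)
    lag, _ = mask_weighted_gcc_phat(s1 + v1, s2 + v2, m1, m2, fs, max_lag=8)
    return lag


if __name__ == "__main__":
    print("estimated lag of channel 2 (samples):", demo_lag())
```

For a far-field source and a two-microphone pair with spacing d, the estimated lag converts to a direction through sin(theta) = c * lag / (fs * d), where c is the speed of sound. The array geometry is not given in the abstract, so that conversion is left out of the sketch.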

Project outcomes

Journal articles (14)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Robust Speaker Recognition Based on Single-Channel and Multi-Channel Speech Enhancement
Location-based training for multi-channel talker-independent speaker separation
Localization based Sequential Grouping for Continuous Speech Separation
Neural Cascade Architecture for Multi-Channel Acoustic Echo Suppression
Count and separate: incorporating speaker counting for continuous speaker separation

Other publications by DeLiang Wang

Multi-Channel Conversational Speaker Separation via Neural Diarization
Leveraging Laryngograph Data for Robust Voicing Detection in Speech
  • DOI:
    10.48550/arxiv.2312.03129
  • Published:
    2023
  • Journal:
  • Impact factor:
    0
  • Authors:
    Yixuan Zhang; Heming Wang; DeLiang Wang
  • Corresponding author:
    DeLiang Wang
Time-frequency masking for speech separation and its potential for hearing aid design.
  • DOI:
    10.1177/1084713808326455
  • Published:
    2008-12-01
  • Journal:
  • Impact factor:
    0
  • Authors:
    DeLiang Wang
  • Corresponding author:
    DeLiang Wang
A Neural Model of Synaptic Plasticity Underlying Short-term and Long-term Habituation
  • DOI:
    10.1177/105971239300200201
  • Published:
    1993-09
  • Journal:
  • Impact factor:
    1.6
  • Authors:
    DeLiang Wang
  • Corresponding author:
    DeLiang Wang
Leveraging Sound Localization to Improve Continuous Speaker Separation

Other grants awarded to DeLiang Wang

Collaborative Research: Separating Speech from Speech Noise to Improve Speech Intelligibility
  • Award number:
    0534707
  • Fiscal year:
    2006
  • Funding amount:
    $300,000
  • Project type:
    Standard Grant
ITR: Dynamics-based Speech Segregation
  • Award number:
    0081058
  • Fiscal year:
    2000
  • Funding amount:
    $300,000
  • Project type:
    Continuing Grant
Automated Auditory Scene Analysis Based on Oscillatory Correlation
  • Award number:
    9423312
  • Fiscal year:
    1995
  • Funding amount:
    $300,000
  • Project type:
    Continuing Grant
Segmentation and Recognition of Complex Temporal Patterns
  • Award number:
    9211419
  • Fiscal year:
    1992
  • Funding amount:
    $300,000
  • Project type:
    Continuing Grant

Similar grants from the National Natural Science Foundation of China (NSFC)

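Mechanistic study of umbilical cord mesenchymal stem cell microcapsules combined with low-energy shock waves for treating nerve-injury-induced ED
  • Award number:
    82371631
  • Approval year:
    2023
  • Funding amount:
    CNY 490,000
  • Project type:
    General Program
Mild hypothermia regulation of the Mpc2/Lactate/H3K9lac pathway in neural stem cells during the acute phase of traumatic brain injury to promote neural repair
  • Award number:
    82371379
  • Approval year:
    2023
  • Funding amount:
    CNY 490,000
  • Project type:
    General Program
Functional study of promoting targeted projection of injured nerves by optimizing Agrin action along regenerating motor nerve pathways
  • Award number:
    82371373
  • Approval year:
    2023
  • Funding amount:
    CNY 490,000
  • Project type:
    General Program
Research on diverse, high-fidelity techniques for Neural Process models
  • Award number:
    62306326
  • Approval year:
    2023
  • Funding amount:
    CNY 300,000
  • Project type:
    Young Scientists Fund
Immune mechanism by which sound-induced ionic currents promote microglial M2 polarization and block scar degeneration in regenerating nerves
  • Award number:
    82371973
  • Approval year:
    2023
  • Funding amount:
    CNY 480,000
  • Project type:
    General Program
LIPUS-responsive elastic porous graphene conduits for promoting nerve regeneration and their underlying mechanisms
  • Award number:
    82370933
  • Approval year:
    2023
  • Funding amount:
    CNY 480,000
  • Project type:
    General Program
Locus coeruleus-raphe neural circuit mechanism for the differential regulation of liver regeneration by physiological and pathological stress
  • Award number:
    82371517
  • Approval year:
    2023
  • Funding amount:
    CNY 490,000
  • Project type:
    General Program
Neural circuit mechanisms of arcuate nucleus-mediated motivation decline caused by chronic pain, and rTMS intervention
  • Award number:
    82371536
  • Approval year:
    2023
  • Funding amount:
    CNY 490,000
  • Project type:
    General Program
Neural circuit mechanisms by which auditory stimuli specifically regulate emotion
  • Award number:
    82371516
  • Approval year:
    2023
  • Funding amount:
    CNY 490,000
  • Project type:
    General Program
miRNAs regulated by the TAG1/APP signaling pathway and their mechanisms of action in the proliferation and differentiation of neural precursor cells
  • Award number:
    31171313
  • Approval year:
    2011
  • Funding amount:
    CNY 600,000
  • Project type:
    General Program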

Similar overseas grants

DMS-EPSRC: Asymptotic Analysis of Online Training Algorithms in Machine Learning: Recurrent, Graphical, and Deep Neural Networks
  • Award number:
    EP/Y029089/1
  • Fiscal year:
    2024
  • Funding amount:
    $300,000
  • Project type:
    Research Grant
Deep neural networks
  • Award number:
    2902331
  • Fiscal year:
    2024
  • Funding amount:
    $300,000
  • Project type:
    Studentship
CAREER: Reliable and Accelerated Deep Neural Networks via Co-Design of Hardware and Algorithms
  • Award number:
    2340516
  • Fiscal year:
    2024
  • Funding amount:
    $300,000
  • Project type:
    Continuing Grant
Collaborative Research: SHF: Medium: Verifying Deep Neural Networks with Spintronic Probabilistic Computers
  • Award number:
    2311295
  • Fiscal year:
    2023
  • Funding amount:
    $300,000
  • Project type:
    Continuing Grant
CAREER: Deep Neural Networks That Can See Shape From Images: Models, Algorithms, and Applications
  • Award number:
    2239977
  • Fiscal year:
    2023
  • Funding amount:
    $300,000
  • Project type:
    Continuing Grant
Property-Driven Quality Assurance of Adversarial Robustness of Deep Neural Networks
  • Award number:
    23K11049
  • Fiscal year:
    2023
  • Funding amount:
    $300,000
  • Project type:
    Grant-in-Aid for Scientific Research (C)
Imaging Epilepsy Sources with Biophysically Constrained Deep Neural Networks
  • Award number:
    10655833
  • Fiscal year:
    2023
  • Funding amount:
    $300,000
  • Project type:
High-performance deep neural networks for medical image analysis
  • Award number:
    10723553
  • Fiscal year:
    2023
  • Funding amount:
    $300,000
  • Project type:
Crop design simulator utilizing deep neural networks with crop growth model as knowledge layer
  • Award number:
    23H02200
  • Fiscal year:
    2023
  • Funding amount:
    $300,000
  • Project type:
    Grant-in-Aid for Scientific Research (B)
DMS-EPSRC: Asymptotic Analysis of Online Training Algorithms in Machine Learning: Recurrent, Graphical, and Deep Neural Networks
  • Award number:
    2311500
  • Fiscal year:
    2023
  • Funding amount:
    $300,000
  • Project type:
    Standard Grant