Audio and Video Based Speech Separation for Multiple Moving Sources Within a Room Environment
Basic Information
- Grant number: EP/H049665/1
- Principal investigator:
- Amount: $383,200
- Host institution:
- Host institution country: United Kingdom
- Project category: Research Grant
- Fiscal year: 2010
- Funding country: United Kingdom
- Duration: 2010 to (no data)
- Project status: Completed
- Source:
- Keywords:
Project Summary
Human beings have developed a unique ability to communicate within a noisy environment, such as at a cocktail party. This skill depends on using the aural and visual senses together with sophisticated processing within the brain. Mimicking this ability in a machine is very challenging, particularly when the speakers are moving, as in a teleconferencing context where they walk around a room. In the field of signal processing, researchers have developed techniques to separate one speech signal from a mixture of such signals, as measured by a number of microphones, on the basis of audio information alone, under the assumption that the speakers are static and that typically no more than two are present in the room. Such approaches have generally been found to fail, however, when the speakers are moving or number more than two. Fundamentally new approaches are therefore necessary to advance the state-of-the-art in the field. Professor Chambers and his team at Loughborough University were the first in the UK to propose a new approach based on combined audio and video processing to solve the source separation problem, but their preliminary work identified major challenges in audio-visual speaker localization, tracking and separation which must be solved to provide a practical solution for speech separation for multiple moving sources within a room environment. These findings motivate this new project, in which world-leading teams at the University of Surrey, led by Professor Kittler, and at the GIPSA Lab, Grenoble, France, headed by Professor Jutten, are ready to work with Professor Chambers and his team at Loughborough University to advance the state-of-the-art in the field. In this new project, two postdoctoral researchers will be employed, one at Loughborough and one at Surrey.
The first will focus on the development of fundamentally new speech source separation algorithms for moving speakers, using geometrical room acoustic information (for example, the location and number of sources and descriptions of their movement) provided by the second researcher. The research team at Grenoble will provide technical guidance throughout the project on the basis of their considerable experience in source separation, and will work on providing an acoustic noise model for the room environment which will also aid the speech separation process. To achieve these tasks, frequency-domain beamforming algorithms will be developed which exploit microphone arrays having more microphones than speakers, so that new data-independent superdirective robust beamformer design methods can be derived using mathematical convex optimization. Additionally, further geometric information will be exploited to introduce robustness to errors in the localization information describing the desired source and the interference. To improve the localization information, an array of collaborative cameras will be used, combining both audio and visual cues. Advanced methods from particle filtering and probabilistic data association will be exploited to improve tracking performance. Finally, visual voice activity detection will be used to determine the active sources within the beamforming operations. We emphasize that this work is not implementation-driven, so computational complexity for real-time realization will not be a focus; this would be the subject of a future project. All of the new algorithms will be evaluated in terms of both objective and subjective performance measures on labelled audio and visual datasets acquired at Loughborough and Surrey, and from the CHIL seminar room at Karlsruhe University (UKA), Germany.
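The superdirective, data-independent beamforming idea described above can be sketched in a few lines. This is a minimal illustration only, assuming a diffuse (spherically isotropic) noise field with the classical closed-form MVDR-style solution and diagonal loading for robustness to localization errors; the project itself proposes convex-optimization designs, and the function name, array geometry and parameters below are hypothetical.

```python
import numpy as np

def superdirective_weights(mic_pos, look_dir, freq, c=343.0, diag_load=1e-2):
    """Frequency-domain superdirective beamformer weights at one frequency.

    Sketch only: models the noise field as spherically isotropic (diffuse)
    via the sinc spatial-coherence matrix, with diagonal loading for
    robustness, rather than the project's convex-optimization design.
    """
    mic_pos = np.asarray(mic_pos, dtype=float)          # (M, 3) mic coordinates
    look_dir = np.asarray(look_dir, dtype=float)
    look_dir = look_dir / np.linalg.norm(look_dir)
    # Far-field steering vector towards the desired source direction
    delays = mic_pos @ look_dir / c                     # per-mic propagation delays
    d = np.exp(-1j * 2 * np.pi * freq * delays)
    # Diffuse-noise coherence: Gamma_ij = sinc(2 f d_ij / c) (np.sinc is normalized)
    dist = np.linalg.norm(mic_pos[:, None] - mic_pos[None, :], axis=-1)
    Gamma = np.sinc(2 * freq * dist / c)
    # Diagonal loading trades directivity for robustness to steering errors
    Gamma = Gamma + diag_load * np.eye(len(mic_pos))
    w = np.linalg.solve(Gamma, d)
    return w / (d.conj() @ w)                           # distortionless: d^H w = 1

# Usage: 8-mic circular array of radius 10 cm, steered along +x, at 1 kHz
M = 8
ang = 2 * np.pi * np.arange(M) / M
mics = np.stack([0.1 * np.cos(ang), 0.1 * np.sin(ang), np.zeros(M)], axis=1)
w = superdirective_weights(mics, [1.0, 0.0, 0.0], 1000.0)
```

With more microphones than speakers, as the project assumes, nulls remain available to suppress the interfering sources after the distortionless constraint is spent on the desired one.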
To ensure this pioneering work has maximum impact on the UK and international academic and research communities, all the algorithms and datasets will be made available through the project website.
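The particle-filtering step of the audio-visual tracker mentioned above can likewise be sketched. This is a hedged illustration under simplifying assumptions (a random-walk motion model, a Gaussian likelihood around a fused audio-visual position measurement, and systematic resampling); the project's tracker additionally uses probabilistic data association, which is not shown, and every name and constant here is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def particle_filter_step(particles, weights, meas, motion_std=0.05, meas_std=0.2):
    """One bootstrap particle-filter update for a 2-D speaker position."""
    # Predict: propagate particles through a random-walk motion model
    particles = particles + rng.normal(0.0, motion_std, particles.shape)
    # Update: reweight by Gaussian likelihood of the AV position measurement
    d2 = np.sum((particles - meas) ** 2, axis=1)
    weights = weights * np.exp(-0.5 * d2 / meas_std ** 2)
    weights = weights / weights.sum()
    # Systematic resampling when the effective sample size collapses
    if 1.0 / np.sum(weights ** 2) < 0.5 * len(weights):
        positions = (rng.random() + np.arange(len(weights))) / len(weights)
        idx = np.searchsorted(np.cumsum(weights), positions)
        idx = np.minimum(idx, len(weights) - 1)         # guard against fp overshoot
        particles = particles[idx]
        weights = np.full(len(weights), 1.0 / len(weights))
    return particles, weights

# Usage: track a speaker walking across an (assumed) 4 m x 4 m room
N = 500
particles = rng.uniform(0.0, 4.0, (N, 2))
weights = np.full(N, 1.0 / N)
true_pos = np.array([1.0, 2.0])
for _ in range(20):
    true_pos = true_pos + np.array([0.05, 0.0])         # speaker walks along x
    meas = true_pos + rng.normal(0.0, 0.2, 2)           # noisy AV localization
    particles, weights = particle_filter_step(particles, weights, meas)
estimate = weights @ particles                          # posterior mean position
```

In the project's pipeline the resulting position estimate would supply the steering information that the beamforming stage depends on.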
Project Outcomes
Journal articles (10)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Fast pose invariant face recognition using super coupled multiresolution Markov Random Fields on a GPU
- DOI: 10.1016/j.patrec.2014.05.017
- Publication date: 2014
- Journal:
- Impact factor: 5.1
- Author: Rahimzadeh Arashloo S
- Corresponding author: Rahimzadeh Arashloo S
Multimodal blind source separation with a circular microphone array and robust beamforming
- DOI:
- Publication date: 2011-08
- Journal:
- Impact factor: 0
- Authors: S. M. Naqvi;Muhammad Salman Khan;Qingju Liu;Wenwu Wang;J. Chambers
- Corresponding authors: S. M. Naqvi;Muhammad Salman Khan;Qingju Liu;Wenwu Wang;J. Chambers
Robust Feature Selection for Scaling Ambiguity Reduction in Audio-Visual Convolutive BSS
- DOI:
- Publication date:
- Journal:
- Impact factor: 0
- Author: Syed Mohsen Naqvi
- Corresponding author: Syed Mohsen Naqvi
A Multimodal Approach to Blind Source Separation of Moving Sources
- DOI: 10.1109/jstsp.2010.2057198
- Publication date: 2010
- Journal:
- Impact factor: 7.5
- Author: Naqvi S
- Corresponding author: Naqvi S
Audio video based fast fixed-point independent vector analysis for multisource separation in a room environment
- DOI: 10.1186/1687-6180-2012-183
- Publication date: 2012-08
- Journal:
- Impact factor: 1.9
- Authors: Yanfeng Liang;S. M. Naqvi;J. Chambers
- Corresponding authors: Yanfeng Liang;S. M. Naqvi;J. Chambers
Other Publications by Jonathon Chambers
Reconfigurable Intelligent Surface-Assisted B5G/6G Wireless Communications: Challenges, Solution and Future Opportunities
- DOI: 10.1109/mcom.002.2200047
- Publication date: 2022
- Journal:
- Impact factor:
- Authors: Zhen Chen;Gaojie Chen;Jie Tang;Shun Zhang;Daniel K. C. So;Octavia A. Dobre;Kai-Kit Wong;Jonathon Chambers
- Corresponding author: Jonathon Chambers
A Sliding Window Variational Outlier-Robust Kalman Filter based on Student's t Noise Modelling
- DOI: 10.1109/taes.2022.3164012
- Publication date: 2022
- Journal:
- Impact factor: 4.4
- Authors: Fengchi Zhu;Yulong Huang;Chao Xue;Lyudmila Mihaylova;Jonathon Chambers
- Corresponding author: Jonathon Chambers
Omega‐3 fatty acids (fish oils) and prostate cancer: is there any evidence of a link and how should we advise our patients?
- DOI: 10.1111/ans.12792
- Publication date: 2014
- Journal:
- Impact factor: 1.7
- Authors: Jonathon Chambers;Jon‐Paul Meyer
- Corresponding author: Jon‐Paul Meyer
A novel adaptive Kalman filter with inaccurate process and measurement noise covariance matrices
- DOI:
- Publication date: 2017
- Journal:
- Impact factor: 0
- Authors: Yulong Huang;Yonggang Zhang;Zhemin Wu;Ning Li;Jonathon Chambers
- Corresponding author: Jonathon Chambers
Synchronization control of cyber–physical systems with time-varying dynamics under denial-of-service attacks
- DOI: 10.1016/j.jfranklin.2024.107243
- Publication date: 2024-11-01
- Journal:
- Impact factor:
- Authors: Daotong Zhang;Peng Shi;Jonathon Chambers
- Corresponding author: Jonathon Chambers
Other Grants Held by Jonathon Chambers
Communications Signal Processing Based Solutions for Massive Machine-to-Machine Networks (M3NETs)
- Grant number: EP/R006377/1
- Fiscal year: 2018
- Amount: $383,200
- Project category: Research Grant
Signal Processing Solutions for the Networked Battlespace
- Grant number: EP/K014307/2
- Fiscal year: 2015
- Amount: $383,200
- Project category: Research Grant
Signal Processing Solutions for the Networked Battlespace
- Grant number: EP/K014307/1
- Fiscal year: 2013
- Amount: $383,200
- Project category: Research Grant
Novel Communications Signal Processing Techs. for Transmission Over MIMO Frequency Selective Wireless Channels Using Polynomial Matrix Decompositions
- Grant number: EP/F065477/1
- Fiscal year: 2008
- Amount: $383,200
- Project category: Research Grant
Multi-Modal Blind Source Separation Algorithms
- Grant number: EP/C535308/2
- Fiscal year: 2008
- Amount: $383,200
- Project category: Research Grant
Similar Overseas Grants
Audio-visual object-based dynamic scene representation from monocular video
- Grant number: 2701695
- Fiscal year: 2022
- Amount: $383,200
- Project category: Studentship
Multi-channel Audio Signal Processing Based on Sound-to-Light Conversion and Video Camera
- Grant number: 17F17049
- Fiscal year: 2017
- Amount: $383,200
- Project category: Grant-in-Aid for JSPS Fellows
Video and audio analytics algorithms for cloud-based remote monitoring systems
- Grant number: 494881-2016
- Fiscal year: 2017
- Amount: $383,200
- Project category: Collaborative Research and Development Grants
Video and audio analytics algorithms for cloud-based remote monitoring systems
- Grant number: 494881-2016
- Fiscal year: 2016
- Amount: $383,200
- Project category: Collaborative Research and Development Grants
RI: Small: Collaborative Research: Ontology based Perceptual Organization of Audio-Video Events using Pattern Theory
- Grant number: 1217515
- Fiscal year: 2012
- Amount: $383,200
- Project category: Standard Grant
RI: Small: Collaborative Research: Ontology based Perceptual Organization of Audio-Video Events using Pattern Theory
- Grant number: 1217676
- Fiscal year: 2012
- Amount: $383,200
- Project category: Standard Grant
A Study on QoE-based Quality Enhancement Techniques for Multi-View Video and Audio IP Transmission
- Grant number: 23760332
- Fiscal year: 2011
- Amount: $383,200
- Project category: Grant-in-Aid for Young Scientists (B)
Computer-on-Module & FPGA based Audio/Video Processing Board
- Grant number: 414295-2011
- Fiscal year: 2011
- Amount: $383,200
- Project category: Experience Awards (previously Industrial Undergraduate Student Research Awards)
KANSEI-based Media Analysis/Personalization/Dissemination System for Video and Audio Data on the Web
- Grant number: 23700128
- Fiscal year: 2011
- Amount: $383,200
- Project category: Grant-in-Aid for Young Scientists (B)
Audio and Video Based Speech Separation for Multiple Moving Sources Within a Room Environment
- Grant number: EP/H050000/1
- Fiscal year: 2010
- Amount: $383,200
- Project category: Research Grant