
Multi-Modal Blind Source Separation Algorithms

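Basic information

Grant number:
EP/C535308/2
Principal investigator:
Jonathon Chambers
Amount:
--
Host institution:
Loughborough University
Host institution country:
United Kingdom
Project category:
Research Grant
Fiscal year:
2008
Funding country:
United Kingdom
Start and end dates:
2008 to (no data)
Project status:
Completed

Source:
https://gtr.ukri.org/projects?ref=EP%2FC535308%2F2
Keywords:
Multi Modal Blind Source Separation

Project abstract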

This project concerns the emulation of the human ability to separate one speech source from a background of other speakers and possibly noise sources, such as an air conditioning unit, within an office environment. This is termed the cocktail party problem. Extracting the speaker of interest by processing the recordings from a number of microphones with a computer is a very challenging task. As humans, we use much more than the sound perceived by our two ears to address this problem. Our eyes, for example, also provide visual cues which help in the process. The focus of this work is therefore to integrate both audio and visual measurements, obtained from microphones and cameras within the office, to aid the separation process. A human listener is also likely to exploit knowledge of language, so we plan to use mathematical models of the audio and speech recordings in the separation process; these are called coupled (or fused) Hidden Markov Models. When a word is uttered within a room, the sound wave propagates along many paths to the microphone, due to reflections from the walls, ceiling, floor, or other objects in the room, such as a table. This so-called multipath propagation is modelled by what is called a convolutive mixture. A convolutive model describes the relationship between the input and output of a linear, possibly multichannel, system with memory: its output depends on past inputs and possibly past outputs (only past inputs in this project). To perform the separation it is therefore necessary to use a convolutive model. Separation with such a model requires many calculations in the time domain, but becomes much easier in the frequency domain. We believe that separation in the frequency domain is the way forward, but open problems remain.
In particular, it must be determined how to reconstruct the extracted speech signal in the time domain (the so-called permutation problem), how to deal with more than two speakers in the room, and how to handle moving speakers. Our approach in this work is to use additional visual information to overcome these problems. We therefore wish to equip an intelligent office within the Centre of Digital Signal Processing at the Cardiff School of Engineering with microphones and cameras, together with the necessary computing facilities, to record examples of audio and visual signals for testing. Recording will begin with two well-positioned speakers uttering distinct sounds, such as vowels and consonants, then move on to more speakers and to movement, and ultimately to natural continuous speech. The overall goal is to demonstrate the ability to separate the utterances of any one of the speakers within the intelligent office, which would then facilitate interaction, for example with a voice recogniser or a third party at a remote location, as in teleconferencing.
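
As a rough illustration of two ideas in the abstract, the Python sketch below checks that a convolutive mixture in the time domain reduces to one small matrix product per frequency bin, and shows what goes wrong when the separated outputs are ordered inconsistently across bins (the permutation problem). It is not the project's method. The filters, signal lengths, and function names (`convolutive_mix`, `to_frequency_domain`, `unmix_per_bin`) are assumptions made for the example, and the unmixing step cheats by using the known mixing filters instead of estimating them blindly.

```python
import numpy as np


def convolutive_mix(sources, filters):
    """Time domain: x_i[n] = sum_j sum_k h_ij[k] * s_j[n - k]."""
    n_mics, n_src, n_taps = filters.shape
    mixed = np.zeros((n_mics, sources.shape[1] + n_taps - 1))
    for i in range(n_mics):
        for j in range(n_src):
            mixed[i] += np.convolve(sources[j], filters[i, j])
    return mixed


def to_frequency_domain(sources, filters):
    """Zero-padded FFTs so that X(f) = H(f) S(f) holds in every bin."""
    n_fft = sources.shape[1] + filters.shape[2] - 1
    S = np.fft.rfft(sources, n=n_fft, axis=1)   # (n_src, n_bins)
    H = np.fft.rfft(filters, n=n_fft, axis=2)   # (n_mics, n_src, n_bins)
    X = np.einsum("ijf,jf->if", H, S)           # one small matrix product per bin
    return H, X, n_fft


def unmix_per_bin(H, X, n_fft, swap_bins=None):
    """Solve H(f) Y(f) = X(f) per bin; optionally swap the two outputs in some bins."""
    Y = np.empty_like(X)
    for f in range(X.shape[1]):
        Y[:, f] = np.linalg.solve(H[:, :, f], X[:, f])
    if swap_bins is not None:
        # Emulate an inconsistent output order across frequency bins.
        Y[:, swap_bins] = Y[::-1, :][:, swap_bins].copy()
    return np.fft.irfft(Y, n=n_fft, axis=1)


rng = np.random.default_rng(0)
sources = rng.standard_normal((2, 256))   # two stand-in "speakers" (white noise)
filters = np.zeros((2, 2, 4))             # hypothetical short room responses
filters[0, 0] = [1.0, 0.0, 0.2, 0.0]      # direct path plus one reflection
filters[0, 1] = [0.0, 0.5, 0.0, 0.1]
filters[1, 0] = [0.0, 0.4, 0.1, 0.0]
filters[1, 1] = [1.0, 0.0, 0.0, 0.3]

mixed_time = convolutive_mix(sources, filters)
H, X, n_fft = to_frequency_domain(sources, filters)
mixed_freq = np.fft.irfft(X, n=n_fft, axis=1)
print("time and frequency mixtures agree:", np.allclose(mixed_time, mixed_freq))

aligned = unmix_per_bin(H, X, n_fft)
upper_bins = np.arange(X.shape[1] // 2, X.shape[1])
swapped = unmix_per_bin(H, X, n_fft, swap_bins=upper_bins)
print("consistent ordering recovers sources:", np.allclose(aligned[:, :256], sources))
print("swapped ordering recovers sources:", np.allclose(swapped[:, :256], sources))
```

With a consistent output order in every bin, the inverse transform returns the original sources. When the order is swapped in the upper half of the bins, each output contains the low band of one source and the high band of the other, which is why a frequency-domain method needs some cue (in this project, visual information) to align the bins before returning to the time domain.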

Project outputs

Journal papers (3)

Monographs (0)

Research awards (0)

Conference papers (0)

Patents (0)

Visual voice activity detection with optical flow

DOI:
10.1049/iet-ipr.2009.0042
Publication date:
2010-12-01
Journal:
IET Image Processing
Impact factor:
2.3
Authors:
Aubrey, A. J.; Hicks, Y. A.; Chambers, J. A.
Corresponding author:
Chambers, J. A.

A multimodal approach for frequency domain independent component analysis with geometrically-based initialization

DOI:
Publication date:
2008-08
Journal:
2008 16th European Signal Processing Conference
Impact factor:
0
Authors:
S. M. Naqvi; Y. Zhang; T. Tsalaile; S. Sanei; J. Chambers
Corresponding author:
S. M. Naqvi; Y. Zhang; T. Tsalaile; S. Sanei; J. Chambers

A Multimodal Approach to Blind Source Separation of Moving Sources

DOI:
10.1109/jstsp.2010.2057198
Publication date:
2010
Journal:
IEEE Journal of Selected Topics in Signal Processing
Impact factor:
7.5
Authors:
Naqvi S
Corresponding author:
Naqvi S

Other publications by Jonathon Chambers

Reconfigurable Intelligent Surface-Assisted B5G/6G Wireless Communications: Challenges, Solution and Future Opportunities

DOI:
10.1109/mcom.002.2200047
Publication date:
2022
Journal:
IEEE Communications Magazine
Impact factor:
Authors:
Zhen Chen; Gaojie Chen; Jie Tang; Shun Zhang; Daniel K. C. So; Octavia A. Dobre; Kai-Kit Wong; Jonathon Chambers
Corresponding author:
Jonathon Chambers