Multi-Modal Blind Source Separation Algorithms

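Basic Information

  • Grant number:
    EP/C535308/2
  • Principal investigator:
  • Amount:
    --
  • Host institution:
  • Host institution country:
    United Kingdom
  • Project type:
    Research Grant
  • Fiscal year:
    2008
  • Funding country:
    United Kingdom
  • Duration:
    2008 to (no data)
  • Project status:
    Completed

Project Abstract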

This project concerns emulating the human ability to separate one speech source from a background of other speakers and possible noise sources, such as an air conditioning unit, within an office environment. This is termed the cocktail party problem, and extracting the speaker of interest by processing recordings from a number of microphones with a computer is a very challenging task. As humans, we use much more than the sound perceived by our two ears to address this problem. Our eyes, for example, provide visual cues that help in the process. The focus of this work is therefore to integrate audio and visual measurements, obtained from microphones and cameras within the office, to aid the separation process. Humans are also likely to exploit knowledge of language when separating sources. We therefore plan to use mathematical models of the audio and speech recordings in the separation process; these are called coupled (or fused) Hidden Markov Models.
When a word is uttered within a room, the sound wave propagates along many paths to the microphone because of reflections from the walls, ceiling, floor, or other objects in the room, such as a table. This so-called multipath propagation is modelled by a convolutive mixture. A convolutive model is the relationship between the input and output of a linear, possibly multichannel, system that has memory of past inputs and possibly past outputs (only inputs in this project). Performing the separation therefore requires a convolutive model. Such a model would need many calculations to perform separation, but this becomes much easier in the frequency domain. We believe separation in the frequency domain is the way forward, but there are problems to be solved: in particular, how to reconstruct the extracted speech signal back in the time domain (the so-called permutation problem), how to deal with more than two speakers in the room, and how to cope with speakers who are moving. Our approach in this work is to use additional visual information to overcome these problems.
We therefore wish to equip an intelligent office within the Centre of Digital Signal Processing at the Cardiff School of Engineering with microphones and cameras, together with the necessary computing facilities, to record examples of audio and visual signals for testing. Recording will begin with two well-positioned speakers uttering distinct sounds, such as vowels and consonants, then move on to more speakers and movement, and ultimately to natural continuous speech. The overall goal is to demonstrate the ability to separate the utterances of any one of the speakers within the intelligent office, which would then facilitate interaction with, for example, a voice recogniser or a third party at a remote location, as in teleconferencing.
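
The convolutive mixing model and the move to the frequency domain described in the abstract can be illustrated with a minimal sketch. It is not the project's implementation: the function names, the filter length, and the random test signals are assumptions chosen for illustration, and NumPy is assumed to be available. Each microphone signal is the sum of every source convolved with a room filter; after zero-padded FFTs, the same mixture becomes one small matrix product per frequency bin.

```python
import numpy as np


def convolutive_mix(sources, filters):
    """Time-domain mixing: x_i[n] = sum_j (h_ij * s_j)[n].

    sources: array of shape (n_sources, n_samples)
    filters: array of shape (n_mics, n_sources, n_taps), hypothetical room filters
    returns: array of shape (n_mics, n_samples + n_taps - 1)
    """
    n_mics, n_sources, n_taps = filters.shape
    n_out = sources.shape[1] + n_taps - 1
    mixtures = np.zeros((n_mics, n_out))
    for i in range(n_mics):
        for j in range(n_sources):
            # Linear convolution of source j with the path from j to microphone i.
            mixtures[i] += np.convolve(sources[j], filters[i, j])
    return mixtures


def mix_in_frequency_domain(sources, filters):
    """Same mixture computed as a per-bin matrix product X(f) = H(f) S(f)."""
    n_taps = filters.shape[2]
    n_out = sources.shape[1] + n_taps - 1
    # Zero-padding to n_out makes circular convolution equal linear convolution.
    S = np.fft.rfft(sources, n=n_out, axis=1)   # (n_sources, n_bins)
    H = np.fft.rfft(filters, n=n_out, axis=2)   # (n_mics, n_sources, n_bins)
    X = np.einsum("ijk,jk->ik", H, S)           # one mixing matrix per bin k
    return np.fft.irfft(X, n=n_out, axis=1)


# Illustrative data only: two sources, two microphones, 16-tap filters.
rng = np.random.default_rng(0)
sources = rng.standard_normal((2, 256))
filters = rng.standard_normal((2, 2, 16))

time_domain = convolutive_mix(sources, filters)
freq_domain = mix_in_frequency_domain(sources, filters)
max_abs_difference = float(np.max(np.abs(time_domain - freq_domain)))
```

Because each frequency bin is then an ordinary instantaneous mixture, a separation algorithm can treat the bins independently. The order in which it returns the sources can differ from bin to bin, which is the permutation problem the abstract mentions.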

Project Outputs

Journal articles (3)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Visual voice activity detection with optical flow
  • DOI:
    10.1049/iet-ipr.2009.0042
  • Publication date:
    2010-12-01
  • Journal:
  • Impact factor:
    2.3
  • Authors:
    Aubrey, A. J.; Hicks, Y. A.; Chambers, J. A.
  • Corresponding author:
    Chambers, J. A.
A multimodal approach for frequency domain independent component analysis with geometrically-based initialization
A Multimodal Approach to Blind Source Separation of Moving Sources

Other publications by Jonathon Chambers

Reconfigurable Intelligent Surface-Assisted B5G/6G Wireless Communications: Challenges, Solution and Future Opportunities
  • DOI:
    10.1109/mcom.002.2200047
  • Publication date:
    2022
  • Journal:
  • Impact factor:
  • Authors:
    Zhen Chen; Gaojie Chen; Jie Tang; Shun Zhang; Daniel K. C. So; Octavia A. Dobre; Kai-Kit Wong; Jonathon Chambers
  • Corresponding author:
    Jonathon Chambers
A Sliding Window Variational Outlier-Robust Kalman Filter based on Student's t Noise Modelling
Omega‐3 fatty acids (fish oils) and prostate cancer: is there any evidence of a link and how should we advise our patients?
  • DOI:
    10.1111/ans.12792
  • Publication date:
    2014
  • Journal:
  • Impact factor:
    1.7
  • Authors:
    Jonathon Chambers; Jon‐Paul Meyer
  • Corresponding author:
    Jon‐Paul Meyer
A novel adaptive Kalman filter with inaccurate process and measurement noise covariance matrices
  • DOI:
  • Publication date:
    2017
  • Journal:
  • Impact factor:
    0
  • Authors:
    Yulong Huang; Yonggang Zhang; Zhemin Wu; Ning Li; Jonathon Chambers
  • Corresponding author:
    Jonathon Chambers
Synchronization control of cyber–physical systems with time-varying dynamics under denial-of-service attacks
  • DOI:
    10.1016/j.jfranklin.2024.107243
  • Publication date:
    2024-11-01
  • Journal:
  • Impact factor:
  • Authors:
    Daotong Zhang; Peng Shi; Jonathon Chambers
  • Corresponding author:
    Jonathon Chambers

Other grants by Jonathon Chambers

Communications Signal Processing Based Solutions for Massive Machine-to-Machine Networks (M3NETs)
  • Grant number:
    EP/R006377/1
  • Fiscal year:
    2018
  • Funding amount:
    --
  • Project type:
    Research Grant
Signal Processing Solutions for the Networked Battlespace
  • Grant number:
    EP/K014307/2
  • Fiscal year:
    2015
  • Funding amount:
    --
  • Project type:
    Research Grant
Signal Processing Solutions for the Networked Battlespace
  • Grant number:
    EP/K014307/1
  • Fiscal year:
    2013
  • Funding amount:
    --
  • Project type:
    Research Grant
Audio and Video Based Speech Separation for Multiple Moving Sources Within a Room Environment
  • Grant number:
    EP/H049665/1
  • Fiscal year:
    2010
  • Funding amount:
    --
  • Project type:
    Research Grant
Novel Communications Signal Processing Techs. for Transmission Over MIMO Frequency Selective Wireless Channels Using Polynomial Matrix Decompositions
  • Grant number:
    EP/F065477/1
  • Fiscal year:
    2008
  • Funding amount:
    --
  • Project type:
    Research Grant

Similar overseas grants

Cross-modal motion responses in blind and deaf humans
  • Grant number:
    9264530
  • Fiscal year:
    2015
  • Funding amount:
    --
  • Project type:
Cross-modal motion responses in blind and deaf humans
  • Grant number:
    8486050
  • Fiscal year:
    2013
  • Funding amount:
    --
  • Project type:
Cross-modal motion responses in blind and deaf humans
  • Grant number:
    8704941
  • Fiscal year:
    2013
  • Funding amount:
    --
  • Project type:
Dancing Dots Music Touch TTT: Multi-modal Teaching System for Blind musicians. T
  • Grant number:
    7928397
  • Fiscal year:
    2010
  • Funding amount:
    --
  • Project type:
Sensory Cortical Organization and Cross-Modal Plasticity in Blind Humans
  • Grant number:
    9113167
  • Fiscal year:
    2009
  • Funding amount:
    --
  • Project type:
Sensory cortical organization and cross-modal plasticity in blind subjects
  • Grant number:
    7895576
  • Fiscal year:
    2009
  • Funding amount:
    --
  • Project type:
Multi-Modal Blind Source Separation for Robot Audition
  • Grant number:
    EP/H012842/1
  • Fiscal year:
    2009
  • Funding amount:
    --
  • Project type:
    Research Grant
Sensory Cortical Organization and Cross-Modal Plasticity in Blind Humans
  • Grant number:
    8514241
  • Fiscal year:
    2009
  • Funding amount:
    --
  • Project type:
Sensory Cortical Organization and Cross-Modal Plasticity in Blind Humans
  • Grant number:
    8691821
  • Fiscal year:
    2009
  • Funding amount:
    --
  • Project type:
Sensory cortical organization and cross-modal plasticity in blind subjects
  • Grant number:
    7450018
  • Fiscal year:
    2009
  • Funding amount:
    --
  • Project type: