Audio-Visual Speech Enhancement and Speaker Separation
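Basic information
- Grant number: 2243852
- Principal investigator:
- Amount: --
- Host institution:
- Host institution country: United Kingdom
- Project category: Studentship
- Fiscal year: 2019
- Funding country: United Kingdom
- Duration: 2019 to (no data)
- Project status: Completed
- Source:
- Keywords:
Project abstract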
The problem with audio perception is that individual sounds are mixed together with unknown acoustic reverberation, which makes it impossible to extract them without prior knowledge of the source characteristics. Audio-source separation is therefore a fundamental problem in audio perception. Humans can understand speech when it is mixed with other types of sound and noise by isolating one voice from a multitude and focusing attention on it. This research aims to reproduce or model this accomplishment of the brain by computational and algorithmic means.
Speech enhancement increases speech intelligibility by using algorithms to separate the original speech source from other sources and enhance it. Automating speech enhancement has many real-world applications, such as increasing the effectiveness of assistive technology for the hearing impaired, creating virtual reality with high clarity, and improving the transcription of speech in noisy audio tracks. Additionally, with the ever-increasing use of audio-visual and voice-controlled technologies, the ability to capture and enhance a speaker's voice is becoming imperative for the robustness of automatic speech recognition (ASR) systems. These systems tend to infer speech well in quiet environments, but they struggle when background noise is present.
Although deep learning methods have recently brought significant advances in speech separation, it is still considered a difficult problem because of time-variant input signals and the high variability of reverberant sound fields. Speech enhancement is traditionally performed either on audio-only tracks or on a combination of audio and video inputs. Deep learning techniques have been applied to challenging tasks such as removing background noise from speech, separating one speaker from a mixture of speech signals, or, more generally, separating arbitrary classes of sound from each other.
This work will address the shortcomings of current methods and will explore conditioning speech separation on complementary information, such as visual cues from the speaker's lip motions.
Project outputs
Journal papers (1)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Reading to Listen at the Cocktail Party: Multi-Modal Speech Separation
- DOI: 10.1109/cvpr52688.2022.01024
- Publication date: 2022-06
- Journal:
- Impact factor: 0
- Authors: Akam Rahimi; Triantafyllos Afouras; Andrew Zisserman
- Corresponding authors: Akam Rahimi; Triantafyllos Afouras; Andrew Zisserman
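Similar NSFC grants
Research on Visual Hull reconstruction and surface attribute modeling algorithms based on multiple images
- Grant number: 60373031
- Approval year: 2003
- Funding amount: 230,000 CNY
- Project category: General Program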
Similar overseas grants
The role of audio-visual and auditory-motor integration in speech perception: Is what we hear dominated by what we see or how we move?
- Grant number: 2386111
- Fiscal year: 2020
- Funding amount: --
- Project category: Studentship
Development of visual feedback speech training method based on real-time audio visualization system in cleft palate
- Grant number: 20H03891
- Fiscal year: 2020
- Funding amount: --
- Project category: Grant-in-Aid for Scientific Research (B)
Audio-visual prosody of whispered and semi-whispered speech
- Grant number: 426673330
- Fiscal year: 2019
- Funding amount: --
- Project category: Research Grants
Improving audio-visual speech recognition with augmented facial-mapping.
- Grant number: 1964209
- Fiscal year: 2017
- Funding amount: --
- Project category: Studentship
Audio-visual influences on infant speech perception
- Grant number: 482168-2015
- Fiscal year: 2015
- Funding amount: --
- Project category: University Undergraduate Student Research Awards
Innvestigate on audio-visual integration of speech sound
- Grant number: 26750215
- Fiscal year: 2014
- Funding amount: --
- Project category: Grant-in-Aid for Young Scientists (B)
Audio-visual speech processing in bilinguals across the lifespan
- Grant number: 449943-2013
- Fiscal year: 2013
- Funding amount: --
- Project category: University Undergraduate Student Research Awards
Analysis and synthesis method of phonetic/emotional information in audio-visual speech information
- Grant number: 24650100
- Fiscal year: 2012
- Funding amount: --
- Project category: Grant-in-Aid for Challenging Exploratory Research
A multi-modal sensor fusion architecture for audio-visual speech understanding
- Grant number: 184129-2007
- Fiscal year: 2011
- Funding amount: --
- Project category: Discovery Grants Program - Individual
Development of high-definition audio-visual speech communication systems based on the knowledge of /kansei/ information processing
- Grant number: 23500252
- Fiscal year: 2011
- Funding amount: --
- Project category: Grant-in-Aid for Scientific Research (C)