Towards visually-driven speech enhancement for cognitively-inspired multi-modal hearing-aid devices (AV-COGHEAR)

Basic Information

  • Grant Number:
    EP/M026981/1
  • Principal Investigator:
  • Funding Amount:
    $532,900
  • Host Institution:
  • Host Institution Country:
    United Kingdom
  • Project Type:
    Research Grant
  • Fiscal Year:
    2015
  • Funding Country:
    United Kingdom
  • Start and End Dates:
    2015 to (no data)
  • Project Status:
    Completed

Project Abstract

Current commercial hearing aids use a number of sophisticated enhancement techniques to try to improve the quality of speech signals. However, today's best aids fail to work well in many everyday situations. In particular, they fail in busy social situations where there are many competing speech sources, and they fail if the speaker is too far from the listener and their speech is swamped by noise. We have identified an opportunity to solve this problem by building hearing aids that can 'see'. This ambitious project aims to develop a new generation of hearing aid technology that extracts speech from noise by using a camera to see what the talker is saying. The wearer of the device will be able to focus their hearing on a target talker, and the device will filter out competing sound. This ability, which is beyond that of current technology, has the potential to improve the quality of life of the millions suffering from hearing loss (over 10 million in the UK alone).

Our approach is consistent with normal hearing. Listeners naturally combine information from both their ears and eyes: we use our eyes to help us hear. When listening to speech, the eyes follow the movements of the face and mouth, and a sophisticated, multi-stage process uses this information to separate speech from noise and fill in any gaps. Our hearing aid will act in much the same way. It will exploit visual information from a camera (e.g. using a Google Glass-like system), together with novel algorithms for intelligently combining audio and visual information, in order to improve speech quality and intelligibility in real-world noisy environments.

The project brings together a critical mass of researchers with the complementary expertise necessary to make the audio-visual hearing aid possible. It will combine contrasting approaches to audio-visual speech enhancement developed by the Cognitive Computing group at Stirling and the Speech and Hearing Group at Sheffield: the Stirling approach uses the visual signal to filter out noise, whereas the Sheffield approach uses the visual signal to fill in 'gaps' in the speech. The vision processing needed to track a speaker's lip and face movements will use a revolutionary 'bar code' representation developed by the Psychology Division at Stirling. The MRC Institute of Hearing Research (IHR) will provide the expertise needed to evaluate the approach with listeners who have real hearing loss. Phonak AG, a leading international hearing aid manufacturer, will provide the advice and guidance necessary to maximise the potential for industrial impact.

The project is designed as a series of four work packages that address the key research challenges related to each component of the device's design, challenges identified by preliminary work at Sheffield and Stirling. These include developing improved techniques for visually-driven audio analysis; designing better metrics for weighting audio and visual evidence; and developing techniques for optimally combining the noise-filtering and gap-filling approaches. A further key challenge is that, for a hearing aid to be effective, the processing cannot delay the signal by more than 10 ms. In the final year of the project, a fully integrated software prototype will be clinically evaluated using listening tests with hearing-impaired volunteers in a range of noisy, reverberant environments. Evaluation will use a new purpose-built speech corpus designed specifically for testing this new class of multimodal device.

The project's clinical research partner, the Scottish Section of MRC IHR, will provide advice on experimental design and analysis throughout the trials. Industry leader Phonak AG will provide advice and technical support for benchmarking real-time hearing devices. The final clinically-tested prototype will be made available to the whole hearing community as a testbed for further research, development, evaluation and benchmarking.
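To make the abstract's central technical steps concrete, the following is a minimal Python sketch of weighting audio and visual evidence when fusing per-frequency speech masks, under the 10 ms latency budget the abstract mentions. Everything here (the function name, the mask representation, the reliability weights, the 16 kHz sample rate) is an illustrative assumption, not the project's actual algorithm:

    import numpy as np

    # A 10 ms end-to-end budget at an assumed 16 kHz sample rate caps the
    # analysis frame at 160 samples, which rules out long analysis windows.
    FRAME_SAMPLES = int(16_000 * 0.010)  # = 160

    def fuse_masks(audio_mask, visual_mask, w_audio, w_visual):
        """Convex combination of per-frequency speech masks from the two
        modalities, weighted by how reliable each evidence stream is."""
        total = w_audio + w_visual
        if total == 0.0:  # no usable evidence in either stream: equal weights
            w_audio = w_visual = 0.5
            total = 1.0
        return (w_audio * audio_mask + w_visual * visual_mask) / total

    # Toy frame: the audio evidence is weak (low SNR), so the visually
    # derived mask dominates the fused result.
    audio_mask = np.array([0.2, 0.3, 0.8, 0.1])   # per-bin gains in [0, 1]
    visual_mask = np.array([0.9, 0.7, 0.6, 0.2])
    print(fuse_masks(audio_mask, visual_mask, w_audio=0.2, w_visual=0.8))

Applying the fused mask to each short frame's spectrum and resynthesising would complete the enhancement loop; the actual Stirling noise-filtering and Sheffield gap-filling systems are, of course, far more sophisticated than this sketch.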

Project Outcomes

Journal Articles (10)
Monographs (0)
Research Awards (0)
Conference Papers (0)
Patents (0)
A Novel Spatiotemporal Longitudinal Methodology for Predicting Obesity Using Near Infrared Spectroscopy (NIRS) Cerebral Functional Activity Data
  • DOI:
    10.1007/s12559-017-9541-x
  • Publication Date:
    2018-01
  • Journal:
  • Impact Factor:
    5.4
  • Authors:
    A. Abdullah;A. Hussain;Imtiaz Hussain Khan
  • Corresponding Author:
    A. Abdullah;A. Hussain;Imtiaz Hussain Khan
Lip-Reading Driven Deep Learning Approach for Speech Enhancement
Cognitively Inspired Audiovisual Speech Filtering: Towards an Intelligent, Fuzzy Based, Multimodal, Two-Stage Speech Enhancement System
  • DOI:
  • Publication Date:
    2015
  • Journal:
  • Impact Factor:
    0
  • Authors:
    Abel Andrew
  • Corresponding Author:
    Abel Andrew
An Enhanced Binary Particle Swarm Optimization (E-BPSO) algorithm for service placement in hybrid cloud platforms
  • DOI:
    10.1007/s00521-022-07839-5
  • Publication Date:
    2018-06
  • Journal:
  • Impact Factor:
    6
  • Authors:
    Wissem Abbes;Zied Kechaou;Amir Hussain;A. Qahtani;Omar Almutiry;Habib Dhahri;A. Alimi
  • Corresponding Author:
    Wissem Abbes;Zied Kechaou;Amir Hussain;A. Qahtani;Omar Almutiry;Habib Dhahri;A. Alimi
Context-sensitive neocortical neurons transform the effectiveness and efficiency of neural information processing
  • DOI:
    10.48550/arxiv.2207.07338
  • Publication Date:
    2022
  • Journal:
  • Impact Factor:
    0
  • Authors:
    Adeel A
  • Corresponding Author:
    Adeel A

Other Publications by Amir Hussain

Novel deep neural network based pattern field classification architectures
  • DOI:
    10.1016/j.neunet.2020.03.011
  • Publication Date:
    2020-03
  • Journal:
  • Impact Factor:
    7.8
  • Authors:
    Kaizhu Huang;Shufei Zhang;Rui Zhang;Amir Hussain
  • Corresponding Author:
    Amir Hussain
Automatic object-oriented coding facility for product life cycle management of discrete products
Deep Complex U-Net with Conformer for Audio-Visual Speech Enhancement
  • DOI:
  • Publication Date:
    2023
  • Journal:
  • Impact Factor:
    0
  • Authors:
    Shafique Ahmed;Chia;Wenze Ren;Chin;Ernie Chu;Jun;Amir Hussain;H. Wang;Yu Tsao;Jen
  • Corresponding Author:
    Jen
AVSE Challenge: Audio-Visual Speech Enhancement Challenge
  • DOI:
  • Publication Date:
    2023
  • Journal:
  • Impact Factor:
    0
  • Authors:
    Andrea Lorena Aldana Blanco;Cassia Valentini;Ondrej Klejch;M. Gogate;K. Dashtipour;Amir Hussain;P. Bell
  • Corresponding Author:
    P. Bell
Artificial intelligence-enabled analysis of UK and US public attitudes on Facebook and Twitter towards COVID-19 vaccinations
  • DOI:
  • Publication Date:
    2020
  • Journal:
  • Impact Factor:
    0
  • Authors:
    Amir Hussain;Ahsen Tahir;Zain U. Hussain;Zakariya Sheikh;M. Gogate;K. Dashtipour;Azhar Ali;Aziz Sheikh
  • Corresponding Author:
    Aziz Sheikh


Other Grants Held by Amir Hussain

COG-MHEAR: Towards cognitively-inspired 5G-IoT enabled, multi-modal Hearing Aids
  • Grant Number:
    EP/T021063/1
  • Fiscal Year:
    2021
  • Funding Amount:
    $532,900
  • Project Type:
    Research Grant
Dual Process Control Models in the Brain and Machines with Application to Autonomous Vehicle Control
  • Grant Number:
    EP/I009310/1
  • Fiscal Year:
    2011
  • Funding Amount:
    $532,900
  • Project Type:
    Research Grant
Industrial CASE Account - Stirling 2009
  • Grant Number:
    EP/H501584/1
  • Fiscal Year:
    2009
  • Funding Amount:
    $532,900
  • Project Type:
    Training Grant
Industrial CASE Account - Stirling 2008
  • Grant Number:
    EP/G501750/1
  • Fiscal Year:
    2009
  • Funding Amount:
    $532,900
  • Project Type:
    Training Grant

Similar Overseas Grants

Implementing and Iterating WeWALK’s Agent-Based Guidance System (WeASSIST) in Rail Transport to Improve Visually Impaired Customer Experience
  • Grant Number:
    10098144
  • Fiscal Year:
    2024
  • Funding Amount:
    $532,900
  • Project Type:
    Collaborative R&D
Collaborative Research: CNS Core: Small: SmartSight: an AI-Based Computing Platform to Assist Blind and Visually Impaired People
  • Grant Number:
    2418188
  • Fiscal Year:
    2023
  • Funding Amount:
    $532,900
  • Project Type:
    Standard Grant
Cross-modal plasticity after the loss of vision at two early developmental ages in the posterior parietal cortex: Adult connections, cortical function and behavior.
  • Grant Number:
    10751658
  • Fiscal Year:
    2023
  • Funding Amount:
    $532,900
  • Project Type:
Multisensory Augmented Reality as a bridge to audio-only accommodations for inclusive STEM interactive digital media
  • Grant Number:
    10693600
  • Fiscal Year:
    2023
  • Funding Amount:
    $532,900
  • Project Type:
Variability of Brain Reorganization in Blindness
  • Grant Number:
    10562129
  • Fiscal Year:
    2023
  • Funding Amount:
    $532,900
  • Project Type:
VIS4ION-Thailand (Visually Impaired Smart Service System for Spatial Intelligence and Onboard Navigation) - Resub - 1
  • Grant Number:
    10903051
  • Fiscal Year:
    2023
  • Funding Amount:
    $532,900
  • Project Type:
Development of a stiffness identification learning kit for visually impaired person seeking to become acupressure therapist
  • Grant Number:
    23K17629
  • Fiscal Year:
    2023
  • Funding Amount:
    $532,900
  • Project Type:
    Grant-in-Aid for Challenging Research (Exploratory)
Inclusive Cross-sensory Social Play: Towards a new paradigm of assistive technology for early development of blind and visually impaired children
  • Grant Number:
    EP/Y023676/1
  • Fiscal Year:
    2023
  • Funding Amount:
    $532,900
  • Project Type:
    Research Grant
Anatomical, neural, and computational constraints on sensory cross-modal plasticity following early blindness
  • Grant Number:
    10570400
  • Fiscal Year:
    2023
  • Funding Amount:
    $532,900
  • Project Type:
Glove-based Tactile Streaming of Braille Characters and Digital Images for the Visually Impaired
  • Grant Number:
    10601900
  • Fiscal Year:
    2023
  • Funding Amount:
    $532,900
  • Project Type: