Towards visually-driven speech enhancement for cognitively-inspired multi-modal hearing-aid devices (AV-COGHEAR)

Basic Information

  • Grant Number:
    EP/M026981/1
  • Principal Investigator:
  • Funding Amount:
    $532,900
  • Host Institution:
  • Host Institution Country:
    United Kingdom
  • Project Type:
    Research Grant
  • Fiscal Year:
    2015
  • Funding Country:
    United Kingdom
  • Start and End Dates:
    2015 to (no data)
  • Project Status:
    Completed

Project Abstract

Current commercial hearing aids use a number of sophisticated enhancement techniques to try to improve the quality of speech signals. However, today's best aids fail to work well in many everyday situations. In particular, they fail in busy social situations where there are many competing speech sources, and they fail if the speaker is too far from the listener and their speech is swamped by noise. We have identified an opportunity to solve this problem by building hearing aids that can 'see'. This ambitious project aims to develop a new generation of hearing aid technology that extracts speech from noise by using a camera to see what the talker is saying. The wearer of the device will be able to focus their hearing on a target talker, and the device will filter out competing sound. This ability, which is beyond that of current technology, has the potential to improve the quality of life of the millions suffering from hearing loss (over 10 million in the UK alone).

Our approach is consistent with normal hearing. Listeners naturally combine information from both their ears and eyes: we use our eyes to help us hear. When listening to speech, the eyes follow the movements of the face and mouth, and a sophisticated, multi-stage process uses this information to separate speech from noise and fill in any gaps. Our hearing aid will act in much the same way. It will exploit visual information from a camera (e.g. using a Google Glass-like system), together with novel algorithms for intelligently combining audio and visual information, in order to improve speech quality and intelligibility in real-world noisy environments.

The project brings together a critical mass of researchers with the complementary expertise necessary to make the audio-visual hearing aid possible. It will combine contrasting approaches to audio-visual speech enhancement developed by the Cognitive Computing group at Stirling and the Speech and Hearing Group at Sheffield: the Stirling approach uses the visual signal to filter out noise, whereas the Sheffield approach uses the visual signal to fill in 'gaps' in the speech. The vision processing needed to track a speaker's lip and face movements will use a revolutionary 'bar code' representation developed by the Psychology Division at Stirling. The MRC Institute of Hearing Research (IHR) will provide the expertise needed to evaluate the approach with listeners who have real hearing loss. Phonak AG, a leading international hearing aid manufacturer, will provide the advice and guidance necessary to maximise the potential for industrial impact.

The project is designed as a series of four work packages that address the key research challenges related to each component of the device's design, challenges identified by preliminary work at Sheffield and Stirling. These include developing improved techniques for visually-driven audio analysis; designing better metrics for weighting audio and visual evidence; and developing techniques for optimally combining the noise-filtering and gap-filling approaches. A further key challenge is that, for a hearing aid to be effective, the processing cannot delay the signal by more than 10 ms. In the final year of the project, a fully integrated software prototype will be clinically evaluated using listening tests with hearing-impaired volunteers in a range of noisy, reverberant environments. Evaluation will use a new purpose-built speech corpus designed specifically for testing this new class of multimodal device.

The project's clinical research partner, the Scottish Section of MRC IHR, will provide advice on experimental design and analysis throughout the trials. Industry leader Phonak AG will provide advice and technical support for benchmarking real-time hearing devices. The final clinically-tested prototype will be made available to the whole hearing community as a testbed for further research, development, evaluation and benchmarking.
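To make the abstract's central technical steps concrete, the following is a minimal Python sketch of weighting audio and visual evidence when fusing per-frequency speech masks, under the 10 ms latency budget the abstract mentions. Everything here (the function name, the mask representation, the reliability weights, the 16 kHz sample rate) is an illustrative assumption, not the project's actual algorithm:

    import numpy as np

    # A 10 ms end-to-end budget at an assumed 16 kHz sample rate caps the
    # analysis frame at 160 samples, which rules out long analysis windows.
    FRAME_SAMPLES = int(16_000 * 0.010)  # = 160

    def fuse_masks(audio_mask, visual_mask, w_audio, w_visual):
        """Convex combination of per-frequency speech masks from the two
        modalities, weighted by how reliable each evidence stream is."""
        total = w_audio + w_visual
        if total == 0.0:  # no usable evidence in either stream: equal weights
            w_audio = w_visual = 0.5
            total = 1.0
        return (w_audio * audio_mask + w_visual * visual_mask) / total

    # Toy frame: the audio evidence is weak (low SNR), so the visually
    # derived mask dominates the fused result.
    audio_mask = np.array([0.2, 0.3, 0.8, 0.1])   # per-bin gains in [0, 1]
    visual_mask = np.array([0.9, 0.7, 0.6, 0.2])
    print(fuse_masks(audio_mask, visual_mask, w_audio=0.2, w_visual=0.8))

Applying the fused mask to each short frame's spectrum and resynthesising would complete the enhancement loop; the actual Stirling noise-filtering and Sheffield gap-filling systems are, of course, far more sophisticated than this sketch.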

Project Outcomes

Journal Articles (10)
Monographs (0)
Research Awards (0)
Conference Papers (0)
Patents (0)
A Novel Spatiotemporal Longitudinal Methodology for Predicting Obesity Using Near Infrared Spectroscopy (NIRS) Cerebral Functional Activity Data
  • DOI:
    10.1007/s12559-017-9541-x
  • Publication Date:
    2018-01
  • Journal:
  • Impact Factor:
    5.4
  • Authors:
    A. Abdullah;A. Hussain;Imtiaz Hussain Khan
  • Corresponding Author:
    A. Abdullah;A. Hussain;Imtiaz Hussain Khan
Lip-Reading Driven Deep Learning Approach for Speech Enhancement
Cognitively Inspired Audiovisual Speech Filtering: Towards an Intelligent, Fuzzy Based, Multimodal, Two-Stage Speech Enhancement System
  • DOI:
  • Publication Date:
    2015
  • Journal:
  • Impact Factor:
    0
  • Authors:
    Abel Andrew
  • Corresponding Author:
    Abel Andrew
An Enhanced Binary Particle Swarm Optimization (E-BPSO) algorithm for service placement in hybrid cloud platforms
  • DOI:
    10.1007/s00521-022-07839-5
  • Publication Date:
    2018-06
  • Journal:
  • Impact Factor:
    6
  • Authors:
    Wissem Abbes;Zied Kechaou;Amir Hussain;A. Qahtani;Omar Almutiry;Habib Dhahri;A. Alimi
  • Corresponding Author:
    Wissem Abbes;Zied Kechaou;Amir Hussain;A. Qahtani;Omar Almutiry;Habib Dhahri;A. Alimi
Context-sensitive neocortical neurons transform the effectiveness and efficiency of neural information processing
  • DOI:
    10.48550/arxiv.2207.07338
  • Publication Date:
    2022
  • Journal:
  • Impact Factor:
    0
  • Authors:
    Adeel A
  • Corresponding Author:
    Adeel A

Other Publications by Amir Hussain

Novel deep neural network based pattern field classification architectures
  • DOI:
    10.1016/j.neunet.2020.03.011
  • Publication Date:
    2020-03
  • Journal:
  • Impact Factor:
    7.8
  • Authors:
    Kaizhu Huang;Shufei Zhang;Rui Zhang;Amir Hussain
  • Corresponding Author:
    Amir Hussain
Automatic object-oriented coding facility for product life cycle management of discrete products
Deep Complex U-Net with Conformer for Audio-Visual Speech Enhancement
  • DOI:
  • Publication Date:
    2023
  • Journal:
  • Impact Factor:
    0
  • Authors:
    Shafique Ahmed;Chia;Wenze Ren;Chin;Ernie Chu;Jun;Amir Hussain;H. Wang;Yu Tsao;Jen
  • Corresponding Author:
    Jen
AVSE Challenge: Audio-Visual Speech Enhancement Challenge
  • DOI:
  • Publication Date:
    2023
  • Journal:
  • Impact Factor:
    0
  • Authors:
    Andrea Lorena Aldana Blanco;Cassia Valentini;Ondrej Klejch;M. Gogate;K. Dashtipour;Amir Hussain;P. Bell
  • Corresponding Author:
    P. Bell
Artificial intelligence-enabled analysis of UK and US public attitudes on Facebook and Twitter towards COVID-19 vaccinations
  • DOI:
  • Publication Date:
    2020
  • Journal:
  • Impact Factor:
    0
  • Authors:
    Amir Hussain;Ahsen Tahir;Zain U. Hussain;Zakariya Sheikh;M. Gogate;K. Dashtipour;Azhar Ali;Aziz Sheikh
  • Corresponding Author:
    Aziz Sheikh


Other Grants Held by Amir Hussain

COG-MHEAR: Towards cognitively-inspired 5G-IoT enabled, multi-modal Hearing Aids
  • Grant Number:
    EP/T021063/1
  • Fiscal Year:
    2021
  • Funding Amount:
    $532,900
  • Project Type:
    Research Grant
Dual Process Control Models in the Brain and Machines with Application to Autonomous Vehicle Control
  • Grant Number:
    EP/I009310/1
  • Fiscal Year:
    2011
  • Funding Amount:
    $532,900
  • Project Type:
    Research Grant
Industrial CASE Account - Stirling 2009
  • Grant Number:
    EP/H501584/1
  • Fiscal Year:
    2009
  • Funding Amount:
    $532,900
  • Project Type:
    Training Grant
Industrial CASE Account - Stirling 2008
  • Grant Number:
    EP/G501750/1
  • Fiscal Year:
    2009
  • Funding Amount:
    $532,900
  • Project Type:
    Training Grant

Similar Overseas Grants

Implementing and Iterating WeWALK’s Agent-Based Guidance System (WeASSIST) in Rail Transport to Improve Visually Impaired Customer Experience
  • Grant Number:
    10098144
  • Fiscal Year:
    2024
  • Funding Amount:
    $532,900
  • Project Type:
    Collaborative R&D
Collaborative Research: CNS Core: Small: SmartSight: an AI-Based Computing Platform to Assist Blind and Visually Impaired People
  • Grant Number:
    2418188
  • Fiscal Year:
    2023
  • Funding Amount:
    $532,900
  • Project Type:
    Standard Grant
Cross-modal plasticity after the loss of vision at two early developmental ages in the posterior parietal cortex: Adult connections, cortical function and behavior.
  • Grant Number:
    10751658
  • Fiscal Year:
    2023
  • Funding Amount:
    $532,900
  • Project Type:
Multisensory Augmented Reality as a bridge to audio-only accommodations for inclusive STEM interactive digital media
  • Grant Number:
    10693600
  • Fiscal Year:
    2023
  • Funding Amount:
    $532,900
  • Project Type:
Variability of Brain Reorganization in Blindness
  • Grant Number:
    10562129
  • Fiscal Year:
    2023
  • Funding Amount:
    $532,900
  • Project Type:
VIS4ION-Thailand (Visually Impaired Smart Service System for Spatial Intelligence and Onboard Navigation) - Resub - 1
  • Grant Number:
    10903051
  • Fiscal Year:
    2023
  • Funding Amount:
    $532,900
  • Project Type:
Development of a stiffness identification learning kit for visually impaired person seeking to become acupressure therapist
  • Grant Number:
    23K17629
  • Fiscal Year:
    2023
  • Funding Amount:
    $532,900
  • Project Type:
    Grant-in-Aid for Challenging Research (Exploratory)
Inclusive Cross-sensory Social Play: Towards a new paradigm of assistive technology for early development of blind and visually impaired children
  • Grant Number:
    EP/Y023676/1
  • Fiscal Year:
    2023
  • Funding Amount:
    $532,900
  • Project Type:
    Research Grant
Anatomical, neural, and computational constraints on sensory cross-modal plasticity following early blindness
  • Grant Number:
    10570400
  • Fiscal Year:
    2023
  • Funding Amount:
    $532,900
  • Project Type:
Glove-based Tactile Streaming of Braille Characters and Digital Images for the Visually Impaired
  • Grant Number:
    10601900
  • Fiscal Year:
    2023
  • Funding Amount:
    $532,900
  • Project Type: