Video-based Speech Enhancement for Persons with Vision and Hearing Loss

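Basic Information

  • Grant number:
    8443624
  • Principal investigator:
  • Amount:
    $198,800
  • Host institution:
  • Host institution country:
    United States
  • Project type:
  • Fiscal year:
    2013
  • Funding country:
    United States
  • Project period:
    2013-06-01 to 2015-05-31
  • Project status:
    Completed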

Project Abstract

DESCRIPTION (provided by applicant): Video-based Speech Enhancement for Persons with Hearing and Vision Loss. Project Summary: It is estimated that by 2030, people over the age of 65 will account for over 20% of the total population of the United States. Hearing and vision loss naturally accompanies the aging process. Persons with hearing loss can greatly improve their ability to comprehend speech by observing visual cues from a speaker, such as the shape of the lips and facial expression. However, persons with vision loss cannot make use of these visual cues and have a harder time understanding speech, especially in noisy environments. Furthermore, people with normal vision can use visual information to identify a speaker in a group, which allows them to focus on this person. This can greatly benefit a person with hearing loss who may be using a device such as a sound amplifier or a hearing aid. A user with vision loss, however, needs to be provided with this speaker information to make optimal use of such devices.
We propose developing a prototype device that will clean up the speech signal from a target speaker and improve speech comprehension for persons with hearing and vision loss in everyday situations. To accomplish this task, we need to harness the visual cues that have so far largely been ignored in the design of assistive technologies for persons with hearing loss.
Our first aim is to learn speaker-independent visual cues that are associated with the target speech signal, and to use these audio-visual cues to design speech enhancement algorithms that perform much better in noisy everyday environments than current methods, which use only the audio signal. We will use a video camera and computer vision methods to design advanced digital signal processing techniques that enhance the target speech signals recorded through a microphone.
Our second aim is to use the video and audio signals to detect and efficiently localize the visible speaker. The location of the speaker of interest can then be used to perform speaker separation efficiently, and can also be provided to the user.
Finally, we aim to implement the developed algorithms on a portable prototype system. We will test the performance of this system and improve the user interface through user experiments in real-world situations as well as under laboratory conditions. The end product will show the feasibility and importance of incorporating multiple modalities into sensory assistive devices, and set the stage for future research and development efforts.

Project Outcomes

Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)

Other publications by Ender Tekin

Similar Overseas Grants

Nonlinear Acoustics for the conditioning monitoring of Aerospace structures (NACMAS)
  • Grant number:
    10078324
  • Fiscal year:
    2023
  • Funding amount:
    $198,800
  • Project type:
    BEIS-Funded Programmes
ORCC: Marine predator and prey response to climate change: Synthesis of Acoustics, Physiology, Prey, and Habitat In a Rapidly changing Environment (SAPPHIRE)
  • Grant number:
    2308300
  • Fiscal year:
    2023
  • Funding amount:
    $198,800
  • Project type:
    Continuing Grant
University of Salford (The) and KP Acoustics Group Limited KTP 22_23 R1
  • Grant number:
    10033989
  • Fiscal year:
    2023
  • Funding amount:
    $198,800
  • Project type:
    Knowledge Transfer Partnership
User-controllable and Physics-informed Neural Acoustics Fields for Multichannel Audio Rendering and Analysis in Mixed Reality Application
  • Grant number:
    23K16913
  • Fiscal year:
    2023
  • Funding amount:
    $198,800
  • Project type:
    Grant-in-Aid for Early-Career Scientists
Combined radiation acoustics and ultrasound imaging for real-time guidance in radiotherapy
  • Grant number:
    10582051
  • Fiscal year:
    2023
  • Funding amount:
    $198,800
  • Project type:
Comprehensive assessment of speech physiology and acoustics in Parkinson's disease progression
  • Grant number:
    10602958
  • Fiscal year:
    2023
  • Funding amount:
    $198,800
  • Project type:
The acoustics of climate change - long-term observations in the arctic oceans
  • Grant number:
    2889921
  • Fiscal year:
    2023
  • Funding amount:
    $198,800
  • Project type:
    Studentship
Collaborative Research: Estimating Articulatory Constriction Place and Timing from Speech Acoustics
  • Grant number:
    2343847
  • Fiscal year:
    2023
  • Funding amount:
    $198,800
  • Project type:
    Standard Grant
Collaborative Research: Estimating Articulatory Constriction Place and Timing from Speech Acoustics
  • Grant number:
    2141275
  • Fiscal year:
    2022
  • Funding amount:
    $198,800
  • Project type:
    Standard Grant
Flow Physics and Vortex-Induced Acoustics in Bio-Inspired Collective Locomotion
  • Grant number:
    DGECR-2022-00019
  • Fiscal year:
    2022
  • Funding amount:
    $198,800
  • Project type:
    Discovery Launch Supplement