
Video-based Speech Enhancement for Persons with Vision and Hearing Loss


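Basic Information

Grant number:
8443624
Principal investigator:
Ender Tekin
Amount:
approximately $198,800
Host institution:
SMITH-KETTLEWELL EYE RESEARCH INSTITUTE
Host institution country:
United States
Project category:
Fiscal year:
2013
Funding country:
United States
Project period:
2013-06-01 to 2015-05-31
Project status:
Completed

Source:
https://reporter.nih.gov/project-details/8443624
Keywords: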
Accounting Acoustics Activities of Daily Living Address Adult Age Age-Years Aging-Related Process Algorithms Amplifiers Area Auditory Blindness Communication Comprehension Computer Vision Systems Cues Dependence Detection Development Devices Digital Signal Processing Effectiveness Elderly Environment Facial Expression Feedback Grant Hearing Aids Human Laboratories Lead Learning Life Lip structure Literature Location Machine Learning Measures Methods Modality Modeling Noise Output Performance Persons Play Population Presbycusis Process Quality of life Research Role Self-Help Devices Sensory Sensory Aids Shapes Signal Transduction Societies Source Speech Speech Intelligibility Speech Perception Staging System Techniques Testing Time United States Vision Visual Visual impairment Voice base computerized data processing design hearing impairment improved interest novel strategies performance tests prototype public health relevance research study social sound speech recognition tool development visual information

Project Summary

DESCRIPTION (provided by applicant): It is estimated that by 2030, people over the age of 65 will account for more than 20% of the total population of the United States. Hearing and vision loss naturally accompany the aging process. Persons with hearing loss can greatly improve their ability to comprehend speech by observing visual cues from a speaker, such as the shape of the lips and facial expression. However, persons with vision loss cannot make use of these visual cues and have a harder time understanding speech, especially in noisy environments. Furthermore, people with normal vision can use visual information to identify a speaker in a group, which allows them to focus on this person. This can greatly benefit a person with hearing loss who may be using a device such as a sound amplifier or a hearing aid. A user with vision loss, however, needs to be provided with this speaker information to make optimal use of such devices.
We propose developing a prototype device that will clean the speech signal from a target speaker and improve speech comprehension for persons with hearing and vision loss in everyday situations. To accomplish this task, we need to harness the visual cues that have so far largely been ignored in the design of assistive technologies for persons with hearing loss.
Our first aim is to learn speaker-independent visual cues that are associated with the target speech signal, and to use these audio-visual cues to design speech enhancement algorithms that perform much better in noisy everyday environments than current methods, which utilize only the audio signal. We will utilize a video camera and computer vision methods to design advanced digital signal processing techniques that enhance the target speech signals recorded through a microphone.
Our second aim is to use the video and audio signals to detect and efficiently localize the visible speaker. The location of the speaker of interest can then be used to perform speaker separation efficiently, and can also be provided to the user.
Finally, we aim to implement the developed algorithms on a portable prototype system. We will test the performance of this system and improve the user interface through user experiments in real-world situations as well as laboratory conditions. The end product will show the feasibility and importance of incorporating multiple modalities into sensory assistive devices, and set the stage for future research and development efforts.
