
Leveraging Natural Language Processing for Reverberant Speech Enhancement in Cochlear Implants

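Basic Information

Grant number:
10755798
Principal investigator:
LESLIE M. COLLINS
Amount:
$173.8K
Host institution:
DUKE UNIVERSITY
Institution country:
United States
Project category:
Fiscal year:
2023
Funding country:
United States
Project period:
2023-03-15 to 2025-02-28
Project status:
Ongoing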

Source:
https://reporter.nih.gov/project-details/10755798
Keywords:
Acoustics; Address; Algorithms; Area; Artificial Intelligence; Audiology; Auditory; Auditory Perception; Benchmarking; Church; Clinical; Cochlear Implants; Comprehension; Computer software; Computers; Data; Effectiveness; Environment; Familiarity; Frequencies; Future; Goals; Hearing Aids; Home; Individual; Knowledge; Language; Literature; Machine Learning; Maps; Masks; Methods; Modernization; Morphologic artifacts; Natural Language Processing; Noise; Performance; Predictive text; Quality of life; Research; Resources; Signal Transduction; Speech; Speech Intelligibility; Speech Perception; Structure; System; Techniques; Testing; Text; Time; United States National Institutes of Health; Voice; Work; automated speech recognition; deaf; effectiveness evaluation; experience; experimental study; flexibility; hearing impairment; improved; innovation; intelligent personal assistant; machine learning algorithm; multidisciplinary; normal hearing; novel; open source; portability; prototype; signal processing; speech processing; speech recognition; speech synthesis; success; syntax; time use

Project Abstract

The overarching goal of this project is to develop algorithms that address the difficulty cochlear implant (CI) users have interpreting speech in reverberant listening environments such as churches, auditoriums, and classrooms. Recent research has made progress in this area using time-frequency (T-F) masking techniques, but these algorithms are often not robust in changing acoustic environments or are not amenable to real-time processing. Machine learning (ML) and artificial intelligence (AI) techniques have recently been burgeoning in many application areas, but to date, AI/ML approaches to reverberation for CI users have shown limited success. Our proposed approach is to investigate several AI/ML speech enhancement methods drawn from the natural language processing (NLP) field that essentially recognize speech in reverberation and then clean it. We will provide a final assessment of algorithm performance using the open-source, NIH-supported CCi-MOBILE CI research platform, chosen for the ease and flexibility needed to develop and prototype CI signal processing algorithms. We propose to use phoneme-based recognition and automatic speech recognition (ASR) approaches to develop and test our reverberation mitigation algorithms.

Aim 1 will investigate the real-time feasibility of exploiting phoneme recognition for ML-based T-F masking in CIs. We will develop a novel phoneme-based T-F mask estimation algorithm and conduct speech recognition tests with an offline algorithm mode to compare conventional and phoneme-based T-F masking. This work will determine whether phoneme knowledge is beneficial for speech enhancement in CIs.

Aim 2 will investigate the utility of real-time T-F mask estimation in CI users. We will implement various T-F mask estimation algorithms for mitigating reverberation from the literature (including the novel phoneme-based T-F algorithm developed in Aim 1) in real time in CCi-MOBILE. In addition to their impact on speech intelligibility, the algorithms will be benchmarked against CI computational limits and tolerable time delays of audiovisual asynchrony. This work will evaluate the effectiveness of T-F mask estimation algorithms in real-time operational conditions.

Aim 3 will investigate advancing speech intelligibility for CI users via ASR and text-to-speech synthesis (ASR-TTS). We will investigate various front-end speech enhancement strategies to improve ASR predictions, and TTS engines with generic and familiar synthetic voices. This work will use CCi-MOBILE to evaluate the utility of ASR-TTS and the effect of speaker familiarity on reverberant speech intelligibility in CI users.

Our team brings the AI/ML, hardware, experimental testing, and audiology experience needed for successful research. CCi-CLOUD, a cloud feature of CCi-MOBILE, will be used to facilitate remote and collaborative CI user studies. Our work is highly innovative and has the potential to instigate a paradigm shift towards AI/ML-driven auditory prostheses that leverage NLP to adapt speech processing strategies to acoustic settings to maximize user benefits. Demonstrated success will improve the quality of life of CI users.
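
For readers unfamiliar with the time-frequency masking that the abstract refers to, the general idea can be illustrated with a small NumPy sketch. This is not the project's algorithm. It computes an oracle "ideal ratio mask" from a known direct signal and a known reverberant tail, both of which are unavailable in practice (mask estimation algorithms such as those proposed in Aims 1 and 2 would have to predict the mask from the reverberant input alone). The toy impulse response, the signal, and all function and parameter names are assumptions made for illustration.

```python
import numpy as np

def stft_mag(x, frame_len=256, hop=128):
    """Magnitude spectrogram from a Hann-windowed short-time Fourier transform."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack([x[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1))  # (n_frames, frame_len // 2 + 1)

def ideal_ratio_mask(direct_mag, reverb_mag, eps=1e-12):
    """Oracle mask in [0, 1]: share of each T-F cell's energy due to direct sound."""
    d2, r2 = direct_mag ** 2, reverb_mag ** 2
    return np.sqrt(d2 / (d2 + r2 + eps))

rng = np.random.default_rng(0)
fs = 16000                                   # assumed sample rate (Hz)
t = np.arange(fs) / fs

# Toy "speech": a 440 Hz tone gated on and off three times per second.
direct = np.sin(2 * np.pi * 440 * t) * (np.sin(2 * np.pi * 3 * t) > 0)

# Toy reverberant tail: exponentially decaying noise (hypothetical room response).
n_tail = fs // 4
tail = 0.1 * rng.standard_normal(n_tail) * np.exp(-np.arange(n_tail) / (0.05 * fs))
reverb = np.convolve(direct, tail)[:len(direct)]
mixture = direct + reverb                    # what the listener would receive

# Attenuate T-F cells dominated by reverberation, keep cells dominated by direct sound.
mask = ideal_ratio_mask(stft_mag(direct), stft_mag(reverb))
enhanced_mag = mask * stft_mag(mixture)
```

Because every mask value lies between 0 and 1, the mask can only attenuate a time-frequency cell, never amplify it. Cells that fall in the gaps between tone bursts, where only the reverberant tail is present, receive gains near zero.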
