
Perceptual Methods for Speech Communication

Basic Information

Grant number:
RGPIN-2016-04412
Principal investigator:
Chan, WaiYip
Amount:
$26,200
Host institution:
Queen's University
Host institution country:
Canada
Project category:
Discovery Grants Program - Individual
Fiscal year:
2017
Funding country:
Canada
Start and end dates:
2017-01-01 to 2018-12-31
Project status:
Completed

Source:
https://www.nserc-crsng.gc.ca/ase-oro/Details-Detailles_eng.asp?id=637488
Keywords:
Perceptual Methods Speech Communication

Project Abstract

Speech communication is arguably the most important social activity. Speech signals are packed with verbal and non-verbal information. Humans can learn a great deal from listening to just a snippet of speech: the spoken words and language; the speaker’s identity, gender, accent, age, language proficiency, emotional state, and health; the background environment, based on the ambient sounds and noises captured; and any communications or recording media that carried the signal, based on perceivable “signatures” imprinted by the media on the signal. Moreover, human listeners are remarkably capable of comprehending the embedded linguistic and paralinguistic information even when the speech signal is highly corrupted. Computational intelligence, as embodied in the signal processing algorithms that run on electronic devices, is not (yet) a match for human performance. Nevertheless, the emerging Internet of Everything will further multiply the opportunities for speech communication between humans and electronic devices and between humans mediated by networked devices. Such speech communication can happen anytime and anywhere. As a result, the instances of devices picking up degraded speech signals will mushroom. To address this challenge, we propose to research speech signal processing methods that would make machine extraction of speech information resilient to acoustic degradations, i.e., ambient noise and reverberation. Ideas for innovation will draw upon emerging advances in speech cognitive science and robust signal modeling and recovery techniques. New speech signal processing methods inspired by human cortical processing of acoustic speech signals will be developed. Moreover, a speech signal model ubiquitously deployed in today’s cellular telephones will be revamped.
A major goal is to advance the current level of machine comprehension of verbal information embedded in degraded speech signals, and by doing so also provide human listeners with highly intelligible speech. The new knowledge, methods, and algorithms anticipated from the proposed research will engender new capabilities for enhancing existing and enabling future speech communication and acoustic interface technologies. Besides training highly qualified R&D personnel, the research will produce new and better tools to benefit both industry and academia. One type of tool to be produced will enable R&D engineers to optimize their speech enhancement algorithms. Industry sectors that will benefit from the proposed research include communications and information technology, gaming, machine-mediated learning, hearing instruments, health and fitness, and robotics. The new knowledge and solutions are anticipated to also contribute to farther fields such as soundscape design for indoor and outdoor environments.