Experiments and models of speech recognition across tonal and non-tonal language systems (EMSATON)

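Basic information

Grant number:
415895050
Principal investigator:
Professor Dr. Birger Kollmeier
Funding amount:
--
Host institution:
Abteilung Medizinische Physik
Host institution country:
Germany
Project type:
Research Grants
Fiscal year:
2019
Funding country:
Germany
Project period:
2018-12-31 to 2022-12-31
Project status:
Completed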

Source:
https://gepris.dfg.de/gepris/projekt/415895050?language=en
Keywords:
Experiments models speech recognition across

Project abstract

Human speech communication is the basis of our culture. Even though the articulation organs and the ear are very similar across all humans, their usage varies greatly across languages in solving the task of communicating effectively, not only in quiet but also under challenging acoustic conditions and with hearing impairment. The current project will shed light on how this is achieved by exploring the acoustic, phonetic and audiological foundations of speech recognition in tonal and non-tonal languages, and the ability of current speech recognition models to replicate possible differences in recognition between the two. The main long-term goal of EMSATON is to quantitatively understand how the reduction of human speech recognition in noise is influenced by different talkers and speaking styles (e.g., Lombard speech), different language systems (i.e., tonal languages (Mandarin, Cantonese) vs. Western languages (German, English, Spanish)) and different impairment factors (i.e., type of noise, reverberation, individual hearing impairment).
We will exploit and extend the closed-set multilingual Matrix sentence recognition test, which can be used to assess speech recognition in a highly comparable way across languages (20 languages, including German, British and American English, Spanish, and recently Mandarin). We will develop the Matrix test in Cantonese to have a second tonal language as a reference and will relate the new tonal language tests to non-tonal languages. We will also investigate the effect of the talker by including (bilingual) talkers, and the effect of speaking style (normal speech and Lombard speech with a high production effort). Both objective acoustic-phonetic analysis and speech recognition modelling will be performed to better understand the differences in, and the importance of, speech cues across languages (tonal vs. non-tonal), talkers and speaking styles.
In order to identify relevant factors for (speech-related) differences across very different languages and to evaluate a number of assumptions of existing models such as SII, HASPI, STOI or the FADE model, the data across languages, speakers and speaking styles will be used to test the prediction accuracy of current models and to establish a benchmark set of data and predictions. This will provide us with the basis for a quantitative, model-based analysis of the language effect and several underlying factors across two tonal languages (Mandarin, Cantonese) and typical non-tonal languages (German, English, Spanish). A possible outcome might be guidelines for constructing assistive listening and hearing devices in a more language-type-specific way, thus optimizing the respective benefit for tonal and non-tonal language users.
