Self-supervised graph-based representation for language and speaker detection

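Basic information

Grant number:
21K17776
Principal investigator:
沈鵬
Amount:
$29,100
Host institution:
National Institute of Information and Communications Technology
Host institution country:
Japan
Project category:
Grant-in-Aid for Early-Career Scientists
Fiscal year:
2021
Funding country:
Japan
Project period:
2021-04-01 to 2024-03-31
Project status:
Completed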

Project abstract

I focused on investigating how to better represent speech signals for both language recognition and speech recognition tasks. The following work was carried out to advance this project:

1. Improving the representation of speech signals for language identification (LID): We proposed a novel transducer-based language embedding approach for LID tasks by integrating an RNN transducer model into a language embedding framework. By drawing on the RNN transducer's linguistic representation capability, the proposed method can exploit both phonetically aware acoustic features and explicit linguistic features for LID tasks. The research paper was accepted by Interspeech 2022. We further investigated these techniques on the NICT LID system, which also demonstrated robustness on cross-channel data.

2. Improving RNN-T for Mandarin ASR: I proposed a novel pronunciation-aware unique character encoding for building end-to-end RNN-T-based Mandarin ASR systems. The proposed encoding combines a pronunciation-based syllable with a character index (CI). Introducing the CI lets the RNN-T model overcome the homophone problem while still using pronunciation information to derive modeling units. With the proposed encoding, the model outputs can be converted into the final recognition result through a one-to-one mapping. This paper was accepted by IEEE SLT 2022.
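The encoding idea in item 2 (a pronunciation-based syllable paired with a character index, decoded through a one-to-one mapping) can be illustrated with a small sketch. This is not the project's implementation: the abstract does not give the exact unit format, so the `(syllable, index)` tuple, the function names, and the toy lexicon below are all assumptions made for illustration. The sketch also assumes each character has a single pronunciation, which ignores polyphonic characters.

```python
from collections import defaultdict

# Hypothetical toy lexicon (an assumption for illustration):
# character -> toned pinyin syllable. Homophones share a syllable.
TOY_LEXICON = {
    "他": "ta1", "她": "ta1", "它": "ta1",
    "是": "shi4", "事": "shi4", "市": "shi4",
    "好": "hao3",
}


def build_encoding(lexicon):
    """Assign every character a unique (syllable, character index) unit."""
    # Group characters that share the same pronunciation.
    groups = defaultdict(list)
    for char, syllable in lexicon.items():
        groups[syllable].append(char)

    char_to_unit = {}
    unit_to_char = {}
    for syllable, chars in groups.items():
        # The index separates homophones within one syllable group.
        for index, char in enumerate(chars):
            unit = (syllable, index)
            char_to_unit[char] = unit
            unit_to_char[unit] = char
    return char_to_unit, unit_to_char


def encode(text, char_to_unit):
    """Map a character string to a sequence of (syllable, index) units."""
    return [char_to_unit[char] for char in text]


def decode(units, unit_to_char):
    """Map units back to characters through the one-to-one table."""
    return "".join(unit_to_char[unit] for unit in units)


char_to_unit, unit_to_char = build_encoding(TOY_LEXICON)
units = encode("她是好", char_to_unit)
restored = decode(units, unit_to_char)
```

In this sketch, 他, 她, and 它 all carry the syllable `ta1` but receive different indices, so a model predicting these units would keep the shared pronunciation information while each unit still decodes to exactly one character.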
