Research on Tibetan Multi-Dialect Speech Recognition Based on End-to-End Multi-Task Learning

Final report

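Approval number: 61976236
Project category: General Program (面上项目)
Funding: 610,000 yuan (61.0 万元)
Principal investigator: Yue Zhao (赵悦)
Host institution: Minzu University of China (中央民族大学)
Discipline: Machine Perception and Machine Vision
Year approved: 2019
Year completed: 2023
Project status: Completed
Project participants: Yue Zhao (赵悦)
Keywords: end-to-end training; acoustic model; deep probabilistic graphical model; Tibetan multi-dialect speech recognition; multi-task learning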


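Abstract (translated from the Chinese)

Tibetan is one of the more widely used minority languages in China and is mainly divided into three major dialects: Ü-Tsang, Kham, and Amdo. At present, Tibetan speech recognition research focuses mainly on the Lhasa variety of the Ü-Tsang dialect, while the Kham and Amdo dialects have not been studied enough. To promote communication among people in the different dialect regions and to accelerate the development of Tibetan multi-dialect speech recognition technology, this project aims to build a Tibetan multi-dialect, multi-task speech recognition system. The system integrates several tasks, including multi-dialect speech content recognition, dialect identification, and speaker recognition, into a unified network architecture, and uses the implicit information shared among them to improve the recognition performance of each individual task and of the low-resource dialects. The project plans to study two key technologies, multi-dialect speech feature representation and multi-task relationship learning, and to address three key scientific problems: 1) data representation methods for deep probabilistic graphical models; 2) learning methods for deep probabilistic graphical models trained end to end; 3) methods for learning the relatedness among multiple tasks.

The specific research content and plan are: 1) research on Tibetan multi-dialect speech feature representation based on deep temporal regression Bayesian networks; 2) research on multi-task sharing relationships based on soft parameter sharing; 3) research on mutually beneficial relationships between tasks based on multi-task loss functions.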

English abstract

Tibetan is one of the most widely used ethnic minority languages in China. It is divided into three major dialects: Ü-Tsang, Kham, and Amdo. At present, Tibetan speech recognition research focuses mainly on the Lhasa variety of the Ü-Tsang dialect, and there is a serious lack of research on speech recognition for the Kham and Amdo dialects. To promote communication among people in different dialect areas and to speed up research on speech recognition for the other Tibetan dialects, this project aims to build a Tibetan multi-dialect, multi-task speech recognition system. The system integrates multi-dialect speech content recognition, dialect identification, and speaker recognition into a unified network architecture, making full use of the implicit information shared among these tasks to improve the recognition performance of each individual task and of the low-resource Tibetan dialects. The project focuses on two key technologies: multi-dialect speech feature representation and multi-task relationship learning. Three key scientific problems need to be solved: 1) data representation methods based on deep probabilistic graphical models; 2) learning methods for deep probabilistic graphical models using end-to-end training; 3) multi-task relatedness learning methods.

The specific research contents and schemes are as follows: 1) research on Tibetan multi-dialect speech feature representation based on deep temporal regression Bayesian networks; 2) research on multi-task sharing relationship learning based on soft parameter sharing; 3) research on learning mutually beneficial relationships between tasks based on multi-task loss functions.
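The abstract names a multi-task loss function and soft parameter sharing but does not give their form. As a rough illustration only, the sketch below assumes a weighted sum of per-task cross-entropy losses plus a squared-distance penalty between two tasks' parameters. The function names, task weights, and toy probabilities are all hypothetical and are not taken from the project; in particular, a real speech content recognizer would normally use a sequence-level loss rather than a single-class cross-entropy.

```python
import math

def cross_entropy(probs, target):
    """Negative log-likelihood of the target class under a probability vector."""
    return -math.log(probs[target])

def soft_sharing_penalty(params_a, params_b):
    """Squared L2 distance between two tasks' parameter vectors.

    Assumed form of soft parameter sharing: each task keeps its own
    parameters, and the penalty pulls them toward each other.
    """
    return sum((a - b) ** 2 for a, b in zip(params_a, params_b))

def multitask_loss(task_losses, task_weights, sharing_penalty=0.0, sharing_strength=0.0):
    """Weighted sum of per-task losses plus an optional soft-sharing regularizer."""
    total = sum(task_weights[name] * loss for name, loss in task_losses.items())
    return total + sharing_strength * sharing_penalty

# Toy posteriors for one utterance (hypothetical numbers, not project data).
losses = {
    "content": cross_entropy([0.7, 0.2, 0.1], 0),    # simplified stand-in for a sequence loss
    "dialect": cross_entropy([0.1, 0.8, 0.1], 1),    # three dialects: Ü-Tsang, Kham, Amdo
    "speaker": cross_entropy([0.25, 0.25, 0.5], 2),
}
weights = {"content": 1.0, "dialect": 0.5, "speaker": 0.5}  # hypothetical task weights
penalty = soft_sharing_penalty([0.2, -0.1], [0.1, 0.1])
total = multitask_loss(losses, weights, penalty, sharing_strength=0.01)
print(round(total, 4))
```

In this form, raising a task's weight makes its errors count more in the joint objective, and raising `sharing_strength` forces the task-specific parameters closer together. Fixed weights are the simplest choice; the abstract's third research item suggests the weighting itself is a subject of study.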

Journal papers


An Open Speech Resource for Tibetan Multi-dialect and Multi-task Recognition
DOI: --
Published: 2020
Journal: International Journal of Computational Science and Engineering
Impact factor: 2
Authors: Yue Zhao; Xiaona Xu; Jianjian Yue; Wei Song; Xiali Li; Licheng Wu; Qiang Ji
Corresponding author: Qiang Ji

Latent Regression Bayesian Network for Speech Representation
DOI: 10.3390/electronics12153342
Published: 2023
Journal: Electronics
Impact factor: 2.9
Authors: Liang Xu; Yue Zhao; Xiaona Xu; Yigang Liu; Qiang Ji
Corresponding author: Qiang Ji

Near-Optimal Active Learning for Multilingual Grapheme-to-Phoneme Conversion
DOI: --
Published: 2023
Journal: Applied Sciences
Impact factor: --
Authors: Dezhi Cao; Yue Zhao; Licheng Wu
Corresponding author: Licheng Wu

WaveNet With Cross-Attention for Audiovisual Speech Recognition
DOI: 10.1109/access.2020.3024218
Published: 2020
Journal: IEEE Access
Impact factor: 3.9
Authors: Hui Wang; Fei Gao; Yue Zhao; Licheng Wu
Corresponding authors: Hui Wang; Fei Gao; Yue Zhao; Licheng Wu

Multi-Task Transformer with Adaptive Cross-Entropy Loss for Multi-Dialect Speech Recognition
DOI: 10.3390/e24101429
Published: 2022-10-08
Journal: Entropy (Basel, Switzerland)
Impact factor: --
Authors: --
Corresponding author: --
