Research on Tibetan Multi-Dialect Speech Recognition Based on End-to-End Multi-Task Learning

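Grant No.: 61976236
Project Category: General Program
Funding: CNY 610,000
Principal Investigator: Yue Zhao (赵悦)
Host Institution:
Discipline: Machine Perception and Machine Vision
Year Completed: 2023
Year Approved: 2019
Project Status: Completed
Project Participants: Yue Zhao (赵悦)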
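Chinese Abstract
Tibetan is one of the more widely used minority languages in China and is divided into three major dialects: Ü-Tsang, Kham, and Amdo. At present, Tibetan speech recognition research focuses mainly on Lhasa Tibetan of the Ü-Tsang dialect, while the Kham and Amdo dialects have received little study. To promote communication among people in the different dialect areas and accelerate the development of Tibetan multi-dialect speech recognition, this project aims to build a Tibetan multi-dialect, multi-task speech recognition system. The system integrates multi-dialect speech content recognition, dialect identification, and speaker recognition into a unified network architecture, and it uses the implicit information shared among these tasks to improve the performance of each individual task and of the low-resource dialects. The project studies two key technologies, multi-dialect speech feature representation and multi-task relationship learning, and addresses three key scientific problems: 1) data representation methods for deep probabilistic graphical models; 2) learning methods for deep probabilistic graphical models trained end to end; 3) methods for learning the correlations among multiple tasks. The specific research content and plan are: 1) Tibetan multi-dialect speech feature representation based on deep temporal regression Bayesian networks; 2) multi-task sharing relationships based on soft parameter sharing; 3) mutually beneficial inter-task relationships based on a multi-task loss function.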
English Abstract
Tibetan is one of the most widely used minority languages in China. It is divided into three major dialects: Ü-Tsang, Kham, and Amdo. At present, Tibetan speech recognition research focuses mainly on Lhasa Tibetan of the Ü-Tsang dialect, and speech recognition for the Kham and Amdo dialects remains seriously under-researched. To promote communication among people in different dialect areas and to accelerate speech recognition research on the other Tibetan dialects, this project aims to build a Tibetan multi-dialect multitask speech recognition system. The system integrates multi-dialect speech content recognition, dialect identification, and speaker recognition into a unified network architecture. It makes full use of the implicit information shared among these tasks to improve the recognition performance of each individual task and of the low-resource Tibetan dialects. The project focuses on two key technologies: multi-dialect speech feature representation and multitask relationship learning. It addresses three key scientific problems: 1) data representation methods based on deep probabilistic graphical models; 2) learning methods for deep probabilistic graphical models using end-to-end training; 3) multitask relatedness learning methods. The specific research contents and schemes are as follows: 1) Tibetan multi-dialect speech feature representation based on deep temporal regression Bayesian networks; 2) multitask sharing relationship learning based on soft parameter sharing; 3) multitask mutual-benefit relationship learning based on multitask loss functions.
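The abstract names soft parameter sharing and a multitask loss function but gives no details. The sketch below shows one way the two ideas could be combined. It assumes one common form of soft parameter sharing, where each task keeps its own parameters and an L2 penalty on the distance between same-shaped task-specific parameters pulls them toward each other. It also assumes a fixed weighted sum of the three task losses. The function names, task keys, weights, and values are hypothetical illustrations, not the project's method.

```python
from typing import Dict, Sequence


def soft_sharing_penalty(task_params: Sequence[Sequence[float]]) -> float:
    """Sum of squared L2 distances between every pair of task-specific
    parameter vectors (assumed form of soft parameter sharing: each task
    keeps its own parameters, and this penalty pulls them together)."""
    penalty = 0.0
    for i in range(len(task_params)):
        for j in range(i + 1, len(task_params)):
            a, b = task_params[i], task_params[j]
            if len(a) != len(b):
                raise ValueError("paired task parameters must have the same shape")
            penalty += sum((x - y) ** 2 for x, y in zip(a, b))
    return penalty


def multitask_loss(
    task_losses: Dict[str, float],
    task_weights: Dict[str, float],
    task_params: Sequence[Sequence[float]],
    sharing_strength: float,
) -> float:
    """Weighted sum of per-task losses plus the scaled soft-sharing penalty."""
    weighted = sum(task_weights[name] * loss for name, loss in task_losses.items())
    return weighted + sharing_strength * soft_sharing_penalty(task_params)


# Hypothetical loss values for the three tasks named in the abstract.
losses = {"content": 2.0, "dialect": 0.5, "speaker": 1.0}
weights = {"content": 1.0, "dialect": 0.3, "speaker": 0.3}
# One toy layer of task-specific weights per task, all with the same shape.
params = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

total = multitask_loss(losses, weights, params, sharing_strength=0.1)
print(round(total, 4))  # 2.85
```

Fixed task weights are the simplest choice. Learned or uncertainty-based task weights are common alternatives. In a real system the penalty would apply to corresponding layers of the task networks rather than to toy lists.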
Journal Articles
DOI: --
Published: 2020
Journal: International Journal of Computational Science and Engineering
Impact Factor: 2
Authors: Yue Zhao; Xiaona Xu; Jianjian Yue; Wei Song; Xiali Li; Licheng Wu; Qiang Ji
Corresponding Author: Qiang Ji

Latent Regression Bayesian Network for Speech Representation
DOI: 10.3390/electronics12153342
Published: 2023
Journal: Electronics
Impact Factor: 2.9
Authors: Liang Xu; Yue Zhao; Xiaona Xu; Yigang Liu; Qiang Ji
Corresponding Author: Qiang Ji

Near-Optimal Active Learning for Multilingual Grapheme-to-Phoneme Conversion
DOI: --
Published: 2023
Journal: Applied Sciences
Impact Factor: --
Authors: Dezhi Cao; Yue Zhao; Licheng Wu
Corresponding Author: Licheng Wu

DOI: 10.1109/access.2020.3024218
Published: 2020
Journal: IEEE Access
Impact Factor: 3.9
Authors: Hui Wang; Fei Gao; Yue Zhao; Licheng Wu
Corresponding Authors: Hui Wang; Fei Gao; Yue Zhao; Licheng Wu

DOI: 10.3390/e24101429
Published: 2022-10-08
Journal: Entropy (Basel, Switzerland)
Impact Factor: --
Authors:
Corresponding Author:
