Deep structured speech models

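Basic information

  • Grant number:
    RGPIN-2021-02652
  • Principal investigator:
  • Amount:
    $27,700
  • Host institution:
  • Host institution country:
    Canada
  • Program:
    Discovery Grants Program - Individual
  • Fiscal year:
    2022
  • Funding country:
    Canada
  • Duration:
    2022-01-01 to 2023-12-31
  • Status:
    Completed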

Project abstract

State-of-the-art speech recognition models are either end-to-end models or hybrid models. End-to-end models are based entirely on deep neural networks (DNNs). Hybrid models combine an underlying structure of weighted finite-state automata (WFSAs) with a surface layer of deep neural networks. When trained on sufficiently large amounts of annotated recordings similar to the test data, end-to-end models can outperform hybrid models. For mismatched or smaller training data, hybrid models are often a better choice, since they can use prior linguistic knowledge encoded in their structure to generalize better and avoid overfitting. However, in hybrid models the finite-state automata and the deep neural components are not integrated; they are trained independently with different objectives and algorithms. As a result, a performance gap remains even when enough training data is available. In practice, hybrid models are complicated to train, require specialized coding, and cannot be easily integrated with common deep learning frameworks such as PyTorch or TensorFlow. Their implementations therefore lag behind the latest developments in general deep learning.

Recent proposals for differentiable automata suggest how they could be integrated with deep neural networks into a single model with an end-to-end differentiable loss, in a way that is efficient enough in time and space to scale to speech problems. This opens up several areas of research that remain almost unexplored for speech modelling. I propose to work on three promising lines of investigation.

1- New architectures with joint training of WFSA and DNN parameters may bridge the performance gap when enough training data is available.
2- New loss functions closer to actual sequence-based objective functions, such as word or phoneme error rate, should yield better performance than the approximate losses used in DNN-only models.
3- Partial supervision afforded by structured generative models can significantly reduce the need for transcribed data.

Although applicable to a wide range of problems, integrated models will have their largest impact where the underlying structure is complex and annotated data is scarce. I therefore intend to apply them first to problems I encountered in my recent work on Indigenous languages spoken in Canada, ranging from subword analysis to speech recognition. Making speech technology accessible to these languages will benefit their transcription, preservation, and revitalization.

This research addresses key limitations of current deep learning models in speech recognition, and it may also have broader applications in natural language processing, machine translation, or genomics, where sequence-to-sequence and segmentation problems are common. Because it combines the solid mathematical framework of probabilistic models with the practical, scalable methods of deep learning, this approach is well positioned to generate advances in knowledge while providing a rich learning environment.
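The idea of scoring a weighted finite-state automaton with differentiable operations, so that it can share one loss with a neural network, can be illustrated with a minimal sketch. This is not the project's implementation: it assumes a toy two-state automaton, log-domain arc weights, and hand-written per-frame label scores standing in for DNN outputs. The names `forward_score`, `arcs`, and `emissions` are hypothetical and chosen only for illustration.

```python
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) over a non-empty list of floats."""
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))


def forward_score(arcs, start, finals, emissions):
    """Log-semiring forward score of a WFSA over a sequence of frames.

    arcs:      list of (src, dst, label, weight), weights in the log domain
    start:     start state
    finals:    set of final states
    emissions: one dict per frame mapping label -> log score
               (an assumed stand-in for the outputs of a neural network)
    """
    alpha = {start: 0.0}  # log score of reaching each state so far
    for frame in emissions:
        incoming = {}
        for src, dst, label, weight in arcs:
            if src in alpha and label in frame:
                score = alpha[src] + weight + frame[label]
                incoming.setdefault(dst, []).append(score)
        alpha = {state: logsumexp(scores) for state, scores in incoming.items()}
    final_scores = [alpha[s] for s in finals if s in alpha]
    return logsumexp(final_scores) if final_scores else -math.inf


# Toy automaton accepting any number of 'a' labels followed by one or more 'b'.
arcs = [(0, 0, "a", 0.0), (0, 1, "b", 0.0), (1, 1, "b", 0.0)]
emissions = [
    {"a": math.log(0.5), "b": math.log(0.5)},
    {"a": math.log(0.25), "b": math.log(0.75)},
]
score = forward_score(arcs, start=0, finals={1}, emissions=emissions)
print(math.exp(score))  # about 0.75: paths (a, b) and (b, b) each contribute 0.375
```

Because the score is built only from additions and log-sum-exp, an automatic differentiation framework could in principle propagate gradients to both the arc weights and the emission scores, which is the property that joint training of WFSA and DNN parameters relies on.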

Project outputs

Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)

Other publications by Boulianne, Gilles

Joint factor analysis versus eigenchannels in speaker recognition
Speaker and session variability in GMM-based speaker verification

Other grants by Boulianne, Gilles

Deep structured speech models
  • Grant number:
    DGECR-2021-00092
  • Fiscal year:
    2021
  • Funding amount:
    $27,700
  • Program:
    Discovery Launch Supplement
Deep structured speech models
  • Grant number:
    RGPIN-2021-02652
  • Fiscal year:
    2021
  • Funding amount:
    $27,700
  • Program:
    Discovery Grants Program - Individual

Similar overseas grants

HEAR-HEARTFELT (Identifying the risk of Hospitalizations or Emergency depARtment visits for patients with HEART Failure in managed long-term care through vErbaL communicaTion)
  • Grant number:
    10723292
  • Fiscal year:
    2023
  • Funding amount:
    $27,700
  • Program:
Crowd-Powered Machine Learning to Diagnose ASD and ADHD in Adolescents from Digital Social Interactions
  • Grant number:
    10682965
  • Fiscal year:
    2023
  • Funding amount:
    $27,700
  • Program:
RI Core: Medium: Structured variability in vocal tract articulation dynamics in speech
  • Grant number:
    2311676
  • Fiscal year:
    2023
  • Funding amount:
    $27,700
  • Program:
    Standard Grant
Deep structured speech models
  • Grant number:
    RGPIN-2021-02652
  • Fiscal year:
    2021
  • Funding amount:
    $27,700
  • Program:
    Discovery Grants Program - Individual
Deep structured speech models
  • Grant number:
    DGECR-2021-00092
  • Fiscal year:
    2021
  • Funding amount:
    $27,700
  • Program:
    Discovery Launch Supplement
ROR Plus: Randomized Trial of a Structured Approach to Parent-Infant Reading (SHARE/STEP) and Limiting Screen Time Delivered via a Multimedia Intervention During Pediatric Well-Visits
  • Grant number:
    10217626
  • Fiscal year:
    2021
  • Funding amount:
    $27,700
  • Program:
ROR Plus: Randomized Trial of a Structured Approach to Parent-Infant Reading (SHARE/STEP) and Limiting Screen Time Delivered via a Multimedia Intervention During Pediatric Well-Visits
  • Grant number:
    10494068
  • Fiscal year:
    2021
  • Funding amount:
    $27,700
  • Program:
Leveraging deep learning and clinical notes for surveillance and prediction of intentional self-harm and suicide
  • Grant number:
    10330113
  • Fiscal year:
    2021
  • Funding amount:
    $27,700
  • Program:
Core G: Biomarker Core
  • Grant number:
    9922106
  • Fiscal year:
    2020
  • Funding amount:
    $27,700
  • Program:
Cross-sectional and longitudinal predictors of distressing psychotic-like experiences in childhood and adolescence
  • Grant number:
    10054751
  • Fiscal year:
    2020
  • Funding amount:
    $27,700
  • Program: