Deep structured speech models

Basic information

  • Grant number:
    RGPIN-2021-02652
  • Principal investigator:
    Boulianne, Gilles
  • Amount:
    $27,700
  • Host institution:
  • Host institution country:
    Canada
  • Program:
    Discovery Grants Program - Individual
  • Fiscal year:
    2021
  • Funding country:
    Canada
  • Project period:
    2021-01-01 to 2022-12-31
  • Project status:
    Completed

Project abstract

State-of-the-art speech recognition models are either end-to-end models or hybrid models. End-to-end models are based entirely on deep neural networks (DNNs). Hybrid models combine an underlying structure of weighted finite-state automata (WFSAs) with a surface layer of deep neural networks. When trained on large enough amounts of annotated recordings similar to the test data, end-to-end models can outperform hybrid models. For mismatched or smaller training data, hybrid models are often a better choice, since they can use prior linguistic knowledge encoded in their structure to generalize better and avoid overfitting. However, in hybrid models the finite-state automata and the deep neural components are not integrated: they are trained independently, with different objectives and algorithms. As a result, a performance gap remains even when enough training data is available. In practice, hybrid models are complicated to train, require specialized code, and cannot easily be integrated with common deep learning frameworks such as PyTorch or TensorFlow, so their implementations lag behind the latest developments in deep learning generally.

Recent proposals for differentiable automata suggest how WFSAs could be integrated with deep neural networks into a single model with an end-to-end differentiable loss, efficiently enough in time and space to scale to speech problems. This opens up several research areas that remain almost unexplored in speech modelling. I propose to work on three promising lines of investigation:

1. New architectures with joint training of WFSA and DNN parameters may bridge the performance gap when enough training data is available.
2. New loss functions closer to the actual sequence-level objectives, such as word or phoneme error rate, should yield better performance than the approximate losses used in purely neural models.
3. Partial supervision afforded by structured generative models can significantly reduce the need for transcribed data.

Although applicable to a wide range of problems, integrated models will have their largest impact where the underlying structure is complex and annotated data is scarce. I therefore intend to apply them first to problems I encountered in my recent work on Indigenous languages spoken in Canada, ranging from subword analysis to speech recognition. Making speech technology accessible for these languages will support their transcription, preservation, and revitalization. This research addresses key limitations of current deep learning models in speech recognition, but potentially has broader applications in natural language processing, machine translation, or genomics, where sequence-to-sequence and segmentation problems are common. Because it combines the solid mathematical framework of probabilistic models with the practical, scalable methods of deep learning, this approach is well positioned to advance knowledge while providing a rich learning environment.
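The integration described above rests on computing a differentiable score over all paths of an automaton. A minimal sketch, assuming an acyclic WFSA whose states are numbered in topological order from a single start state 0: the forward pass in the log semiring gives log Z, the log-sum of all path weights, and the derivative of log Z with respect to each arc's log-weight is that arc's posterior probability, computed here with a backward pass. In a framework such as PyTorch the same gradient would come from automatic differentiation, with arc weights supplied by a DNN. The function names, the tuple layout of arcs, and the toy graph are hypothetical and are not the API of any particular differentiable-automata library.

```python
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))


def forward_backward(num_states, arcs, finals):
    """Log-semiring forward-backward over an acyclic WFSA.

    Assumptions (hypothetical interface, not a specific library's API):
      - states are numbered 0..num_states-1 in topological order, and
        state 0 is the only start state and has no incoming arcs;
      - arcs is a list of (src, dst, label, log_weight) tuples;
      - finals maps each final state to its final log-weight;
      - at least one complete path exists, so log_z is finite.

    Returns (log_z, posteriors): log_z is the log-sum of all path weights,
    and posteriors[i] = d log_z / d log_weight_i, the posterior
    probability that a path uses arc i.
    """
    # alpha[s]: log-sum of weights of all partial paths from state 0 to s
    alpha = [-math.inf] * num_states
    alpha[0] = 0.0
    for s in range(1, num_states):
        terms = [alpha[src] + w for (src, dst, _, w) in arcs if dst == s]
        if terms:
            alpha[s] = logsumexp(terms)

    # beta[s]: log-sum of weights of all partial paths from s to a final state
    beta = [finals.get(s, -math.inf) for s in range(num_states)]
    for s in reversed(range(num_states)):
        terms = [w + beta[dst] for (src, dst, _, w) in arcs if src == s]
        if terms:
            beta[s] = logsumexp([beta[s]] + terms)

    log_z = beta[0]
    # Arc posterior = (paths reaching src) * arc * (paths leaving dst) / Z
    posteriors = [math.exp(alpha[src] + w + beta[dst] - log_z)
                  for (src, dst, _, w) in arcs]
    return log_z, posteriors


if __name__ == "__main__":
    # Toy graph: label "a" or "b" from state 0 to 1, then "c" from 1 to 2.
    arcs = [(0, 1, "a", math.log(0.6)), (0, 1, "b", math.log(0.4)),
            (1, 2, "c", 0.0)]
    log_z, post = forward_backward(3, arcs, {2: 0.0})
    # Finite-difference check that post[0] equals d log_z / d weight of "a"
    eps = 1e-6
    bumped = [(0, 1, "a", math.log(0.6) + eps)] + arcs[1:]
    log_z_eps, _ = forward_backward(3, bumped, {2: 0.0})
    print(post, (log_z_eps - log_z) / eps)
```

Running the module prints posteriors of approximately [0.6, 0.4, 1.0] for the toy graph, together with a finite-difference estimate close to 0.6 for arc "a", which illustrates that each arc posterior is the gradient of log Z with respect to that arc's weight.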
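For the second line of investigation, one established way to train against a sequence-level error rate, offered here only as an illustrative assumption about what such a loss could look like, is minimum expected error training over an N-best list: the model's hypothesis scores are renormalized into posteriors, and the loss is the expected edit-distance error rate. The error rate of a single hypothesis is not differentiable, but its expectation is differentiable in the scores. The function names are hypothetical; the sketch assumes a non-empty reference and uses plain Python to stay self-contained.

```python
import math


def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences (unit costs)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(prev[j] + 1,             # deletion of r
                         cur[j - 1] + 1,          # insertion of h
                         prev[j - 1] + (r != h))  # substitution or match
        prev = cur
    return prev[-1]


def expected_error_rate(ref, nbest):
    """Expected word (or phoneme) error rate over an N-best list.

    nbest is a list of (hypothesis_tokens, model_score) pairs; the scores
    are treated as unnormalized log-probabilities and renormalized over
    the list with a softmax (an assumption of this sketch).
    Returns (expected_er, grads) with grads[i] = d expected_er / d score_i.
    """
    scores = [score for _, score in nbest]
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    probs = [w / total for w in weights]
    errors = [edit_distance(ref, hyp) / len(ref) for hyp, _ in nbest]
    expected = sum(p * e for p, e in zip(probs, errors))
    # Softmax derivative: dE/ds_i = p_i * (err_i - E)
    grads = [p * (e - expected) for p, e in zip(probs, errors)]
    return expected, grads
```

With a correct and an incorrect hypothesis scored equally, for example reference "the cat sat" against hypotheses "the cat sat" and "the bat sat", the expected error rate is 1/6 and the gradient is negative for the correct hypothesis, so a gradient-descent step raises its score relative to the incorrect one.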

Project outputs

Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)

Other publications by Boulianne, Gilles

Joint factor analysis versus eigenchannels in speaker recognition
Speaker and session variability in GMM-based speaker verification

Other grants held by Boulianne, Gilles

Deep structured speech models
  • Grant number:
    RGPIN-2021-02652
  • Fiscal year:
    2022
  • Amount:
    $27,700
  • Program:
    Discovery Grants Program - Individual
Deep structured speech models
  • Grant number:
    DGECR-2021-00092
  • Fiscal year:
    2021
  • Amount:
    $27,700
  • Program:
    Discovery Launch Supplement

Similar international grants

HEAR-HEARTFELT (Identifying the risk of Hospitalizations or Emergency depARtment visits for patients with HEART Failure in managed long-term care through vErbaL communicaTion)
  • Grant number:
    10723292
  • Fiscal year:
    2023
  • Amount:
    $27,700
  • Program:
Crowd-Powered Machine Learning to Diagnose ASD and ADHD in Adolescents from Digital Social Interactions
  • Grant number:
    10682965
  • Fiscal year:
    2023
  • Amount:
    $27,700
  • Program:
RI Core: Medium: Structured variability in vocal tract articulation dynamics in speech
  • Grant number:
    2311676
  • Fiscal year:
    2023
  • Amount:
    $27,700
  • Program:
    Standard Grant
ROR Plus: Randomized Trial of a Structured Approach to Parent-Infant Reading (SHARE/STEP) and Limiting Screen Time Delivered via a Multimedia Intervention During Pediatric Well-Visits
  • Grant number:
    10217626
  • Fiscal year:
    2021
  • Amount:
    $27,700
  • Program:
ROR Plus: Randomized Trial of a Structured Approach to Parent-Infant Reading (SHARE/STEP) and Limiting Screen Time Delivered via a Multimedia Intervention During Pediatric Well-Visits
  • Grant number:
    10494068
  • Fiscal year:
    2021
  • Amount:
    $27,700
  • Program:
Leveraging deep learning and clinical notes for surveillance and prediction of intentional self-harm and suicide
  • Grant number:
    10330113
  • Fiscal year:
    2021
  • Amount:
    $27,700
  • Program:
Core G: Biomarker Core
  • Grant number:
    9922106
  • Fiscal year:
    2020
  • Amount:
    $27,700
  • Program:
Cross-sectional and longitudinal predictors of distressing psychotic-like experiences in childhood and adolescence
  • Grant number:
    10054751
  • Fiscal year:
    2020
  • Amount:
    $27,700
  • Program: