Robust Syllable Recognition in the Acoustic-Waveform Domain

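Basic Information

  • Grant number:
    EP/D053005/1
  • Principal investigator:
  • Amount:
    $264,400
  • Host institution:
  • Host institution country:
    United Kingdom
  • Project category:
    Research Grant
  • Fiscal year:
    2006
  • Funding country:
    United Kingdom
  • Duration:
    2006 to (end date not available)
  • Project status:
    Completed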

Project Abstract

This proposal is concerned with robust classification/recognition of speech units (phonemes and consonant-vowel syllables) in the domain of acoustic waveforms. The motivation for this research comes from the idea that speech units should be much better separated in the high-dimensional spaces formed by acoustic waveforms than in the smaller representation spaces used in state-of-the-art speech recognition systems, which involve significant compression and dimension reduction. Hence, recognition/classification in the acoustic waveform domain should exhibit a higher level of robustness to additive noise than classification in low-dimensional feature spaces.

In the first phase of the project we will investigate classification of speech units in the acoustic waveform domain under severe noise conditions, around 0 dB signal-to-noise ratio and below, while in the second phase we will study techniques that make classification robust to linear filtering as well. The particular tasks to be tackled in the first phase can be summarized as follows:

1. Study the detailed structure of the sets of acoustic waveforms of individual speech units, in particular their intrinsic dimensions and the possible existence of nonlinear surfaces on which the data are concentrated.
2. Guided by the findings from item 1, estimate statistical models of the distribution of speech units in the acoustic waveform domain. We will then design and systematically assess so-called generative classifiers, whose defining property is that they are based on such statistical models.
3. Investigate classification of speech units in the acoustic waveform domain using discriminative classification techniques (artificial neural networks, support vector machines, and relevance vector machines). These can be a useful alternative to generative techniques because they focus directly on the classification problem without building explicit models of the waveform distribution of each speech unit.
4. Construct classifiers by grouping speech units hierarchically. Top-level classifiers will distinguish between a small number of groups of similar speech units, followed by classifiers separating groups into subgroups, and so on. Different methods for defining subgroups will be explored, including the confusion matrices of the classifiers from item 3, appropriate distance measures between the statistical models obtained in item 2, and possibly perceptual experiments.

A potential argument against our approach is that classification in the acoustic waveform domain will break down in the presence of linear filtering. However, this can be avoided by considering narrow-band signals: for these, the effect of linear filtering is approximately equivalent to amplitude scaling and time delay. In the second phase of the project, we will therefore consider speech classification using narrow-band components of acoustic waveforms. For classification of signals in individual sub-bands, the techniques investigated in the first phase of the project will be considered. A new issue is then how to combine the results of the sub-band classifiers to minimize the overall classification error. Here, recently developed machine learning techniques will be used, as specified in the case for support.

As explained, individual sub-band classifiers should be robust to linear filtering because it does not significantly alter the shape of narrow-band signals. On the other hand, the dimension of the spaces of sub-band waveforms will still be high enough to support classification that is robust to additive noise. Hence, the overall scheme is expected to be robust to both additive noise and linear filtering.
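To make the generative approach of item 2 concrete, here is a minimal sketch of a generative classifier operating directly on waveform segments. It assumes one full-covariance Gaussian per speech unit and uses synthetic random-phase sinusoids (the hypothetical `make_segments` helper) in place of real recordings; neither the Gaussian form nor the data reflect the models actually developed in the project. Because the model is Gaussian, additive white Gaussian noise of known variance can be absorbed by adding that variance to the diagonal of each class covariance, which is one simple way a waveform-domain model can be adapted to the noise level at test time.

```python
import numpy as np


def make_segments(freq, n, dim, rng):
    """Hypothetical stand-in for speech-unit waveforms: random-phase sinusoids
    of a class-specific frequency (cycles per sample), one segment per row."""
    t = np.arange(dim)
    phase = rng.uniform(0.0, 2.0 * np.pi, size=(n, 1))
    return np.cos(2.0 * np.pi * freq * t + phase)


class GaussianWaveformClassifier:
    """Generative classifier: one full-covariance Gaussian per class,
    estimated directly on raw waveform segments."""

    def fit(self, segments_by_class, reg=1e-3):
        self.means, self.covs = [], []
        for x in segments_by_class:
            self.means.append(x.mean(axis=0))
            # A small ridge keeps nearly singular covariances invertible.
            self.covs.append(np.cov(x, rowvar=False) + reg * np.eye(x.shape[1]))
        return self

    def log_likelihoods(self, x, noise_var=0.0):
        """Per-class Gaussian log-likelihoods of each row of x. Independent white
        Gaussian noise of variance noise_var turns covariance C into C + noise_var*I."""
        dim = x.shape[1]
        scores = []
        for mu, cov in zip(self.means, self.covs):
            c = cov + noise_var * np.eye(dim)
            _, logdet = np.linalg.slogdet(c)
            diff = x - mu
            # Row-wise Mahalanobis distance diff_i^T C^{-1} diff_i.
            maha = np.einsum("ij,ij->i", diff, np.linalg.solve(c, diff.T).T)
            scores.append(-0.5 * (maha + logdet + dim * np.log(2.0 * np.pi)))
        return np.stack(scores, axis=1)

    def predict(self, x, noise_var=0.0):
        # Maximum-likelihood decision (equal class priors assumed).
        return np.argmax(self.log_likelihoods(x, noise_var), axis=1)


rng = np.random.default_rng(0)
dim, freqs = 32, (0.05, 0.15)
clf = GaussianWaveformClassifier().fit([make_segments(f, 500, dim, rng) for f in freqs])

test = np.vstack([make_segments(f, 200, dim, rng) for f in freqs])
labels = np.repeat([0, 1], 200)
clean_acc = np.mean(clf.predict(test) == labels)

# Signal power is 0.5 per sample, so noise variance 0.5 gives roughly 0 dB SNR.
noise_var = 0.5
noisy = test + rng.normal(scale=np.sqrt(noise_var), size=test.shape)
noisy_acc = np.mean(clf.predict(noisy, noise_var=noise_var) == labels)
print(f"clean accuracy: {clean_acc:.3f}, 0 dB accuracy: {noisy_acc:.3f}")
```

Full covariances in the raw waveform domain grow quadratically with segment length, so with real speech one would likely need the low-dimensional structure studied in item 1, or heavy regularization, to estimate them reliably.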
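The argument that linear filtering acts on a narrow-band signal approximately as amplitude scaling and time delay can be checked numerically. The sketch below assumes a Gaussian-windowed tone as a stand-in for a narrow-band speech component and a hypothetical four-tap FIR filter as the channel. More precisely, the predicted output is the input scaled by |H(ω0)|, with the envelope delayed by the filter's group delay and the carrier shifted by the phase of H(ω0) (i.e. delayed by the phase delay), all evaluated at the carrier frequency ω0. The same prediction is then applied to a short, wide-band pulse for contrast.

```python
import numpy as np


def scale_and_delay_error(h, omega0, sigma, n_samples=1000):
    """Filter a Gaussian-windowed tone with FIR h and return the relative error
    of a pure scale-and-delay model evaluated at the carrier frequency omega0."""
    n = np.arange(n_samples)
    centre = n_samples / 2

    def envelope(t):
        return np.exp(-((t - centre) ** 2) / (2.0 * sigma ** 2))

    s = envelope(n) * np.cos(omega0 * n)

    # Actual output of the linear filter (causal FIR, truncated to input length).
    y = np.convolve(s, h)[:n_samples]

    # Frequency response H(omega0) and group delay Re{sum k h[k] e^{-j w k} / H(w)}.
    k = np.arange(len(h))
    phasors = np.exp(-1j * omega0 * k)
    H0 = np.sum(h * phasors)
    tau_g = np.real(np.sum(k * h * phasors) / H0)

    # Scale-and-delay model: gain |H0|, envelope delayed by the group delay,
    # carrier shifted by the phase of H0.
    y_hat = np.abs(H0) * envelope(n - tau_g) * np.cos(omega0 * n + np.angle(H0))
    return np.linalg.norm(y - y_hat) / np.linalg.norm(y)


channel = np.array([1.0, 0.5, -0.3, 0.2])  # hypothetical short channel filter
narrow_err = scale_and_delay_error(channel, omega0=0.6, sigma=60.0)  # long envelope
wide_err = scale_and_delay_error(channel, omega0=0.6, sigma=2.0)     # short pulse
print(f"narrow-band relative error: {narrow_err:.4f}")
print(f"wide-band relative error:   {wide_err:.4f}")
```

Under these assumptions the narrow-band error should come out far smaller than the wide-band one, because the filter's gain and phase are nearly constant over the narrow spectrum of the long envelope but vary appreciably across the spectrum of the short pulse.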

Project Outputs

Journal articles (10)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Combined Features and Kernel Design for Noise Robust Phoneme Classification Using Support Vector Machines
Towards robust phoneme classification: Augmentation of PLP models with acoustic waveforms
Tuning support vector machines for robust phoneme classification with acoustic waveforms
Robust phoneme classification: exploiting the adaptability of acoustic waveform models
Combined PLP - acoustic waveform classification for robust phoneme recognition using support vector machines

Other publications by Zoran Cvetkovic

Overcomplete expansions and robustness

Other grants held by Zoran Cvetkovic

Challenges in Immersive Audio Technology
  • Grant number:
    EP/X032981/1
  • Fiscal year:
    2024
  • Funding amount:
    $264,400
  • Project category:
    Research Grant
SpeechWave
  • Grant number:
    EP/R012067/1
  • Fiscal year:
    2018
  • Funding amount:
    $264,400
  • Project category:
    Research Grant
Visits to University of California, Berkeley, Stanford University, and SRI International
  • Grant number:
    EP/K034626/1
  • Fiscal year:
    2013
  • Funding amount:
    $264,400
  • Project category:
    Research Grant
Perceptual Sound Field Reconstruction and Coherent Emulation
  • Grant number:
    EP/F001142/1
  • Fiscal year:
    2008
  • Funding amount:
    $264,400
  • Project category:
    Research Grant

Similar Overseas Grants

Elucidation of the Syllable Formation Principles of Japanese Sign Language Using Machine Learning Algorithms
  • Grant number:
    23H00626
  • Fiscal year:
    2023
  • Funding amount:
    $264,400
  • Project category:
    Grant-in-Aid for Scientific Research (B)
Well-formedness Condition of the Japanese Sign Language syllable
  • Grant number:
    18H00671
  • Fiscal year:
    2018
  • Funding amount:
    $264,400
  • Project category:
    Grant-in-Aid for Scientific Research (B)
The phonetic/phonological properties of varieties in English and a new syllable theory
  • Grant number:
    18K00673
  • Fiscal year:
    2018
  • Funding amount:
    $264,400
  • Project category:
    Grant-in-Aid for Scientific Research (C)
An attempt to establish acoustic phonology: Temporal changes of spectra and syllable formation in English speech
  • Grant number:
    17H06197
  • Fiscal year:
    2017
  • Funding amount:
    $264,400
  • Project category:
    Grant-in-Aid for Challenging Research (Pioneering)
Encoding of syllable sequence by context dependent response modulation in auditory neurons
  • Grant number:
    17K07067
  • Fiscal year:
    2017
  • Funding amount:
    $264,400
  • Project category:
    Grant-in-Aid for Scientific Research (C)
Doctoral Dissertation Research: Yazgulyami Syllable Structure [yah]
  • Grant number:
    1500802
  • Fiscal year:
    2015
  • Funding amount:
    $264,400
  • Project category:
    Standard Grant
Effects of phonotactic constraints and syllable duration on L2 speech processing
  • Grant number:
    15K02757
  • Fiscal year:
    2015
  • Funding amount:
    $264,400
  • Project category:
    Grant-in-Aid for Scientific Research (C)
Are phonemes perceptually real?: An examination of the parallel phoneme-syllable processing model
  • Grant number:
    15K02493
  • Fiscal year:
    2015
  • Funding amount:
    $264,400
  • Project category:
    Grant-in-Aid for Scientific Research (C)
The history of Japanese syllable structure using the Chinese-Japanese glosses
  • Grant number:
    26370528
  • Fiscal year:
    2014
  • Funding amount:
    $264,400
  • Project category:
    Grant-in-Aid for Scientific Research (C)
An empirical study of Japanese syllabic structure within a restrictive framework of Syllable Theory based on a dependency/licensing mechanism
  • Grant number:
    25370442
  • Fiscal year:
    2013
  • Funding amount:
    $264,400
  • Project category:
    Grant-in-Aid for Scientific Research (C)