SpeechWave
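Basic information
- Grant number: EP/R012067/1
- Principal investigator:
- Amount: $935,400
- Host institution:
- Host institution country: United Kingdom
- Project type: Research Grant
- Fiscal year: 2018
- Funding country: United Kingdom
- Duration: 2018 to (no data)
- Project status: Completed
- Source:
- Keywords: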
Project abstract
Speech recognition has made major advances in the past few years. Error rates have been reduced by more than half on standard large-scale tasks such as Switchboard (conversational telephone speech), MGB (multi-genre broadcast recordings), and AMI (multiparty meetings). These research advances have quickly translated into commercial products and services: speech-based applications and assistants such as Apple's Siri, Amazon's Alexa, and Google voice search have become part of daily life for many people. Underpinning the improved accuracy of these systems are advances in acoustic modelling, with deep learning having had an outstanding influence on the field.
However, speech recognition is still very fragile: it has been successfully deployed in specific acoustic conditions and task domains (for instance, voice search on a smartphone) and degrades severely when the conditions change. This is because speech recognition is highly vulnerable to additive noise caused by multiple acoustic sources, and to reverberation. In both cases, acoustic conditions that have essentially no effect on the accuracy of human speech recognition can have a catastrophic impact on the accuracy of a state-of-the-art automatic system. One reason for such brittleness is the lack of a strong model for acoustic robustness. Robustness is usually addressed through multi-condition training, in which the training set comprises speech examples across the many required acoustic conditions, often constructed by mixing speech with noise at different signal-to-noise ratios. For a limited set of acoustic conditions these techniques can work well, but they are inefficient, they do not offer a model of multiple acoustic sources, and they do not factorise the causes of variability.
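The multi-condition training recipe described above builds noisy training copies by mixing speech with noise at chosen signal-to-noise ratios. The sketch below shows only that mixing step, under stated assumptions: both signals are mono lists of float samples at the same sampling rate, the SNR is measured over the whole utterance as a ratio of mean powers, and the helper names (`mean_power`, `mix_at_snr`, `measured_snr_db`, `multi_condition_copies`) and the SNR grid are hypothetical illustrations, not the project's actual data pipeline.

```python
import math
import random


def mean_power(signal):
    """Average of the squared samples (assumes a non-empty list of floats)."""
    return sum(s * s for s in signal) / len(signal)


def mix_at_snr(speech, noise, snr_db):
    """Return speech + g * noise, with gain g chosen so the utterance-level SNR is snr_db.

    SNR_dB = 10 * log10(P_speech / (g**2 * P_noise)), so
    g = sqrt(P_speech / (P_noise * 10 ** (snr_db / 10))).
    Assumption: noise is at least as long as speech and is not all zeros.
    """
    if len(noise) < len(speech):
        raise ValueError("noise must be at least as long as speech")
    noise = noise[: len(speech)]  # trim noise to the utterance length
    gain = math.sqrt(mean_power(speech) / (mean_power(noise) * 10 ** (snr_db / 10)))
    return [s + gain * n for s, n in zip(speech, noise)]


def measured_snr_db(speech, mixture):
    """Recover the SNR of a mixture whose clean speech component is known."""
    residual = [m - s for s, m in zip(speech, mixture)]
    return 10 * math.log10(mean_power(speech) / mean_power(residual))


def multi_condition_copies(speech, noise, snrs_db=(0, 5, 10, 20)):
    """Hypothetical SNR grid: one noisy copy of the utterance per target SNR (dB)."""
    return {snr: mix_at_snr(speech, noise, snr) for snr in snrs_db}


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the run is reproducible
    sample_rate = 16000  # assumed sampling rate in Hz
    # Stand-ins for real recordings: a 1 s, 440 Hz tone and white Gaussian noise.
    speech = [0.5 * math.sin(2 * math.pi * 440 * t / sample_rate) for t in range(sample_rate)]
    noise = [rng.gauss(0.0, 1.0) for _ in range(sample_rate)]
    for snr, mixture in multi_condition_copies(speech, noise).items():
        print(f"target {snr:>2} dB -> measured {measured_snr_db(speech, mixture):.3f} dB")
```

Scaling the noise rather than the speech keeps the speech level fixed across conditions. Each SNR in the grid yields one more copy of every utterance, which illustrates why the abstract calls this approach inefficient: coverage grows only by enumerating conditions, and the sketch ignores reverberation and multiple simultaneous sources entirely.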
As an illustration of this brittleness, the best reported speech recognition results for transcription of the AMI corpus test set using single distant microphone recordings are about 38% word error rate (for non-overlapped speech), compared with about 5% for human listeners. In the past few years several approaches have tried to address these problems: explicitly learning to separate multiple sources; factorised acoustic models using auxiliary features; and learned spectral masks for multi-channel beamforming. SpeechWave will pursue an alternative approach to robust speech recognition: the development of acoustic models that learn directly from the speech waveform. The motivation to operate directly in the waveform domain arises from the insight that redundancy in speech signals is highly likely to be a key factor in the robustness of human speech recognition. Current approaches to speech recognition separate non-adaptive signal processing components from the adaptive acoustic model, and in so doing lose the redundancy (and, typically, information such as the phase) present in the speech waveform. Waveform models are particularly exciting because they combine the previously distinct signal processing and acoustic modelling components.
In SpeechWave, we shall explore novel waveform-based convolutional and recurrent networks that combine speech enhancement and recognition in a factorised way, as well as approaches based on kernel methods and on recent research advances in sparse signal processing and speech perception. Our research will be evaluated on standard large-scale speech corpora. In addition, we shall participate in, and organise, international challenges to assess the performance of speech recognition technologies. We shall also validate our technologies in practice, in the context of the speech recognition challenges faced by our project partners BBC, Emotech, Quorate, and SRI.
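The point that a conventional front end discards phase can be checked on a toy example. Assuming a magnitude-only front end (features computed from the DFT magnitudes of each frame), a circularly delayed copy of a frame has exactly the same magnitudes as the original, so such features cannot tell the two waveforms apart, while the complex spectrum (magnitude plus phase) can. The frame values and helper names below are hypothetical, and the circular shift is an idealised stand-in for a real time delay.

```python
import cmath
import math


def dft(x):
    """Naive O(N^2) discrete Fourier transform: X[k] = sum_t x[t] * exp(-2j*pi*k*t/N)."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def magnitude_spectrum(x):
    """What an assumed magnitude-only front end keeps: |X[k]| for each bin."""
    return [abs(c) for c in dft(x)]


def circular_shift(x, s):
    """Delay list x by s samples with wrap-around (an idealised time delay)."""
    s %= len(x)
    return x[-s:] + x[:-s] if s else list(x)


def max_abs_diff(a, b):
    """Largest element-wise absolute difference between two equal-length sequences."""
    return max(abs(p - q) for p, q in zip(a, b))


if __name__ == "__main__":
    frame = [0.0, 1.0, 0.5, -0.3, 0.0, -1.0, 0.2, 0.4]  # hypothetical 8-sample frame
    delayed = circular_shift(frame, 3)
    # A shift of s samples multiplies X[k] by exp(-2j*pi*k*s/N), which has unit
    # magnitude: the magnitudes agree (up to rounding) while the phases differ.
    print("waveforms identical:", frame == delayed)
    print("max |magnitude| difference:",
          max_abs_diff(magnitude_spectrum(frame), magnitude_spectrum(delayed)))
    print("max complex-spectrum difference:", max_abs_diff(dft(frame), dft(delayed)))
```

Real front ends use windowed, overlapping frames, so a linear delay perturbs the magnitudes slightly rather than leaving them exactly unchanged; the example only shows that magnitude features drop the phase term that a model operating on the waveform still has access to.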
Project outputs
Journal articles (10)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Multi-Stream Acoustic Modelling Using Raw Real and Imaginary Parts of the Fourier Transform
- DOI: 10.1109/taslp.2023.3237167
- Published: 2023
- Journal:
- Impact factor: 0
- Authors: Erfan Loweimi; Zhengjun Yue; P. Bell; S. Renals; Z. Cvetković
- Corresponding author: Erfan Loweimi; Zhengjun Yue; P. Bell; S. Renals; Z. Cvetković
Speech Acoustic Modelling Using Raw Source and Filter Components
- DOI: 10.21437/interspeech.2021-53
- Published: 2021-08
- Journal:
- Impact factor: 0
- Authors: Erfan Loweimi; Z. Cvetković; P. Bell; S. Renals
- Corresponding author: Erfan Loweimi; Z. Cvetković; P. Bell; S. Renals
Towards a Unified Analysis of Random Fourier Features
- DOI:
- Published: 2018-06
- Journal:
- Impact factor: 0
- Authors: Zhu Li; Jean-Francois Ton; Dino Oglic; D. Sejdinovic
- Corresponding author: Zhu Li; Jean-Francois Ton; Dino Oglic; D. Sejdinovic
Speech Acoustic Modelling from Raw Phase Spectrum
- DOI: 10.1109/icassp39728.2021.9413727
- Published: 2021-06
- Journal:
- Impact factor: 0
- Authors: Erfan Loweimi; Z. Cvetković; P. Bell; S. Renals
- Corresponding author: Erfan Loweimi; Z. Cvetković; P. Bell; S. Renals
Phonetic Error Analysis Beyond Phone Error Rate
- DOI: 10.1109/taslp.2023.3313417
- Published: 2023
- Journal:
- Impact factor: 0
- Authors: Erfan Loweimi; Andrea Carmantini; Peter Bell; Steve Renals; Z. Cvetkovic
- Corresponding author: Erfan Loweimi; Andrea Carmantini; Peter Bell; Steve Renals; Z. Cvetkovic
Other publications by Zoran Cvetkovic
Overcomplete expansions and robustness
- DOI: 10.1109/tfsa.1996.547479
- Published: 1996
- Journal:
- Impact factor: 0
- Authors: Zoran Cvetkovic; Martin Vetterli
- Corresponding author: Martin Vetterli
Other grants by Zoran Cvetkovic
Challenges in Immersive Audio Technology
- Grant number: EP/X032981/1
- Fiscal year: 2024
- Amount: $935,400
- Project type: Research Grant
Visits to University of California, Berkeley, Stanford University, and SRI International
- Grant number: EP/K034626/1
- Fiscal year: 2013
- Amount: $935,400
- Project type: Research Grant
Perceptual Sound Field Reconstruction and Coherent Emulation
- Grant number: EP/F001142/1
- Fiscal year: 2008
- Amount: $935,400
- Project type: Research Grant
Robust Syllable Recognition in the Acoustic-Waveform Domain
- Grant number: EP/D053005/1
- Fiscal year: 2006
- Amount: $935,400
- Project type: Research Grant
Similar National Natural Science Foundation of China grants
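Software framework and parking functions for in-vehicle central computing platforms: R&D and industrial application
- Grant number:
- Approval year: 2025
- Amount: CNY 0
- Project type: Provincial/municipal project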
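Design and regulatory platform software for low-altitude aircraft and their airspace
- Grant number:
- Approval year: 2025
- Amount: CNY 0
- Project type: Provincial/municipal project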
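R&D and industrialisation of high-power, high-voltage GaN devices with efficient diamond-based heat-dissipation packaging
- Grant number:
- Approval year: 2025
- Amount: CNY 0
- Project type: Provincial/municipal project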
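Development and industrialisation of equipment for high-performance precision components of new-energy intelligent vehicles
- Grant number:
- Approval year: 2025
- Amount: CNY 0
- Project type: Provincial/municipal project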
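Key technologies and equipment for efficient, intelligent ultra-low-wind-speed wind turbines
- Grant number:
- Approval year: 2025
- Amount: CNY 0
- Project type: Provincial/municipal project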
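R&D of key technologies and equipment for green hydrogen production, storage and refuelling
- Grant number:
- Approval year: 2025
- Amount: CNY 0
- Project type: Provincial/municipal project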
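Research and application of key technologies for ultra-precision machining and inspection of complex electronic products
- Grant number:
- Approval year: 2025
- Amount: CNY 0
- Project type: Provincial/municipal project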
Development of a new anti-peptic-ulcer drug
- Grant number:
- Approval year: 2025
- Amount: CNY 0
- Project type: Provincial/municipal project
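Synthetic-biology-based optimisation of animal chassis strains and pilot-scale application research
- Grant number:
- Approval year: 2025
- Amount: CNY 0
- Project type: Provincial/municipal project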
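Development of "鱼酱排毒合剂" (Yujiang Paidu Heji), a Class 1.1 innovative traditional Chinese medicine
- Grant number:
- Approval year: 2025
- Amount: CNY 0
- Project type: Provincial/municipal project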
Similar overseas grants
An implantable biosensor microsystem for real-time measurement of circulating biomarkers
- Grant number: 2901954
- Fiscal year: 2028
- Amount: $935,400
- Project type: Studentship
Exploiting the polysaccharide breakdown capacity of the human gut microbiome to develop environmentally sustainable dishwashing solutions
- Grant number: 2896097
- Fiscal year: 2027
- Amount: $935,400
- Project type: Studentship
A Robot that Swims Through Granular Materials
- Grant number: 2780268
- Fiscal year: 2027
- Amount: $935,400
- Project type: Studentship
Likelihood and impact of severe space weather events on the resilience of nuclear power and safeguards monitoring.
- Grant number: 2908918
- Fiscal year: 2027
- Amount: $935,400
- Project type: Studentship
Proton, alpha and gamma irradiation assisted stress corrosion cracking: understanding the fuel-stainless steel interface
- Grant number: 2908693
- Fiscal year: 2027
- Amount: $935,400
- Project type: Studentship
Field Assisted Sintering of Nuclear Fuel Simulants
- Grant number: 2908917
- Fiscal year: 2027
- Amount: $935,400
- Project type: Studentship
Assessment of new fatigue capable titanium alloys for aerospace applications
- Grant number: 2879438
- Fiscal year: 2027
- Amount: $935,400
- Project type: Studentship
Developing a 3D printed skin model using a Dextran - Collagen hydrogel to analyse the cellular and epigenetic effects of interleukin-17 inhibitors in
- Grant number: 2890513
- Fiscal year: 2027
- Amount: $935,400
- Project type: Studentship
CDT year 1 so TBC in Oct 2024
- Grant number: 2879865
- Fiscal year: 2027
- Amount: $935,400
- Project type: Studentship
Understanding the interplay between the gut microbiome, behavior and urbanisation in wild birds
- Grant number: 2876993
- Fiscal year: 2027
- Amount: $935,400
- Project type: Studentship