Explore new approaches to distant microphone speech recognition that combine information across multiple microphone array devices

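Basic information

  • Grant number:
    2112956
  • Principal investigator:
  • Amount:
    --
  • Host institution:
  • Host institution country:
    United Kingdom
  • Project category:
    Studentship
  • Fiscal year:
    2018
  • Funding country:
    United Kingdom
  • Start and end dates:
    2018 to (no data)
  • Project status:
    Completed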

Project abstract

It is becoming common for speech to be used to communicate with digital devices. In the last few years, devices such as Google Home and Amazon Alexa have arrived in millions of homes. Getting speech recognition to work well in home environments is very challenging. The home is often a very noisy place: for example, if the device is placed in the kitchen, the washing machine may be running and people could be talking in the background. Also, the person speaking is often several metres away from the device (the 'distant microphone' scenario). This is a problem because the speech signal may easily be dominated by other sound sources that are closer to the microphones.

This project will develop novel solutions to the distant microphone speech recognition problem. It will be conducted within the Speech and Hearing Research Group under the supervision of Prof. Jon Barker. It will take advantage of a new data set ('CHiME-5') that has been acquired by Prof. Barker's research team with support from Google (http://spandh.dcs.shef.ac.uk/chime_challenge/). CHiME-5 is a set of recordings of parties taking place in real homes. The data is captured with multiple recording devices, each of which captures video and four synchronised microphone channels. This unique data provides an opportunity to address new research questions lying outside the scope of current speech technology.

Research questions

Two key research directions will be prioritised:

Visually-driven beamforming algorithms: The most successful approach to distant microphone speech recognition is to use multiple microphones and apply techniques that enhance the signals coming from some directions while suppressing the signals coming from others. This requires detecting and tracking which directions are important. The project will look at how this information might be extracted from the video signal (e.g., using person tracking techniques).

Speech recognition with multiple microphone arrays: The 'beamforming' described above requires synchronised microphones with known positions with respect to each other. It therefore cannot be easily applied across multiple devices whose relative location is uncertain (e.g., combining the outputs of two Google Homes in the same room). The CHiME-5 data has up to six devices within the same acoustic area and therefore provides a unique opportunity to find new solutions to this problem. A starting place would be to explore techniques for weighting and fusing the outputs of independent recognition systems.

Methodology

Speech recognition systems have evolved into hugely complex pieces of software. Fortunately, speech research has been effectively open-sourced, with the community now focused around the Kaldi speech recognition toolkit. The CHiME-5 data set will be published with an open-source Kaldi 'baseline' that will represent a state-of-the-art single-device, audio-only system. It will also provide a set of 'rules' for training systems that allow fair comparison between research groups. This will provide a robust reference against which to compare the performance of audio-visual and multi-device extensions.

The research will require a mixture of methods to be employed: video face and person tracking and beamforming algorithms, speech recognition fusion strategies, and signal quality assessment techniques. In addition, it will be necessary to have a fuller understanding of the state-of-the-art techniques employed in the baseline recogniser, including convolutional neural networks, i-vector analysis, speaker-adaptive training, neural network language modelling, etc. Fortunately, there are many excellent textbooks, tutorial papers and review papers that cover these areas.

CHiME-5 is a complex 'conversational' speech recognition task. Training and testing the recognition systems will be computationally demanding. Modern speech recognisers use 'deep learning', which requires specialist GPU hardware.
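
To make the beamforming idea in the abstract concrete, the following is a minimal sketch of a textbook delay-and-sum beamformer in Python with NumPy. It is not the CHiME-5 baseline or the project's method. It assumes a far-field source (plane wave), synchronised channels, microphone positions known in metres, and a look direction that is simply given (in the project it might come from video-based person tracking). The function names, the speed-of-sound constant and the `simulate_plane_wave` helper are all hypothetical, and delays are applied as circular shifts in the frequency domain for brevity.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # metres per second; approximate, assumed value for room air


def steering_delays(mic_positions, direction):
    """Arrival-time offset (seconds) of a far-field plane wave at each microphone.

    mic_positions: (n_mics, 3) coordinates in metres.
    direction: vector pointing from the array towards the source.
    A microphone nearer the source hears the wavefront earlier (negative offset).
    """
    unit = np.asarray(direction, dtype=float)
    unit = unit / np.linalg.norm(unit)
    return -(np.asarray(mic_positions, dtype=float) @ unit) / SPEED_OF_SOUND


def delay_and_sum(signals, mic_positions, direction, sample_rate):
    """Time-align the channels for one look direction and average them.

    signals: (n_mics, n_samples) array of synchronised channels.
    """
    signals = np.asarray(signals, dtype=float)
    n_samples = signals.shape[1]
    delays = steering_delays(mic_positions, direction)
    freqs = np.fft.rfftfreq(n_samples, d=1.0 / sample_rate)
    spectra = np.fft.rfft(signals, axis=1)
    # Multiplying by exp(+j*2*pi*f*tau) advances a channel by tau seconds,
    # undoing the arrival offset for the chosen direction (circular shift).
    aligned = spectra * np.exp(2j * np.pi * freqs[None, :] * delays[:, None])
    return np.fft.irfft(aligned.mean(axis=0), n=n_samples)


def simulate_plane_wave(source, mic_positions, direction, sample_rate):
    """Hypothetical test helper: one noise-free source delayed per microphone."""
    source = np.asarray(source, dtype=float)
    n_samples = source.shape[0]
    delays = steering_delays(mic_positions, direction)
    freqs = np.fft.rfftfreq(n_samples, d=1.0 / sample_rate)
    shifted = np.fft.rfft(source)[None, :] * np.exp(
        -2j * np.pi * freqs[None, :] * delays[:, None]
    )
    return np.fft.irfft(shifted, n=n_samples, axis=1)


if __name__ == "__main__":
    rate = 16000
    t = np.arange(1600) / rate
    tone = np.sin(2 * np.pi * 440.0 * t)  # whole number of cycles in the window
    # Four microphones in a line, 10 cm apart (an assumed geometry).
    mics = np.array([[0.0, 0.0, 0.0], [0.1, 0.0, 0.0], [0.2, 0.0, 0.0], [0.3, 0.0, 0.0]])
    channels = simulate_plane_wave(tone, mics, [1.0, 0.0, 0.0], rate)
    toward = delay_and_sum(channels, mics, [1.0, 0.0, 0.0], rate)
    away = delay_and_sum(channels, mics, [-1.0, 0.0, 0.0], rate)
    print("power steering toward the source:", np.mean(toward ** 2))
    print("power steering away from the source:", np.mean(away ** 2))
```

Steering toward the true direction adds the channels coherently, while steering elsewhere leaves them misaligned so that they partly cancel. The sketch also shows why the approach depends on known relative microphone positions: `steering_delays` cannot be computed without them, which is the difficulty the multi-device research question addresses.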

Project outputs

Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
