Collaborative Research: Improving Techniques of Automatic Speech Recognition and Transfer Learning using Documentary Linguistic Corpora
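Basic information
- Award number: 2123624
- Principal investigator:
- Amount: $193,500
- Host institution:
- Host institution country: United States
- Award type: Standard Grant
- Fiscal year: 2021
- Funding country: United States
- Period: 2021-12-01 to 2025-05-31
- Status: Not yet concluded
- Source:
- Keywords:
Project abstract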
Computational tools such as automatic speech recognition (that is, the conversion of speech to text) are increasingly used to facilitate and mediate communication. Doctors speak into their computers, which transcribe their speech into legible written summaries; online virtual assistants have become ubiquitous in support networks in a wide range of situations; and end users increasingly expect their speech to be understood, processed, and acted upon by cell phones, navigation devices, and tools such as Alexa. The creation of such mechanisms, however, currently depends on a large amount of training data (speech and text) that is available only for major languages. It is quite challenging to develop speech recognition systems when only 10 hours of transcribed audio are available. One way of addressing this problem is transfer learning, in which a speech recognizer is trained on a relatively large amount of data for one endangered language (50 hours of transcribed audio) and is then extended to related languages for which only a small corpus of material will be developed (10 hours of transcribed audio and 90 hours of untranscribed audio). The objectives of this project are both theoretical and substantive. Theoretically, the project advances the development of natural language processing for low-resource languages and establishes a protocol for extending this to other related languages. Substantively, the project produces an unprecedented corpus of transcribed audio for five related languages, facilitating the comparative study of these languages by theoretical and descriptive linguists. The data and findings will be available at the Linguistic Data Consortium at the University of Pennsylvania and the Sam Noble Oklahoma Museum of Natural History, University of Oklahoma.
State-of-the-art automatic speech recognition (ASR) depends upon the existence of a corpus of material (audio recordings with time-coded transcriptions) and the application of artificial intelligence systems that use neural networks to replicate human learning by interpreting raw data. The present project employs what is called an "end-to-end neural network." In effect, the artificial neural network is presented with input data (the acoustic speech signal) and a prepared end result (a transcription) and learns to achieve the same result. To accomplish this, the original corpus is divided into training (~80%), validation (~10%), and test (~10%) sets. For endangered language documentation, the goal is not simply the accuracy of the ASR system but also a reduction in the human effort needed to achieve highly accurate time-coded transcriptions that will be archived as a permanent record of the target language. The project team has already developed a highly accurate system for one phonologically difficult tonal language (character error rate 8%) and reduced the human effort required to produce an accurate time-coded transcription by about 75% (from 40 hours needed by a human starting from scratch to 9 hours needed by a human proofing a transcription generated by ASR). For this project, the same team will explore ASR strategies for a morphologically complex agglutinative language in the hope of achieving the same degree of accuracy and reduction in human effort. This project will also address another challenge for state-of-the-art ASR: the transfer of an effective system developed for one language to low-resource, virtually undocumented related languages. Should the project be successful, it will serve as a model for similar efforts with other languages and language groups.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
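The ~80/10/10 corpus split and the character error rate mentioned in the abstract can be illustrated with a minimal Python sketch. This is not the project's code: the function names `split_corpus` and `character_error_rate`, the utterance IDs, and the fixed random seed are hypothetical, and the sketch assumes that character error rate is the Levenshtein edit distance between reference and hypothesis divided by the reference length, a common definition that the abstract does not spell out.

```python
import random


def split_corpus(utterance_ids, train=0.8, valid=0.1, seed=0):
    """Shuffle utterance IDs and split them into train/validation/test lists.

    The fractions and the seed are illustrative assumptions; whatever is
    left after the train and validation portions becomes the test set.
    """
    ids = list(utterance_ids)
    random.Random(seed).shuffle(ids)  # seeded so the split is reproducible
    n_train = round(len(ids) * train)
    n_valid = round(len(ids) * valid)
    return (
        ids[:n_train],
        ids[n_train:n_train + n_valid],
        ids[n_train + n_valid:],
    )


def character_error_rate(reference, hypothesis):
    """Levenshtein distance between two strings divided by len(reference)."""
    if not reference:
        raise ValueError("reference must be non-empty")
    # prev[j] holds the edit distance between the reference prefix processed
    # so far and the first j characters of the hypothesis.
    prev = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        curr = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            substitution = prev[j - 1] + (0 if ref_char == hyp_char else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(reference)


# Hypothetical usage with made-up utterance IDs and strings.
train_ids, valid_ids, test_ids = split_corpus(f"utt{i:04d}" for i in range(100))
print(len(train_ids), len(valid_ids), len(test_ids))  # 80 10 10
print(character_error_rate("kitten", "sitting"))      # 0.5
```

The seeded shuffle keeps the split reproducible across runs. A real corpus might instead be split by recording or by speaker so that no speaker appears in more than one set; that choice is also an assumption and is not described in the abstract.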
Project outputs
Journal articles (2)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
ML-SUPERB: Multilingual Speech Universal PERformance Benchmark
- DOI: 10.21437/interspeech.2023-1316
- Published: 2023
- Journal:
- Impact factor: 0
- Authors: Shi, Jiatong; Berrebbi, Dan; Chen, William; Hu, En-Pei; Huang, Wei-Ping; Chung, Ho-Lam; Chang, Xuankai; Li, Shang-Wen; Mohamed, Abdelrahman; Lee, Hung-yi
- Corresponding author: Lee, Hung-yi
Other publications by Shinji Watanabe
Discriminative feature transforms using differenced maximum mutual information
- DOI: 10.1109/icassp.2012.6288981
- Published: 2012
- Journal:
- Impact factor: 0
- Authors: Marc Delcroix; A. Ogawa; Shinji Watanabe; T. Nakatani; Atsushi Nakamura
- Corresponding author: Atsushi Nakamura
CMU’s IWSLT 2022 Dialect Speech Translation System
- DOI:
- Published: 2022
- Journal:
- Impact factor: 0
- Authors: Brian Yan; Patrick Fernandes; Siddharth Dalmia; Jiatong Shi; Yifan Peng; Dan Berrebbi; Xinyi Wang; Graham Neubig; Shinji Watanabe
- Corresponding author: Shinji Watanabe
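太陽光有効利用のための分子論的物質変換化学 (Molecular chemistry of material conversion for the effective use of sunlight)
- DOI:
- Published: 2019
- Journal:
- Impact factor: 0
- Authors: Hiroshi Tanimura; Shinji Watanabe; Tetsu Ichitsubo; 山本 雅納
- Corresponding author: 山本 雅納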
SPGISpeech: 5,000 hours of transcribed financial audio for fully formatted end-to-end speech recognition
- DOI: 10.21437/interspeech.2021-1860
- Published: 2021
- Journal:
- Impact factor: 9
- Authors: Patrick K. O’Neill; Vitaly Lavrukhin; Somshubra Majumdar; V. Noroozi; Yuekai Zhang; Oleksii Kuchaiev; Jagadeesh Balam; Yuliya Dovzhenko; Keenan Freyberg; Michael D. Shulman; Boris Ginsburg; Shinji Watanabe; G. Kucsko
- Corresponding author: G. Kucsko
A New Benchmark of Aphasia Speech Recognition and Detection Based on E-Branchformer and Multi-task Learning
- DOI: 10.48550/arxiv.2305.13331
- Published: 2023
- Journal:
- Impact factor: 0
- Authors: Jiyang Tang; William Chen; Xuankai Chang; Shinji Watanabe; B. MacWhinney
- Corresponding author: B. MacWhinney
Other grants held by Shinji Watanabe
Collaborative Research: RI: Medium: Flexible Deep Speech Synthesis through Gestural Modeling
- Award number: 2106929
- Fiscal year: 2021
- Amount: $193,500
- Award type: Standard Grant
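Similar National Natural Science Foundation of China (NSFC) grants
Research on Quantum Field Theory without a Lagrangian Description
- Award number: 24ZR1403900
- Year approved: 2024
- Amount: CNY 0
- Award type: Provincial/municipal project
Cell Research
- Award number: 31224802
- Year approved: 2012
- Amount: CNY 240,000
- Award type: Special fund project
Cell Research
- Award number: 31024804
- Year approved: 2010
- Amount: CNY 240,000
- Award type: Special fund project
Cell Research
- Award number: 30824808
- Year approved: 2008
- Amount: CNY 240,000
- Award type: Special fund project
Research on the Rapid Growth Mechanism of KDP Crystal
- Award number: 10774081
- Year approved: 2007
- Amount: CNY 450,000
- Award type: General Program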
Similar overseas grants
Collaborative Research: Improving Upper Division Physics Education and Strengthening Student Research Opportunities at 14 HSIs in California
- Award number: 2345092
- Fiscal year: 2024
- Amount: $193,500
- Award type: Standard Grant
Collaborative Research: Improving Upper Division Physics Education and Strengthening Student Research Opportunities at 14 HSIs in California
- Award number: 2345093
- Fiscal year: 2024
- Amount: $193,500
- Award type: Standard Grant
SBP: Collaborative Research: Improving Engagement with Professional Development Programs by Attending to Teachers' Psychosocial Experiences
- Award number: 2314254
- Fiscal year: 2023
- Amount: $193,500
- Award type: Standard Grant
Collaborative Research: Improving Worker Safety by Understanding Risk Compensation as a Latent Precursor of At-risk Decisions
- Award number: 2326937
- Fiscal year: 2023
- Amount: $193,500
- Award type: Continuing Grant
Collaborative Research: SaTC: CORE: Small: Measuring, Validating and Improving upon App-Based Privacy Nutrition Labels
- Award number: 2247952
- Fiscal year: 2023
- Amount: $193,500
- Award type: Standard Grant
Collaborative Research: Reducing Model Uncertainty by Improving Understanding of Pacific Meridional Climate Structure during Past Warm Intervals
- Award number: 2303568
- Fiscal year: 2023
- Amount: $193,500
- Award type: Continuing Grant
Collaborative Research: Improving Model Representations of Antarctic Ice-shelf Instability and Break-up due to Surface Meltwater Processes
- Award number: 2213704
- Fiscal year: 2023
- Amount: $193,500
- Award type: Standard Grant
Collaborative Research: SitS: Improving Rice Cultivation by Observing Dynamic Soil Chemical Processes from Grain to Landscape Scales
- Award number: 2226647
- Fiscal year: 2023
- Amount: $193,500
- Award type: Standard Grant
Collaborative Research: SitS: Improving Rice Cultivation by Observing Dynamic Soil Chemical Processes from Grain to Landscape Scales
- Award number: 2226648
- Fiscal year: 2023
- Amount: $193,500
- Award type: Standard Grant
Collaborative Research: CISE-MSI: RCBP-RF: CPS: Socially Informed Traffic Signal Control for Improving Near Roadway Air Quality
- Award number: 2318696
- Fiscal year: 2023
- Amount: $193,500
- Award type: Standard Grant