RI: Small: Collaborative Research: Automatic Creation of New Speech Sound Inventories
Basic Information
- Award number: 1910319
- Principal investigator:
- Amount: $259,800
- Host institution:
- Host institution country: United States
- Project type: Standard Grant
- Fiscal year: 2019
- Funding country: United States
- Project period: 2019-07-01 to 2023-06-30
- Project status: Completed
- Source:
- Keywords:
Project Summary
Speech technology is supposed to be available to everyone, but in reality it is not. There are 7,000 languages spoken in the world, but speech technology (speech-to-text recognition and text-to-speech synthesis) works in only a few hundred of them. This project will address that problem by automatically figuring out the set of phonemes for each new language, that is, the set of speech sounds that define differences between words (for example, "peek" versus "peck": long-E and short-E are distinct phonemes in English). Phonemes are the link between speaking and writing. A neural net that converts speech into text using some kind of phoneme inventory, and then back again, can be said to have used the correct phoneme inventory if its resynthesized speech always has the same meaning as the speech it started with. This approach can even be tested in languages that have no standard written form, because the text does not have to be real text: it could be chat alphabet (the kind of pseudo-Roman alphabet that speakers of Arabic and Hindi sometimes use on Twitter), or it could even be a picture showing, as an image, what the user was describing. This research will make it possible for people to talk to their artificial intelligence systems (smart speakers, smart phones, smart cars, etc.) in their native languages, and will advance science by providing big-data tools that scientists can use to study languages that do not have a (standard) writing system.

End-to-end neural network methods can be used to develop speech-to-text-to-speech (S2T2S) and other spoken language processing applications with little additional software infrastructure and little background knowledge. In fact, toolkits provide recipes so that a researcher with no prior speech experience can train an end-to-end neural system after only a few hours of data preparation. End-to-end systems are only practical, however, for languages with thousands of hours of transcribed data.
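The round-trip evaluation idea above (speech to phonemes and back, judged by whether meaning survives) can be illustrated with a minimal Python sketch. The functions `encode`, `decode`, and `meaning` are hypothetical stand-ins, not components of the project's actual system:

```python
# Toy illustration of the round-trip test: a phoneme inventory is judged
# "correct" if speech -> phonemes -> speech preserves meaning. All of
# encode, decode, and meaning below are hypothetical stand-ins.

def round_trip_score(utterances, encode, decode, meaning):
    """Fraction of utterances whose resynthesis keeps the same meaning.

    encode:  speech -> phoneme sequence (speech-to-text)
    decode:  phoneme sequence -> speech (text-to-speech)
    meaning: speech -> a semantic label (e.g. an image or intent id)
    """
    preserved = sum(1 for x in utterances
                    if meaning(decode(encode(x))) == meaning(x))
    return preserved / len(utterances)

# Toy demo: "speech" is a string, phonemes are its characters, and
# meaning is the string itself, so the round trip is lossless.
score = round_trip_score(["peek", "peck"], list, "".join, lambda s: s)  # 1.0
```

In the project's setting, `meaning` need not be text at all: as the abstract notes, it could be a chat-alphabet transcription or an image of what the speaker described.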
For under-resourced languages (languages with very little transcribed speech), cross-language adaptation is necessary; for unwritten languages (those lacking any standard, well-known orthographic convention), it is necessary to define a spoken language task that does not require writing before cross-language adaptation can even be attempted. Preliminary evidence suggests that both types of cross-language adaptation are performed more accurately if the system has available, or creates, a phoneme inventory for the under-resourced language, and leverages that inventory to facilitate adaptation. The aim of this project is to automatically infer the acoustic phoneme inventory of under-resourced and unwritten languages in order to maximize the speech technology quality of an end-to-end neural system adapted into those languages. The research team has demonstrated that it is possible to visualize sub-categorical distinctions between sounds as a neural net adapts to a new phoneme category; proposed experiments 1 and 2 leverage visualizations of this type, along with other methods of phoneme inventory validation, to improve cross-language adaptation. Experiments 3 and 4 go one step further by adapting to languages without orthography; before a speech technology system can be trained and used in a language without orthography, it must first learn a useful phoneme inventory.
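One way a phoneme inventory can mediate cross-language adaptation is by mapping each phoneme of the under-resourced target language to its closest counterpart in a well-resourced source language. A toy sketch, using shared articulatory features as the similarity measure; the inventories and feature sets below are illustrative assumptions, not real language data:

```python
# Toy sketch: seed cross-language adaptation by mapping each phoneme of
# an under-resourced language onto the source-language phoneme with
# which it shares the most articulatory features. The feature sets are
# illustrative, not a real inventory.

def closest_phoneme(target_feats, source_inventory):
    """Return the source phoneme sharing the most features with target."""
    return max(source_inventory,
               key=lambda p: len(target_feats & source_inventory[p]))

source = {
    'p': {'labial', 'stop', 'voiceless'},
    't': {'alveolar', 'stop', 'voiceless'},
    'd': {'alveolar', 'stop', 'voiced'},
}

# A hypothetical retroflex voiced stop maps to the voiced stop 'd',
# which shares two features ({'stop', 'voiced'}) with it.
mapped = closest_phoneme({'retroflex', 'stop', 'voiced'}, source)  # 'd'
```

Such a mapping lets an adapted system reuse source-language acoustic models for the target phonemes it best matches, rather than starting from scratch.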
Innovations unique to this project include: (1) the use of articulatory feature transcription as a multi-task training criterion for an end-to-end neural system that seeks to learn the phoneme set of a new language; (2) the use of visualization error rate as a training criterion in multi-task learning -- this criterion is based on a recently developed method for visualizing the adaptation of phoneme categories in a neural network; (3) the application of cross-language adaptation to improve the error rates of image2speech applications in a language without orthography; (4) the use of non-standard orthography (chat alphabet) to transcribe speech in an unwritten language; and (5) the use of non-native transcription (mismatched crowdsourcing) to jump-start the speech2chat training task. The methods proposed here will facilitate the scientific study of language, for example by helping phoneticians document the phoneme inventories of undocumented languages, thereby expediting the study of currently undocumented endangered languages before they disappear. Conversely, in minority languages with active but shrinking native-speaker populations, the planned methods will help develop end-to-end neural training methods with which native speakers can easily develop new speech applications. All planned software will be packaged as recipes for the Speech Recognition Virtual Kitchen, permitting high school students and undergraduates with no speech expertise to develop systems for their own languages, and encouraging their interest in speech.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
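Innovation (1), articulatory feature transcription as an auxiliary multi-task criterion, can be sketched as follows. The feature table and the loss weighting are illustrative assumptions, not the project's actual configuration:

```python
# Sketch of articulatory features as an auxiliary multi-task target:
# the main task predicts phonemes, and an auxiliary task predicts the
# articulatory features those phonemes imply. ARTIC and alpha=0.3 are
# illustrative assumptions.

ARTIC = {  # toy mapping from phoneme to articulatory features
    'p': {'labial', 'stop', 'voiceless'},
    'b': {'labial', 'stop', 'voiced'},
    'i': {'front', 'high', 'vowel'},
}

def feature_targets(phonemes):
    """Derive articulatory-feature targets from a phoneme transcription."""
    return [ARTIC[p] for p in phonemes]

def multitask_loss(phoneme_loss, feature_loss, alpha=0.3):
    """Weighted multi-task objective: main phoneme task plus the
    auxiliary articulatory-feature task."""
    return (1 - alpha) * phoneme_loss + alpha * feature_loss
```

Because articulatory features are shared across languages even when phoneme inventories differ, the auxiliary task gives the network a language-independent signal to anchor cross-language adaptation.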
Project Outcomes
Journal articles (14)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
A DNN-HMM-DNN Hybrid Model for Discovering Word-Like Units from Spoken Captions and Image Regions
- DOI: 10.21437/interspeech.2020-1148
- Publication date: 2020
- Journal:
- Impact factor: 0
- Authors: Wang, Liming; Hasegawa-Johnson, Mark
- Corresponding author: Hasegawa-Johnson, Mark
That Sounds Familiar: An Analysis of Phonetic Representations Transfer Across Languages
- DOI: 10.21437/interspeech.2020-2513
- Publication date: 2020
- Journal:
- Impact factor: 0
- Authors: Żelasko, Piotr; Moro-Velázquez, Laureano; Hasegawa-Johnson, Mark; Scharenborg, Odette; Dehak, Najim
- Corresponding author: Dehak, Najim
Align or attend? Toward More Efficient and Accurate Spoken Word Discovery Using Speech-to-Image Retrieval
- DOI: 10.1109/icassp39728.2021.9414418
- Publication date: 2021-06
- Journal:
- Impact factor: 0
- Authors: Liming Wang; Xinsheng Wang; M. Hasegawa-Johnson; O. Scharenborg; N. Dehak
- Corresponding author: Liming Wang; Xinsheng Wang; M. Hasegawa-Johnson; O. Scharenborg; N. Dehak
Cross-lingual articulatory feature information transfer for speech recognition using recurrent progressive neural networks
- DOI: 10.21437/interspeech.2022-11202
- Publication date: 2022
- Journal:
- Impact factor: 0
- Authors: Morshed, Mahir; Hasegawa-Johnson, Mark
- Corresponding author: Hasegawa-Johnson, Mark
Training Spoken Language Understanding Systems with Non-Parallel Speech and Text
- DOI: 10.1109/icassp40776.2020.9054664
- Publication date: 2020-05
- Journal:
- Impact factor: 0
- Authors: Leda Sari; Samuel Thomas; M. Hasegawa-Johnson
- Corresponding author: Leda Sari; Samuel Thomas; M. Hasegawa-Johnson
Other grants by Mark Hasegawa-Johnson
FAI: A New Paradigm for the Evaluation and Training of Inclusive Automatic Speech Recognition
- Award number: 2147350
- Fiscal year: 2022
- Funding amount: $259,800
- Project type: Standard Grant
EAGER: Matching Non-Native Transcribers to the Distinctive Features of the Language Transcribed
- Award number: 1550145
- Fiscal year: 2015
- Funding amount: $259,800
- Project type: Standard Grant
FODAVA-Partner: Visualizing Audio for Anomaly Detection
- Award number: 0807329
- Fiscal year: 2008
- Funding amount: $259,800
- Project type: Continuing Grant
RI Medium: Audio Diarization - Towards Comprehensive Description of Audio Events
- Award number: 0803219
- Fiscal year: 2008
- Funding amount: $259,800
- Project type: Standard Grant
Audiovisual Distinctive-Feature-Based Recognition of Dysarthric Speech
- Award number: 0534106
- Fiscal year: 2005
- Funding amount: $259,800
- Project type: Continuing Grant
Prosodic, Intonational, and Voice Quality Correlates of Disfluency
- Award number: 0414117
- Fiscal year: 2004
- Funding amount: $259,800
- Project type: Continuing Grant
CAREER: Landmark-Based Speech Recognition in Music and Speech Backgrounds
- Award number: 0132900
- Fiscal year: 2002
- Funding amount: $259,800
- Project type: Continuing Grant
Similar NSFC grants
Forensic application of circadian small RNAs in inferring the time of bloodstain formation
- Award number:
- Award year: 2024
- Funding amount: ¥0
- Project type: Provincial or municipal project
Mechanism by which tRNA-derived small RNAs upregulate the YBX1/CCL5 pathway in bortezomib-induced chronic pain
- Award number: n/a
- Award year: 2022
- Funding amount: ¥100,000
- Project type: Provincial or municipal project
Response and molecular mechanism of small RNA regulation of type I-F CRISPR-Cas adaptive immunity
- Award number: 32000033
- Award year: 2020
- Funding amount: ¥240,000
- Project type: Young Scientists Fund
Mechanism by which small RNAs regulate the biocontrol function of Bacillus amyloliquefaciens FZB42
- Award number: 31972324
- Award year: 2019
- Funding amount: ¥580,000
- Project type: General Program
Mechanism by which Streptococcus mutans small RNAs link LuxS quorum sensing to biofilm formation
- Award number: 81900988
- Award year: 2019
- Funding amount: ¥210,000
- Project type: Young Scientists Fund
Small RNA sequencing analysis of the molecular mechanism of pigeon milk secretion
- Award number: 31802058
- Award year: 2018
- Funding amount: ¥260,000
- Project type: Young Scientists Fund
Function and mechanism of key gut-bacterial small RNAs in the onset and progression of Crohn's disease
- Award number: 31870821
- Award year: 2018
- Funding amount: ¥560,000
- Project type: General Program
Pathogenic mechanism of rice grassy stunt virus mediated by small RNA-directed DNA methylation
- Award number: 31772128
- Award year: 2017
- Funding amount: ¥600,000
- Project type: General Program
Small RNA-seq study of the immunoregulatory mechanism of acupuncture treatment for Hashimoto's thyroiditis
- Award number: 81704176
- Award year: 2017
- Funding amount: ¥200,000
- Project type: Young Scientists Fund
Regulation of small RNA biosynthesis by rice OsSGS3 and OsHEN1 and its modulation of disease resistance
- Award number: 91640114
- Award year: 2016
- Funding amount: ¥850,000
- Project type: Major Research Plan
Similar overseas grants
Collaborative Research: RI: Small: Foundations of Few-Round Active Learning
- Award number: 2313131
- Fiscal year: 2023
- Funding amount: $259,800
- Project type: Standard Grant
Collaborative Research: RI: Small: Motion Fields Understanding for Enhanced Long-Range Imaging
- Award number: 2232298
- Fiscal year: 2023
- Funding amount: $259,800
- Project type: Standard Grant
Collaborative Research: RI: Small: Deep Constrained Learning for Power Systems
- Award number: 2345528
- Fiscal year: 2023
- Funding amount: $259,800
- Project type: Standard Grant
Collaborative Research: RI: Small: End-to-end Learning of Fair and Explainable Schedules for Court Systems
- Award number: 2232055
- Fiscal year: 2023
- Funding amount: $259,800
- Project type: Standard Grant
Collaborative Research: RI: Small: End-to-end Learning of Fair and Explainable Schedules for Court Systems
- Award number: 2232054
- Fiscal year: 2023
- Funding amount: $259,800
- Project type: Standard Grant
Collaborative Research: RI: Small: Motion Fields Understanding for Enhanced Long-Range Imaging
- Award number: 2232300
- Fiscal year: 2023
- Funding amount: $259,800
- Project type: Standard Grant
Collaborative Research: RI: Small: Motion Fields Understanding for Enhanced Long-Range Imaging
- Award number: 2232299
- Fiscal year: 2023
- Funding amount: $259,800
- Project type: Standard Grant
Collaborative Research: RI: Small: End-to-end Learning of Fair and Explainable Schedules for Court Systems
- Award number: 2334936
- Fiscal year: 2023
- Funding amount: $259,800
- Project type: Standard Grant
Collaborative Research: RI: Small: Foundations of Few-Round Active Learning
- Award number: 2313130
- Fiscal year: 2023
- Funding amount: $259,800
- Project type: Standard Grant
RI: Small: Collaborative Research: Evolutionary Approach to Optimal Morphology and Control of Transformable Soft Robots
- Award number: 2325491
- Fiscal year: 2023
- Funding amount: $259,800
- Project type: Standard Grant