Breaking the Unwritten Language Barrier
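Basic information
- Grant number: 259117245
- Principal investigator:
- Amount: --
- Host institution:
- Host institution country: Germany
- Project type: Research Grants
- Fiscal year: 2014
- Funding country: Germany
- Duration: 2013-12-31 to 2018-12-31
- Project status: Completed
- Source:
- Keywords:
Project abstract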
The BULB project aims to support the documentation of unwritten languages with the help of automatic speech and language processing, in particular automatic speech recognition (ASR) and machine translation (MT). We will address the documentation of three mostly unwritten African languages of the Bantu family (Basaa, Myene and Embosi). The main steps of the project are:
1. To collect the corpora at a reasonable cost, using a three-step methodology, following the work of S. Bird and M. Liberman:
- Collecting a large corpus of speech (100 hours) in a community, including elicited material, stories, dialogs and broadcasts.
- Re-speaking. As the recordings will contain very spontaneous speech, possibly overlapping and captured in noisy environments, carefully articulated re-speaking by a reference speaker will give rise to more accurate automatic phonetic transcriptions and to improved material for phonetic/phonological studies.
- Oral translation. Translation is the natural way to document a new language; oral translations will accelerate the documentation process. Our Bantu data will be translated to French, a major language and a second language in the regions of our studied communities.
2. The collected oral data (Bantu originals and French translations) contain the necessary information to document the studied languages. ASR is expected to automatically produce accurate transcriptions in source and target languages and MT to provide meaningful alignments between both, to speed up the major tasks of documentation, description and analysis. The major automatic processing steps are:
- Phonetic transcription of the studied languages. This step first requires a set of language-independent phone models, which must be tuned to the language under study via unsupervised adaptation techniques.
- Word transcription of the oral French translations. Language and acoustic models need to be adapted to obtain high transcription accuracy.
- Alignments between the phonetic transcriptions (originals, re-speaking) of the studied language. Alignments are highly valuable for large-scale acoustic-phonetic studies, phonological and prosodic data mining, and studies of dialectal variation.
- Cross-language alignments that aim at linking phone sequences in the studied language with French words. Such alignments may prove very useful for morphological studies and for vocabulary and pronunciation elaboration.
The success of the project relies on strong German-French cooperation between linguists and computer scientists. Cooperation will be fostered and strengthened by a series of courses benefiting the scientific community beyond the present consortium. During these courses, linguists will present to computer scientists the major steps to document an unknown language, and computer scientists will introduce their methods to process a "new" language, thus generating phonetic transcriptions and pseudo-word alignments to be returned to linguists.
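The cross-language alignment step mentioned in the abstract (linking phone sequences in the studied language with French words) can be illustrated with a small sketch. The abstract does not name a specific alignment method, so the choice of IBM Model 1 trained with EM is an assumption made purely for illustration, as are the function names `train_ibm_model1` and `align`. The phone symbols in the toy data are invented and are not real Basaa, Myene or Embosi material.

```python
from collections import defaultdict


def train_ibm_model1(pairs, iterations=10):
    """Estimate t(phone | word) with EM from (phones, words) pairs.

    Illustrative assumption: plain IBM Model 1 without a NULL word,
    so every phone must be explained by one of the French words.
    """
    phone_vocab = {p for phones, _ in pairs for p in phones}
    uniform = 1.0 / len(phone_vocab)
    t = defaultdict(lambda: uniform)  # t[(phone, word)], uniform start

    for _ in range(iterations):
        count = defaultdict(float)  # expected (phone, word) co-occurrence
        total = defaultdict(float)  # expected number of phones per word
        # E-step: spread each phone over the words of its sentence pair.
        for phones, words in pairs:
            for p in phones:
                norm = sum(t[(p, w)] for w in words)
                for w in words:
                    frac = t[(p, w)] / norm
                    count[(p, w)] += frac
                    total[w] += frac
        # M-step: renormalise the expected counts per word.
        t = defaultdict(
            float, {(p, w): c / total[w] for (p, w), c in count.items()}
        )
    return t


def align(phones, words, t):
    """Link each phone to the word with the highest t(phone | word)."""
    return [max(words, key=lambda w: t[(p, w)]) for p in phones]


# Invented toy corpus: phone sequences paired with French word translations.
toy_pairs = [
    (["b", "a", "l", "o"], ["la", "maison"]),
    (["b", "a", "k", "i"], ["la", "pirogue"]),
    (["m", "u", "k", "i"], ["une", "pirogue"]),
]

if __name__ == "__main__":
    table = train_ibm_model1(toy_pairs)
    for phones, words in toy_pairs:
        print(list(zip(phones, align(phones, words, table))))
```

On this toy corpus, phones that recur with the same French word across utterances end up linked to that word, which is the kind of pseudo-word alignment the abstract describes returning to linguists. A real system would work from noisy automatic phone transcriptions and would likely need a richer model.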
Project outcomes
Journal articles (6)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Towards phoneme inventory discovery for documentation of unwritten languages
- DOI: 10.1109/icassp.2017.7953148
- Publication year: 2017
- Journal:
- Impact factor: 0
- Authors: Markus Müller; Jörg Franke; Alex Waibel; Sebastian Stüker
- Corresponding author: Sebastian Stüker
Neural Language Codes for Multilingual Acoustic Models
- DOI: 10.21437/interspeech.2018-1241
- Publication year: 2018
- Journal:
- Impact factor: 0
- Authors: Markus Müller; Sebastian Stüker; Alex Waibel
- Corresponding author: Alex Waibel
Unsupervised Phoneme Segmentation of Previously Unseen Languages
- DOI: 10.21437/interspeech.2016-1440
- Publication year: 2016
- Journal:
- Impact factor: 0
- Authors: Vetter; Markus Müller; Fatima Hamlaoui; Graham Neubig; Satoshi Nakamura; Sebastian Stüker; Alex Waibel
- Corresponding author: Alex Waibel
DBLSTM based multilingual articulatory feature extraction for language documentation
- DOI: 10.1109/asru.2017.8268966
- Publication year: 2017
- Journal:
- Impact factor: 0
- Authors: Markus Müller; Sebastian Stüker; Alex Waibel
- Corresponding author: Alex Waibel
The prosody of focus and emphasis in Sepedi
- DOI: 10.1109/robomech.2016.7813146
- Publication year: 2016
- Journal:
- Impact factor: 0
- Authors: M. Raborife; G. Turco; S. Zerbian
- Corresponding author: S. Zerbian
Similar overseas grants
Unwritten Constitutional Norms and Principles: A Comparative Study
- Grant number: ES/X008185/1
- Fiscal year: 2023
- Funding amount: --
- Project type: Research Grant
Biotin catabolism: an unwritten chapter in the metabolism of an essential vitamin
- Grant number: 10346796
- Fiscal year: 2021
- Funding amount: --
- Project type:
Biotin catabolism: an unwritten chapter in the metabolism of an essential vitamin
- Grant number: 10533814
- Fiscal year: 2021
- Funding amount: --
- Project type:
Uncovering the Unwritten-Rules in Photoredox Catalysis for Late Stage Functionalisation
- Grant number: 2278965
- Fiscal year: 2019
- Funding amount: --
- Project type: Studentship
Elucidation and Lexicography of Vanuatu Unwritten Languages in Danger of Extinction
- Grant number: 18K00579
- Fiscal year: 2018
- Funding amount: --
- Project type: Grant-in-Aid for Scientific Research (C)
Extensive material from field research about the almost unknown unwritten Prasun language (Afghanistan) will be elaborated and prepared for publication in print.
- Grant number: 222132533
- Fiscal year: 2012
- Funding amount: --
- Project type: Research Grants
A Reconstruction of Paleography: Blending Written and Unwritten Information
- Grant number: 23520841
- Fiscal year: 2011
- Funding amount: --
- Project type: Grant-in-Aid for Scientific Research (C)
What makes unwritten rules work? A framework for understanding normative influence.
- Grant number: DP0877146
- Fiscal year: 2008
- Funding amount: --
- Project type: Discovery Projects
THE UNWRITTEN HISTORY OF AUSTRALIAN MODERNISM
- Grant number: LP0347073
- Fiscal year: 2003
- Funding amount: --
- Project type: Linkage Projects
Unstable Prospects of Modern Unwritten Constitution Under the Thatcher Governments
- Grant number: 02620013
- Fiscal year: 1990
- Funding amount: --
- Project type: Grant-in-Aid for General Scientific Research (C)