Cross-linguistic phonetics and morphology using a time-aligned multilingual reference corpus built from documentations of 50 languages: Big data on small languages

Basic information

Project abstract

Speech rate and pauses provide us with a window into the cognitive-neural and physiological-articulatory bases of the human language production system, but cross-linguistic variation in this domain remains understudied. This project fills this gap through comparative studies of spontaneously spoken language in a diverse sample of 50 languages. For this purpose, we create a multilingual reference corpus of language documentation data (DoReCo) consisting of annotations and associated audio recordings that are archived at repositories such as The Language Archive (TLA), especially from the DOBES collection. DoReCo will be built from data that are already transcribed, translated into a major language, and time-aligned with audio files at the level of discourse units. Within the current project, these data will be time-aligned at the phoneme level. We have identified at least 50 languages from which corpora of at least 10,000 words can be included in DoReCo, and a subset of at least 30 of these that are additionally already annotated for morpheme breaks and morpheme glosses. In DoReCo, subcorpora and annotations are treated as citable publications, provided with a permanent identifier and associated with a CC BY 4.0 license. DoReCo will have a lasting effect beyond the specific research goals of the DoReCo project, as a platform for easy access to over one million words of annotated corpus data from over 50 languages for cross-linguistic research on spoken language. This represents an unprecedented contribution to open, reproducible science regarding global linguistic diversity and cultural heritage. Both of DoReCo's two specific research goals address the universality of constraints on human language arising from species-wide articulatory and cognitive properties. Firstly, we investigate patterns of phonetic lengthening with the aim of establishing universal vs. language-specific patterns in (i) the degree to which different types of phonological segments undergo variation in duration (e.g. vowels vs. different types of consonants), reflecting articulatory and perceptual constraints, and (ii) word-final lengthening as indicative of major vs. minor prosodic boundaries, reflecting cognitive constraints on planning and potentially signalling discourse units. Secondly, we investigate universal vs. language-specific patterns in the temporal distribution of morphemes regarding (i) information rate in terms of morphemes per second and (ii) the number of morphemes in inter-pausal units, both reflecting cognitive constraints on language use. The project will be carried out by an interdisciplinary team bringing together expertise in documentary linguistics, phonetics, typology, and quantitative linguistics, with strong institutional support from two leading research centres in Germany and France.

Project outcomes

Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)

Other publications by Privatdozent Dr. Frank Seifart, since 11/2019

Similar overseas grants

Linguistic Experience and Generalization: Early Links between Sounds, Words, and Grammar
  • Grant number:
    10291816
  • Fiscal year:
    2019
  • Funding amount:
    --
  • Project category:
Linguistic Experience and Generalization: Early Links between Sounds, Words, and Grammar
  • Grant number:
    10054653
  • Fiscal year:
    2019
  • Funding amount:
    --
  • Project category:
Cross-linguistic influence in the acquisition of phonology and phonetics by multilingual children and adults
  • Grant number:
    349906019
  • Fiscal year:
    2017
  • Funding amount:
    --
  • Project category:
    Research Grants
Effects of linguistic experience on speech perception
  • Grant number:
    8413779
  • Fiscal year:
    2011
  • Funding amount:
    --
  • Project category:
LINGUISTIC AND SOCIAL RESPONSES TO SPEECH IN INFANTS AT RISK FOR AUTISM
  • Grant number:
    8326761
  • Fiscal year:
    2011
  • Funding amount:
    --
  • Project category:
Effects of linguistic experience on speech perception
  • Grant number:
    8023910
  • Fiscal year:
    2011
  • Funding amount:
    --
  • Project category:
Effects of linguistic experience on speech perception
  • Grant number:
    8220712
  • Fiscal year:
    2011
  • Funding amount:
    --
  • Project category:
Perception of talker cues and linguistic processing in SLI
  • Grant number:
    8196934
  • Fiscal year:
    2010
  • Funding amount:
    --
  • Project category:
Perception of talker cues and linguistic processing in SLI
  • Grant number:
    8384861
  • Fiscal year:
    2010
  • Funding amount:
    --
  • Project category:
Perception of talker cues and linguistic processing in SLI
  • Grant number:
    8035630
  • Fiscal year:
    2010
  • Funding amount:
    --
  • Project category: