Building a corpus of phonemic lexicons to study information theoretic universals
Basic Information
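- Award number: 1829290
- Principal investigator:
- Amount: $391,100
- Host institution:
- Host institution country: United States
- Award type: Continuing Grant
- Fiscal year: 2018
- Funding country: United States
- Project period: 2018-09-01 to 2024-05-31
- Status: Completed
- Source:
- Keywords: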
Project Abstract
Human language use reflects the nature of human communication. For instance, frequent words tend to have fewer sounds than infrequent ones, which facilitates quick production and understanding. However, little is known about more fine-grained distinctions. For instance, English has more /k/ than /p/ sounds. Does that reflect a property of human language and its physiological and perceptual nature, or is it a historical accident? Answering such questions requires comparative data on the frequency and phonological makeup of words in many languages. This project will build on existing textual sources and word frequency lists to provide the phonological makeup of words in close to 200 low-resource languages. The phonological word lists will be an invaluable resource for understanding human language and will provide much-needed linguistic resources for low-resource languages. The outputs of the project will be made public and easily accessible, thereby assisting in documenting and teaching the processed languages and in building computational linguistic resources such as text-to-speech engines. The research team, including trained undergraduate and graduate students, will create rules that convert the alphabets of multiple languages into phonemic representations. The team will then collect textual resources and word frequency lists from publicly available sources such as online Bibles, newspapers, and movie subtitles. The rules will be applied separately to each source, and the resulting phonological representations will be made publicly available so that not only researchers but also the general public will be able to use and interact with the data. The researchers will then use the data to investigate whether the information-theoretic properties of sounds are distributionally universal: do sounds tend to provide similar amounts of information cross-linguistically, and if so, does their information content correlate with their phonetic properties?
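The abstract describes rules that map each language's spelling onto phonemes, applied separately to each text source. A minimal sketch of that kind of rule-based conversion follows, assuming a hypothetical toy rule table (not the project's actual rules) applied by greedy longest match, plus a helper that turns a word-frequency list into phoneme-sequence frequencies.

```python
from collections import Counter

# Hypothetical toy grapheme-to-phoneme rules for illustration only;
# real rule sets would be written separately for each language.
RULES = {
    "sh": "ʃ", "ch": "tʃ", "ng": "ŋ",
    "a": "a", "e": "e", "i": "i", "o": "o", "u": "u",
    "k": "k", "p": "p", "t": "t", "m": "m", "n": "n", "s": "s", "l": "l",
}
MAX_LEN = max(len(g) for g in RULES)


def to_phonemes(word: str) -> list[str]:
    """Convert one word to phonemes by greedy longest match over RULES."""
    word = word.lower()
    phonemes, i = [], 0
    while i < len(word):
        # Try the longest grapheme first so "sh" wins over "s" + "h".
        for length in range(min(MAX_LEN, len(word) - i), 0, -1):
            grapheme = word[i:i + length]
            if grapheme in RULES:
                phonemes.append(RULES[grapheme])
                i += length
                break
        else:
            # No rule matched the current letter: fail loudly.
            raise ValueError(f"no rule for {word[i]!r} in {word!r}")
    return phonemes


def phonemic_lexicon(freqs: dict[str, int]) -> Counter:
    """Map a word-frequency list to phoneme-sequence frequencies.

    Spellings that convert to the same phonemes are merged.
    """
    lexicon = Counter()
    for word, count in freqs.items():
        lexicon[tuple(to_phonemes(word))] += count
    return lexicon
```

For example, `phonemic_lexicon({"sha": 3, "Sha": 2, "pika": 1})` yields a count of 5 for `("ʃ", "a")` and 1 for `("p", "i", "k", "a")`. Calling it once per source (Bible, news, subtitles) keeps the per-source lexicons apart, in line with the plan to apply the rules to each source separately.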
Universality is an age-old question, and the similarities and differences of properties across languages can provide new insights into language use. Specifically, the researchers will use information-theoretic properties to predict whether low information content or other previously studied phonological properties are likely to promote consonant weakening (lenition) in those languages. This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
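One way to quantify how much information a sound provides is informativity (a term that appears in one of the publication titles on this page), commonly defined as a segment's average surprisal across the contexts it occurs in: $I(s) = -\sum_c P(c \mid s)\log_2 P(s \mid c)$. The sketch below assumes the context is simply the preceding segment (with `#` marking the start of a word) and that counts are weighted by word token frequency; the project's actual measures and context definitions may differ.

```python
import math
from collections import Counter

BOUNDARY = "#"  # assumed word-start marker used as a context


def informativity(lexicon: dict[tuple[str, ...], int]) -> dict[str, float]:
    """Informativity of each segment: -sum_c P(c|s) * log2 P(s|c).

    The context c is the preceding segment (or '#' at word start), and
    every occurrence is weighted by the word's token frequency. Word ends
    are not predicted, so the last segment never serves as a context.
    """
    pair_counts = Counter()     # (context, segment) -> token count
    context_counts = Counter()  # context -> tokens of any following segment
    segment_counts = Counter()  # segment -> token count
    for word, freq in lexicon.items():
        prev = BOUNDARY
        for seg in word:
            pair_counts[(prev, seg)] += freq
            context_counts[prev] += freq
            segment_counts[seg] += freq
            prev = seg

    result = {}
    for seg, seg_total in segment_counts.items():
        total = 0.0
        for (ctx, s), n in pair_counts.items():
            if s != seg:
                continue
            p_ctx_given_seg = n / seg_total          # P(c | s)
            p_seg_given_ctx = n / context_counts[ctx]  # P(s | c)
            total -= p_ctx_given_seg * math.log2(p_seg_given_ctx)
        result[seg] = total
    return result
```

With `{("p", "a"): 1, ("k", "a"): 1}`, /p/ and /k/ each get 1.0 bit (each is one of two equally likely word-initial segments), while /a/ gets 0.0 because it is fully predictable after either consonant in this toy lexicon. Any mapping from phoneme tuples to token counts can serve as input.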
Project Outputs
Journal articles (2)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Schwa’s duration and acoustic position in American English
- DOI: 10.1016/j.wocn.2022.101198
- Published: 2023
- Journal:
- Impact factor: 1.9
- Authors: Cohen Priva, Uriel; Strand, Emily
- Corresponding author: Strand, Emily
The stability of segmental properties across genre and corpus types in low-resource languages
- DOI: 10.7275/fttf-fq95
- Published: 2020
- Journal:
- Impact factor: 0
- Authors: Cohen Priva, Uriel; Yang, Shiying; Strand, Emily
- Corresponding author: Strand, Emily
Other Publications by Uriel Cohen Priva
Constructing Typing-Time Corpora: A New Way to Answer Old Questions
- DOI:
- Published: 2010
- Journal:
- Impact factor: 0
- Authors: Uriel Cohen Priva
- Corresponding author: Uriel Cohen Priva
The interdependence of frequency, predictability, and informativity in the segmental domain
- DOI:
- Published: 2018
- Journal:
- Impact factor: 1.1
- Authors: Uriel Cohen Priva; T. Jaeger
- Corresponding author: T. Jaeger
The Organization of Lexicons: a Cross-Linguistic Analysis of Monosyllabic Words
- DOI: 10.7275/r58p5xpz
- Published: 2018
- Journal:
- Impact factor: 0
- Authors: Shiying Yang; C. Sanker; Uriel Cohen Priva
- Corresponding author: Uriel Cohen Priva
The causal structure of lenition: A case for the causal precedence of durational shortening
- DOI:
- Published: 2020
- Journal:
- Impact factor: 2.1
- Authors: Uriel Cohen Priva; Emily Gleason
- Corresponding author: Emily Gleason
The role of fast speech in sound change
- DOI:
- Published: 2018
- Journal:
- Impact factor: 0
- Authors: Uriel Cohen Priva; Emily Gleason
- Corresponding author: Emily Gleason
Similar International Grants
From corpus to target data as steps for automatic assessment of L2 speech: L2 French phonological lexicon of Japanese learners
- Award number: 23K20100
- Fiscal year: 2024
- Amount: $391,100
- Award type: Grant-in-Aid for Scientific Research (B)
Small Molecule Degraders of Tryptophan 2,3-Dioxygenase Enzyme (TDO) as Novel Treatments for Neurodegenerative Disease
- Award number: 10752555
- Fiscal year: 2024
- Amount: $391,100
- Award type:
The role of nigrostriatal and striatal cell subtype signaling in behavioral impairments related to schizophrenia
- Award number: 10751224
- Fiscal year: 2024
- Amount: $391,100
- Award type:
Centre for Corpus Approaches to Social Science
- Award number: ES/Z000025/1
- Fiscal year: 2024
- Amount: $391,100
- Award type: Research Grant
Computational and neural signatures of interoceptive learning in anorexia nervosa
- Award number: 10824044
- Fiscal year: 2024
- Amount: $391,100
- Award type:
Childhood trauma, hippocampal function, and anhedonia among those at heightened risk for psychosis
- Award number: 10825287
- Fiscal year: 2024
- Amount: $391,100
- Award type:
Frontocortical representations of amygdala-mediated learning under uncertainty
- Award number: 10825354
- Fiscal year: 2024
- Amount: $391,100
- Award type:
Developing and Evaluating a Positive Valence Treatment for Alcohol Use Disorder with Anxiety or Depression
- Award number: 10596013
- Fiscal year: 2023
- Amount: $391,100
- Award type:
Iron deficits and their relationship with symptoms and cognition in Psychotic Spectrum Disorders
- Award number: 10595270
- Fiscal year: 2023
- Amount: $391,100
- Award type: