Building a parsed historical corpus to investigate word-order change and variation
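Basic information
- Grant number: 2314522
- Principal investigator:
- Amount: $458,000
- Host institution:
- Host institution country: United States
- Project category: Standard Grant
- Fiscal year: 2023
- Funding country: United States
- Start and end dates: 2023-09-01 to 2026-08-31
- Project status: Ongoing
- Source:
- Keywords:
Project abstract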
Living languages change over time in a number of areas, including not only vocabulary and pronunciation but also sentence structure. Historical linguistics is concerned with documenting these changes and seeking explanations for them. Changes in sentence structure often unfold over an extended period, including a period of variation among several grammatical patterns for expressing the same basic notion. For the time before the introduction of sound recording, the only evidence for these changes consists of written documents. However, gathering enough evidence from written documents for a rigorous scientific investigation of variation and change in the grammatical patterns of a given language requires a large parsed corpus, that is, a collection of texts divided into sentences, clauses, and phrases. This project builds a parsed electronic corpus of a single language, covering multiple centuries, geographical areas, and text genres. The corpus allows for the investigation of grammatical change and variation in the history of the language as well as comparison with similar developments in related languages. The corpus is publicly available for any researcher to use, and outreach to universities and high schools promotes public awareness of the use of science and technology to explore questions about the structure of language. The development of the corpus also contributes to the training of the next generation of researchers in linguistics, including a postdoctoral researcher, graduate students, and undergraduate students.
This project builds a 1.4-million-word syntactically parsed electronic corpus comprising 165 texts that span the years 1050-1950, ten dialectal regions, and a range of text genres. This requires substantial extension of existing annotation schemes, which are based on previous syntactically parsed corpora, to accommodate a broader range of syntactic phenomena, while keeping the annotation scheme as comparable as possible with those used in the handful of syntactically parsed historical corpora of other languages. The project involves manual annotation of texts, correction of errors that arise in automatic part-of-speech parsing, disambiguation of many sentences, and cross-checking of syntactic annotations for accuracy. The resulting annotated corpus fills a gap among the parsed corpora of the world's languages and is available free of charge to researchers around the world, together with documentation on the use of the corpus. The empirical data generated by this project inform research on the mechanisms and spread of typological change over time across closely related dialects. The corpus can be used to investigate phenomena not only in the domain of syntax but also at the interfaces between syntax and other components of grammar. Given the broad range of texts in the corpus, these phenomena can be examined synchronically, diachronically, sociolinguistically, and in comparison with other languages.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
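To make the idea of a parsed corpus concrete, the following is a minimal sketch of how word-order variation might be counted once texts are divided into clauses and phrases. It assumes a Penn-style bracketed format, which the abstract does not specify, and the labels `CLAUSE`, `NP-SBJ`, `NP-OBJ`, and `V`, the toy sentences, and the years are all hypothetical illustrations rather than the project's actual annotation scheme or data.

```python
import re


def parse_bracketed(text):
    """Parse a bracketed string into nested (label, children) tuples."""
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    pos = 0

    def node():
        nonlocal pos
        assert tokens[pos] == "(", "a node must start with '('"
        pos += 1
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())      # nested phrase
            else:
                children.append(tokens[pos])  # word
                pos += 1
        pos += 1  # consume ")"
        return (label, children)

    return node()


def clauses(tree):
    """Yield every node whose label starts with 'CLAUSE', at any depth."""
    label, children = tree
    if label.startswith("CLAUSE"):
        yield tree
    for child in children:
        if isinstance(child, tuple):
            yield from clauses(child)


def object_verb_order(clause):
    """Return 'OV' or 'VO' based on the clause's direct children, else None."""
    labels = [child[0] for child in clause[1] if isinstance(child, tuple)]
    if "NP-OBJ" in labels and "V" in labels:
        return "OV" if labels.index("NP-OBJ") < labels.index("V") else "VO"
    return None


def order_counts(corpus):
    """Count OV and VO clauses per year in a list of (year, tree string)."""
    counts = {}
    for year, text in corpus:
        for clause in clauses(parse_bracketed(text)):
            order = object_verb_order(clause)
            if order:
                counts.setdefault(year, {"OV": 0, "VO": 0})[order] += 1
    return counts


# Hypothetical toy data: invented sentences with invented dates.
toy_corpus = [
    (1200, "(CLAUSE (NP-SBJ she) (NP-OBJ (D the) (N book)) (V read))"),
    (1200, "(CLAUSE (NP-SBJ he) (V saw) (NP-OBJ (D the) (N king)))"),
    (1800, "(CLAUSE (NP-SBJ she) (V read) (NP-OBJ (D the) (N book)))"),
]

print(order_counts(toy_corpus))
```

Because the phrase boundaries are explicit in the annotation, the relative order of object and verb can be read off the tree directly instead of being guessed from the raw word string, which is what makes counts like these feasible across a large corpus.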
Project outcomes
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other publications by Christopher Sapp
Similar overseas grants
Developing a program for language teaching with parsed corpora
- Grant number: 19K00541
- Fiscal year: 2019
- Funding amount: $458,000
- Project category: Grant-in-Aid for Scientific Research (C)
Development and Application of the Parsed Algebraic Iterative Reconstruction Algorithm for Use in Optical Computed Tomography for Gel Dosimetry
- Grant number: 511071-2017
- Fiscal year: 2017
- Funding amount: $458,000
- Project category: Alexander Graham Bell Canada Graduate Scholarships - Master's
Parsed and Audio-Aligned Corpus of Bilingual Russian Child Speech (BiRCh)
- Grant number: 1651083
- Fiscal year: 2017
- Funding amount: $458,000
- Project category: Standard Grant
Collaborative Research: A Corpus of New York City English: Audio-Aligned and Parsed
- Grant number: 1630286
- Fiscal year: 2016
- Funding amount: $458,000
- Project category: Standard Grant
Collaborative Research: A Corpus of New York City English: Audio-Aligned and Parsed
- Grant number: 1630377
- Fiscal year: 2016
- Funding amount: $458,000
- Project category: Standard Grant
Collaborative Research: A corpus of New York City English: Audio-aligned and parsed
- Grant number: 1629348
- Fiscal year: 2016
- Funding amount: $458,000
- Project category: Standard Grant
Collaborative Research: A Corpus of New York City English: Audio-Aligned and Parsed
- Grant number: 1630274
- Fiscal year: 2016
- Funding amount: $458,000
- Project category: Standard Grant
The Extraction and Application of Visual and Semantic Information from Japanese Treebanks with Syntactic and Semantic Analysis Annotation
- Grant number: 15K02469
- Fiscal year: 2015
- Funding amount: $458,000
- Project category: Grant-in-Aid for Scientific Research (C)
Research on annotation for the development of a parsed corpus of Japanese with a special focus on complex sentences
- Grant number: 15H03210
- Fiscal year: 2015
- Funding amount: $458,000
- Project category: Grant-in-Aid for Scientific Research (B)
43rd Linguistic Symposium on Romance Languages: Special Session on Romance Parsed Corpora-New York City - April, 2013
- Grant number: 1256700
- Fiscal year: 2013
- Funding amount: $458,000
- Project category: Standard Grant