EAGER: Converting Print Dictionaries to Machine-Interpretable Format
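Basic information
- Award number: 1644606
- Principal investigator:
- Amount: approx. $74,800
- Host institution:
- Host institution country: United States
- Project type: Standard Grant
- Fiscal year: 2016
- Funding country: United States
- Duration: 2016-09-15 to 2018-02-28
- Project status: Completed
- Source:
- Keywords:
Project abstract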
A dictionary documents the building blocks of a language -- its words and idiomatic phrases, with descriptions of their pronunciations, grammatical properties, meanings and uses -- and is an essential component of language documentation, together with a reference grammar and transcribed texts and recordings. Until recently, dictionaries were compiled and organized by hand, entered into some kind of typesetting system, and finally rendered in print form for use by scholars and language learners. Contemporary dictionaries are compiled and organized electronically, so that the information they contain can be used not only to produce stand-alone print artifacts but can also be integrated with the other components to ensure greater accuracy of the documentation as a whole, enable updates to be produced at regular intervals, and support the development of natural-language processing tools for the languages that are documented in this way. The goal of this exploratory project is to develop methods for machines to understand the implicit structure of the hundreds of extant print dictionaries of endangered and other low-resource languages, as a critical first step in making their documentation as useful as possible to future generations.
Print dictionaries use ordering, typeface and other formatting conventions to indicate the intended structure of dictionary entries. The first task of this project is to use optical character recognition (OCR) software to convert those entries to machine-readable form in a way that preserves the original formatting. The second is to develop software to convert the corrected OCR output into structured, machine-interpretable, archive-standard formats. Because print formats vary widely across dictionaries, human intervention is required to tell the software how to translate the implicit representations in a particular dictionary's entries into explicit ones. Such manual annotation is required for only a small part of the dictionary, however, because the formatting conventions are consistent across all of its entries; once learned, they can be used to identify and correct errors and inconsistencies and to enable automated editing tasks such as updating orthographies. The tool will be developed, tested and evaluated using print dictionaries of two indigenous languages of Latin America that were produced in the latter part of the twentieth century.
This project is jointly supported by the Documenting Endangered Languages Program in the Behavioral and Cognitive Sciences Division and by the Robust Intelligence Program in the Information and Intelligent Systems Division.
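The idea in the abstract of turning formatting conventions into explicit structure can be illustrated with a minimal sketch. It is not the project's tool. It assumes a hypothetical dictionary in which bold marks a headword, italic marks a part of speech, and roman type marks glosses separated by semicolons, and it assumes the OCR step has already produced (style, text) pairs. The names `STYLE_TO_FIELD`, `Entry`, and `parse_entries`, and the sample words, are all made up for illustration.

```python
from dataclasses import dataclass, field

# Hypothetical convention for ONE dictionary (an assumption, not from the project):
# bold = headword, italic = part of speech, roman = gloss text.
STYLE_TO_FIELD = {"bold": "headword", "italic": "pos", "roman": "gloss"}


@dataclass
class Entry:
    headword: str = ""
    pos: str = ""
    glosses: list = field(default_factory=list)


def parse_entries(spans):
    """Group (style, text) spans into entries; each bold span starts a new entry."""
    entries = []
    current = None
    for style, text in spans:
        role = STYLE_TO_FIELD.get(style)
        text = text.strip()
        if role == "headword":
            current = Entry(headword=text)
            entries.append(current)
        elif current is None or role is None:
            # Skip spans that precede the first headword or use an unknown style.
            continue
        elif role == "pos":
            current.pos = text
        else:
            # Split the roman-type text into individual glosses on semicolons.
            current.glosses.extend(g.strip() for g in text.split(";") if g.strip())
    return entries


# Made-up sample data standing in for corrected OCR output.
spans = [
    ("bold", "kasa"), ("italic", "n."), ("roman", "house; home"),
    ("bold", "mita"), ("italic", "v."), ("roman", "to eat"),
]
entries = parse_entries(spans)
```

In this sketch the style-to-field mapping plays the role of the small amount of manual annotation the abstract mentions: it is stated once per dictionary and then applied to every entry.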
Project outcomes
Journal articles (1)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Endangered Data for Endangered Languages: Digitizing Print dictionaries
- DOI: 10.18653/v1/w17-0112
- Year published: 2017
- Journal:
- Impact factor: 0
- Authors: Maxwell, Michael; Bills, Aric
- Corresponding author: Bills, Aric
Other publications by Michael Maxwell
Evaluating trunnionosis in modular anatomic shoulder arthroplasties: a retrieval study
- DOI: 10.1016/j.jse.2023.04.001
- Date published: 2023-10-01
- Journal:
- Impact factor:
- Authors: Michael Maxwell; Trevor Tooley; Ian Penvose; Corinn Gehrke; Denise Koueiter; Brett Wiater; Erin Baker; J. Michael Wiater
- Corresponding author: J. Michael Wiater
Secreted Cysteine-Rich Repeat Proteins “SCREPs”: A Novel Multi-Domain Architecture
- DOI: 10.3389/fphar.2018.01333
- Date published: 2018
- Journal:
- Impact factor: 5.6
- Authors: Michael Maxwell; Eivind A. B. Undheim; M. Mobli
- Corresponding author: M. Mobli
STREAMLInED Challenges: Aligning Research Interests with Shared Tasks
- DOI: 10.18653/v1/w17-0106
- Date published: 2017
- Journal:
- Impact factor: 0
- Authors: Gina; Emily M. Bender; Patrick Littell; Kristen Howell; S. Chelliah; Joshua Crowgey; Dan Garrette; Jeff Good; S. Hargus; David Inman; Michael Maxwell; M. Tjalve; Fei Xia
- Corresponding author: Fei Xia
Portfolio optimisation under the tracking error constraint
- DOI:
- Date published: 2018
- Journal:
- Impact factor: 0
- Authors: Michael Maxwell
- Corresponding author: Michael Maxwell
Active Investment Strategies under Tracking Error Constraints
- DOI: 10.1007/s11294-019-09746-3
- Date published: 2019-08-07
- Journal:
- Impact factor: 0.700
- Authors: Michael Maxwell; Gary van Vuuren
- Corresponding author: Gary van Vuuren
Other grants by Michael Maxwell
Biology through art: an innovative, interdisciplinary approach to teaching biology
- Award number: 2315749
- Fiscal year: 2023
- Amount: approx. $74,800
- Project type: Standard Grant
RCN-UBE Incubator: Biology through art: an innovative, interdisciplinary approach to teaching biology
- Award number: 2120607
- Fiscal year: 2021
- Amount: approx. $74,800
- Project type: Standard Grant
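Similar NSFC grants
Mechanism of KCC2-GABA A receptor reversal (converting) in the neuroprotective effect of propofol postconditioning
- Award number: 81100984
- Approval year: 2011
- Amount: CNY 230,000
- Project type: Young Scientists Fund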
Similar overseas grants
Converting lignin condensed structures into high-value polyaromatic hydrocarbon chemicals by controlled pyrolysis
- Award number: 24K17940
- Fiscal year: 2024
- Amount: approx. $74,800
- Project type: Grant-in-Aid for Early-Career Scientists
Converting Biomass into Value-Added Catalysts for Water Electrolysis
- Award number: LP230100183
- Fiscal year: 2024
- Amount: approx. $74,800
- Project type: Linkage Projects
Black Soldier Fly Pilot Trial - Innovative Black Soldier Fly (BSF) Micro Farm (MF) Project, reclaiming waste feed inputs and converting it into proteins for animal feed
- Award number: 10071795
- Fiscal year: 2023
- Amount: approx. $74,800
- Project type: Collaborative R&D
Converting cytoskeletal forces into biochemical signals
- Award number: 10655891
- Fiscal year: 2023
- Amount: approx. $74,800
- Project type:
PFI-TT: Scalable Manufacturing of Novel Catalysts for Converting CO2 to Valuable Products
- Award number: 2326072
- Fiscal year: 2023
- Amount: approx. $74,800
- Project type: Continuing Grant
Photocatalysts for Converting Plastic Wastes into Hydrogen and Chemicals
- Award number: FT230100192
- Fiscal year: 2023
- Amount: approx. $74,800
- Project type: ARC Future Fellowships
Converting intestinal epithelial function by cell transplantation
- Award number: 23H02974
- Fiscal year: 2023
- Amount: approx. $74,800
- Project type: Grant-in-Aid for Scientific Research (B)
21ENGBIO - Converting a cellular dustbin into a protein storing organelle
- Award number: BB/W012162/1
- Fiscal year: 2023
- Amount: approx. $74,800
- Project type: Research Grant
Principle and practice of catalytic decarbonization process constructed by converting industrial GHG to solid carbon and green resources
- Award number: 23H00530
- Fiscal year: 2023
- Amount: approx. $74,800
- Project type: Grant-in-Aid for Scientific Research (A)
Catalyst design for converting carbon dioxide into valuable chemicals
- Award number: DE230100357
- Fiscal year: 2023
- Amount: approx. $74,800
- Project type: Discovery Early Career Researcher Award


















