Machine Translation and Automated Analysis of Cuneiform Languages

Basic information

Project summary

The history and culture of ancient Mesopotamia, home of the first empires and birthplace of writing, are known mostly through literary and royal inscriptions. Yet administrative texts, which make up well over 90% of all cuneiform documents, have received much less attention: even when transliterated and digitized, most remain untranslated and are therefore inaccessible to scholars in even closely related fields. These texts are nevertheless unique and deeply insightful socio-historical witnesses, as they document the day-to-day management of early state economies. Because of their vast numbers, translating them by hand appears to be an unachievable task. From the 21st century BC alone, we have access to more than 67,000 digital transcriptions, routinely produced by specialists in a particular subset of documents; without translation, however, they are difficult to interpret even for specialists in other branches of Assyriology. MTAAC combines recent developments in machine learning (ML) with statistical and neural machine translation (MT) to facilitate the analysis of this material, thereby fundamentally expanding its accessibility to the Humanities and Social Sciences. The main outcome is a methodology, its implementation, and a body of translated and analyzed texts, released under open licenses. Beyond cuneiform studies, we set an example for processing a host of comparable datasets in different historical philologies. Because the texts are so numerous, we supplement human labor with automated solutions. Statistical and neural approaches to Natural Language Processing have matured over the last decades and enjoy wide usage, but have rarely been applied to even major historical languages. We aim to bridge this gap, set an example for ML and MT in the Humanities, and facilitate studies of cuneiform languages.
To increase reusability, we adapt and develop community-maintained specifications based on linked open data formalisms, and propose rules of best practice for collaboration with other digital humanities actors, such as museums and portals for various strands of philology. PI Heather Baker, University of Toronto, Canada, leads the work on language-specific aspects in MTAAC. Co-PI Robert Englund, UCLA, director of the Cuneiform Digital Library Initiative, is in charge of data management and hosting. Co-PI Christian Chiarcos, Goethe University Frankfurt, Germany, is responsible for ML, MT and data integration. Methodologies are developed collaboratively. MTAAC provides unified access to a highly representative corpus of early writing, and will employ MT and ML to facilitate its context-sensitive semantic interpretation. The project will foster unprecedented scholarly cooperation among researchers in a variety of disciplines. As a result, lines of communication to the heritage of civilizations dead for many millennia will be made accessible to the networked public, contributing to a deeper appreciation and understanding of modern culture and its historical roots.

Project outcomes

Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)

Other grants of Professor Dr. Christian Chiarcos

Continuation of the constitution of the "Virtual Library for General Linguistics and Comparative Language Studies" in the context of the Special Subject Collection Linguistics 7.11 funded by the German Research Society
  • Grant number:
    214512695
  • Fiscal year:
    2011
  • Funding amount:
    --
  • Project type:
    Acquisition and Provision (Scientific Library Services and Information Systems)

Similar overseas grants

Towards automated Australian Sign Language translation
  • Grant number:
    DE230100049
  • Fiscal year:
    2023
  • Funding amount:
    --
  • Project type:
    Discovery Early Career Researcher Award
Translation of the UVA Advanced Automated Insulin Delivery Systems to Clinical Care in Young Children: Glycemic Control, Regulatory Acceptance and Optimization of Day to Day Use
  • Grant number:
    10474818
  • Fiscal year:
    2021
  • Funding amount:
    --
  • Project type:
Translation of the UVA Advanced Automated Insulin Delivery Systems to Clinical Care in Young Children: Glycemic Control, Regulatory Acceptance and Optimization of Day to Day Use
  • Grant number:
    10265602
  • Fiscal year:
    2020
  • Funding amount:
    --
  • Project type:
Translation of the UVA Advanced Automated Insulin Delivery Systems to Clinical Care in Young Children: Glycemic Control, Regulatory Acceptance and Optimization of Day to Day Use
  • Grant number:
    10470808
  • Fiscal year:
    2020
  • Funding amount:
    --
  • Project type:
AIR Option 1: Technology Translation: Automated Targeted Destination Recognition for the Blind with Motion Deblurring
  • Grant number:
    1343402
  • Fiscal year:
    2013
  • Funding amount:
    --
  • Project type:
    Standard Grant
Visual Solutions for Automated Translation Between Spoken and Signed Languages
  • Grant number:
    ARC : DP0210118
  • Fiscal year:
    2002
  • Funding amount:
    --
  • Project type:
    Discovery Projects
Visual Solutions for Automated Translation Between Spoken and Signed Languages
  • Grant number:
    DP0210118
  • Fiscal year:
    2002
  • Funding amount:
    --
  • Project type:
    Discovery Projects
A Study of Automated Editing Technologies and their Application to Machine Translation
  • Grant number:
    13480097
  • Fiscal year:
    2001
  • Funding amount:
    --
  • Project type:
    Grant-in-Aid for Scientific Research (B)
TRANSLATION OF AUTOMATED SEQUENCER DATA TO DNA SEQUENCES
  • Grant number:
    2208900
  • Fiscal year:
    1992
  • Funding amount:
    --
  • Project type:
TRANSLATION OF AUTOMATED SEQUENCER DATA TO DNA SEQUENCES
  • Grant number:
    3333740
  • Fiscal year:
    1992
  • Funding amount:
    --
  • Project type: