Unlocking Digital Texts: Towards an Interoperable Text Framework
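Basic information
- Grant number: AH/W005638/1
- Principal investigator:
- Amount: $256,300
- Host institution:
- Host institution country: United Kingdom
- Project type: Research Grant
- Fiscal year: 2022
- Funding country: United Kingdom
- Duration: 2022 to (no data)
- Project status: Ongoing
- Source:
- Keywords:
Project abstract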
A key challenge faced by digital text projects is encouraging cultural institutions, researchers and the wider public to reuse and build upon their resources. Despite the scholarly effort put into creating them, digital texts do not seem to have the same 'long tail' use patterns that data from other disciplines have. One of the chief impediments has been that texts are produced and stored in formats that are hard to reuse. Many contain detailed contextual, semantic and presentational markup embedded with the underlying text. Even when these texts are encoded according to robust standards, such as the Text Encoding Initiative (TEI), the content and style of the coding is fundamentally shaped by the editors' specific fields of study, languages or cultural norms. Reusing materials often requires project-specific code that embodies those principles and norms, and sometimes even requires replicating the infrastructure that delivers them. This sets a high bar that only the most skilled, determined and well-funded researchers are able to surmount.
We aim to rectify this situation by defining an Interoperable Text Framework (ITF) and implementing exemplary test cases to demonstrate its strengths. We are not proposing a new format for encoding or storing text but rather a method for accessing and delivering textual resources (either whole documents or fragments) that are both readable by humans and machine-friendly for computational analysis. When ITF is combined with other frameworks, such as IIIF and the W3C Web Annotation Data Model, it becomes possible to link texts, images, annotations and other online resources to construct narratives that can be visualised and navigated online.
ITF has the potential to transform online texts and editions into active online discourse by allowing multiple new narratives and analyses to be created around texts, without compromising the integrity of the originals.
The partner projects all require such a capability and have already developed specific approaches that can inform the development of a more general and flexible standard.
ITF will enable researchers studying Samuel Beckett's works on the Beckett Digital Manuscript Project to construct their own narratives about his writing process. They can connect, display and analyse fragments from books in Beckett's library, where they were copied in notebook(s), and Beckett's subsequent intertextual reuse in his works. Readers can see and compare these multiple narratives and make their own inferences.
For digital pedagogy and for digital editions, ITF will be the starting point of unprecedented global and local collaborations. The rich but disorganized papers of the early modern mathematician Thomas Harriot, for instance, will finally benefit from a flexible framework that does not assume linearity from front cover to back cover, but rather enables multiple points of entry for various readers. Enhanced navigability and annotation will make Harriot's papers legible not only for researchers collaborating worldwide but also for classrooms, where teachers seek ready ways to contextualize mathematical discoveries within their cultural moments.
ITF will also enable users to apply computational analysis tools to heterogeneous collections of text from diverse sources, which would have typically been avoided because they are difficult to use. A researcher could use existing text mining and machine learning tools to study patterns of citation and reference in the correspondence collections catalogued in Early Modern Letters Online by performing comparative topic and sentiment analyses of letter texts and the referenced works digitised by the Text Creation Partnership.
By removing the technical and infrastructural barriers, ITF will help textual resources live up to the promises of the FAIR principles [https://www.go-fair.org/fair-principles/]: they will be Findable, Accessible, Interoperable, and Reusable.
Project outcomes
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other publications by Neil Jefferies
Contextual and Provenance Metadata in the Oxford University Research Archive (ORA)
- DOI: 10.1007/978-3-319-24129-6_24
- Year published: 2015
- Journal:
- Impact factor: 0
- Authors: Tanya Gray Jones; Lucie Burgess; Neil Jefferies; Anusha Ranganathan; S. Rumsey
- Corresponding author: S. Rumsey
From compliance to curation
- DOI: 10.1177/0955749016657482
- Year published: 2016
- Journal:
- Impact factor: 0
- Authors: Lucie Burgess; Neil Jefferies; S. Rumsey; John Southall; D. Tomkins; James A. J. Wilson
- Corresponding author: James A. J. Wilson
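Similar National Natural Science Foundation of China (NSFC) grants
Ultrasensitive, high-resolution Digital-CRISPR technology for amplification-free multiplex nucleic acid detection
- Grant number:
- Approval year: 2021
- Funding amount: CNY 300,000
- Project type: Young Scientists Fund
Research on intelligent operation and maintenance methods for CNC machine tools based on Digital Twin
- Grant number: 51875323
- Approval year: 2018
- Funding amount: CNY 600,000
- Project type: General Program
Research on non-invasive prenatal testing for deafness based on digital PCR (digital-PCR) technology
- Grant number: LQ19H040016
- Approval year: 2018
- Funding amount: CNY 0
- Project type: Provincial/municipal project
Research on a new method for detecting and typing circulating tumor cells based on Digital LAMP technology
- Grant number: 81702102
- Approval year: 2017
- Funding amount: CNY 200,000
- Project type: Young Scientists Fund
Construction of a surface-engineering-based exosome digital PCR quantitative analysis system and translational medicine research
- Grant number: 81702959
- Approval year: 2017
- Funding amount: CNY 100,000
- Project type: Young Scientists Fund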
Similar overseas grants
Jewish Babylonian Aramaic Bowl Texts: Uncovering the Operational Mechanisms of the Practice using Digital Methods
- Grant number: 2406796
- Fiscal year: 2020
- Funding amount: $256,300
- Project type: Studentship
Digital Collection German Colonialism - Establishment of a digital collection of texts and integration in the research Infrastructure CLARIN-D
- Grant number: 324473798
- Fiscal year: 2017
- Funding amount: $256,300
- Project type: Cataloguing and Digitisation (Scientific Library Services and Information Systems)
Creating a Chronotopic Ground for the Mapping of Literary Texts: Innovative Data Visualisation and Spatial Interpretation in the Digital Medium
- Grant number: AH/P00895X/1
- Fiscal year: 2017
- Funding amount: $256,300
- Project type: Research Grant
Cognitive processing traits of multi-sensory digital texts
- Grant number: 15K00389
- Fiscal year: 2015
- Funding amount: $256,300
- Project type: Grant-in-Aid for Scientific Research (C)
From Latin to French: Compilation and analysis of a digital corpus of late Latin and old French texts
- Grant number: 265325513
- Fiscal year: 2015
- Funding amount: $256,300
- Project type: Research Grants
Research for Converting Post-Kyoho (1716-1736) Gidayu-Bushi Joruri Texts into Digital Archives
- Grant number: 24320052
- Fiscal year: 2012
- Funding amount: $256,300
- Project type: Grant-in-Aid for Scientific Research (B)
Study of integrating digital texts into mobile learning under the ubiquitous environments
- Grant number: 23520698
- Fiscal year: 2011
- Funding amount: $256,300
- Project type: Grant-in-Aid for Scientific Research (C)
Philological Treatment and Complete Digital Publication of Hittite Ritual Texts (CTH 390-500)
- Grant number: 178728154
- Fiscal year: 2010
- Funding amount: $256,300
- Project type: Research Grants
Research on Adaptive Recommendation Technology in Bidirectional Recommendation System for Learning Web Digital Texts
- Grant number: 21500908
- Fiscal year: 2009
- Funding amount: $256,300
- Project type: Grant-in-Aid for Scientific Research (C)
Literacy Teaching in the Changing Communications Environment: Reading and Writing Multimodal and Digital Texts
- Grant number: DP0665618
- Fiscal year: 2006
- Funding amount: $256,300
- Project type: Discovery Projects