
Unlocking Digital Texts: Towards an Interoperable Text Framework

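Basic information

Grant number: AH/W005638/1
Principal investigator: Neil Jefferies
Amount: $256,300
Host institution: University of Oxford
Host institution country: United Kingdom
Project category: Research Grant
Fiscal year: 2022
Funding country: United Kingdom
Period: 2022 to (no end date recorded)
Project status: Ongoing

Source: https://gtr.ukri.org/projects?ref=AH%2FW005638%2F1
Keywords: Unlocking Digital Texts Towards Interoperable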

Project abstract

A key challenge faced by digital text projects is encouraging cultural institutions, researchers and the wider public to reuse and build upon their resources. Despite the scholarly effort put into creating them, digital texts do not seem to have the same 'long tail' use patterns that data from other disciplines have. One of the chief impediments has been that texts are produced and stored in formats that are hard to reuse. Many contain detailed contextual, semantic and presentational markup embedded in the underlying text. Even when these texts are encoded according to robust standards, such as the Text Encoding Initiative (TEI), the content and style of the encoding is fundamentally shaped by the editors' specific fields of study, languages or cultural norms. Reusing materials often requires project-specific code that embodies those principles and norms, and sometimes even requires replicating the infrastructure that delivers them. This sets a high bar that only the most skilled, determined and well-funded researchers are able to surmount.

We aim to rectify this situation by defining an Interoperable Text Framework (ITF) and implementing exemplary test cases to demonstrate its strengths. We are not proposing a new format for encoding or storing text but rather a method for accessing and delivering textual resources (either whole documents or fragments) that are both readable by humans and machine-friendly for computational analysis. When ITF is combined with other frameworks, such as IIIF and the W3C Web Annotation Data Model, it becomes possible to link texts, images, annotations and other online resources to construct narratives that can be visualised and navigated online.

ITF has the potential to transform online texts and editions into active online discourse by allowing multiple new narratives and analyses to be created around texts, without compromising the integrity of the originals. The partner projects all require such a capability and have already developed specific approaches that can inform the development of a more general and flexible standard.

ITF will enable researchers studying Samuel Beckett's works on the Beckett Digital Manuscript Project to construct their own narratives about his writing process. They can connect, display and analyse fragments from books in Beckett's library, the notebooks into which he copied them, and his subsequent intertextual reuse of them in his works. Readers can see and compare these multiple narratives and make their own inferences.

For digital pedagogy and for digital editions, ITF will be the starting point of unprecedented global and local collaborations. The rich but disorganized papers of the early modern mathematician Thomas Harriot, for instance, will finally benefit from a flexible framework that does not assume linearity from front cover to back cover, but rather enables multiple points of entry for various readers. Enhanced navigability and annotation will make Harriot's papers legible not only for researchers collaborating worldwide but also for classrooms, where teachers seek ready ways to contextualize mathematical discoveries within their cultural moments.

ITF will also enable users to apply computational analysis tools to heterogeneous collections of text from diverse sources, which would typically have been avoided because such collections are difficult to work with. A researcher could use existing text mining and machine learning tools to study patterns of citation and reference in the correspondence collections catalogued in Early Modern Letters Online by performing comparative topic and sentiment analyses of letter texts and the referenced works digitised by the Text Creation Partnership.

By removing the technical and infrastructural barriers, ITF will help to ensure that textual resources are better able to live up to the promises of the FAIR principles [https://www.go-fair.org/fair-principles/]; they will be Findable, Accessible, Interoperable, and Reusable.
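
To make the idea of linking a text fragment to an annotation more concrete, the following is a minimal sketch of a W3C Web Annotation that targets a quoted passage in an online text. It illustrates only the Web Annotation Data Model named in the abstract; it is not the ITF interface, which the abstract does not specify. The function name `make_text_annotation`, the `example.org` URL and the placeholder passage are assumptions introduced purely for illustration.

```python
import json


def make_text_annotation(source_url, exact, prefix, suffix, comment):
    """Build a W3C Web Annotation (as a JSON-LD dict) on a quoted text fragment.

    The fragment is located with a TextQuoteSelector, which identifies a
    passage by its exact text plus a little surrounding context, so the
    original document does not need to be modified.
    """
    return {
        "@context": "http://www.w3.org/ns/anno.jsonld",
        "type": "Annotation",
        "motivation": "commenting",
        "body": {
            "type": "TextualBody",
            "value": comment,
            "format": "text/plain",
        },
        "target": {
            "source": source_url,
            "selector": {
                "type": "TextQuoteSelector",
                "exact": exact,
                "prefix": prefix,
                "suffix": suffix,
            },
        },
    }


# Hypothetical text resource and placeholder passage (not real project data).
annotation = make_text_annotation(
    source_url="https://example.org/texts/sample-letter",
    exact="a placeholder passage",
    prefix="This sentence contains ",
    suffix=" that a reader wants to discuss.",
    comment="Illustrative note attached to the quoted fragment.",
)

print(json.dumps(annotation, indent=2))
```

Because the annotation lives outside the text it describes, several readers could publish competing annotations on the same fragment without altering the source, which is the property the abstract emphasises.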

Project outputs

Journal articles (0)

Monographs (0)

Research awards (0)

Conference papers (0)

Patents (0)

Other publications by Neil Jefferies

Contextual and Provenance Metadata in the Oxford University Research Archive (ORA)

DOI: 10.1007/978-3-319-24129-6_24
Year published: 2015
Venue: International Conference on Metadata and Semantics Research
Impact factor: 0
Authors: Tanya Gray Jones; Lucie Burgess; Neil Jefferies; Anusha Ranganathan; S. Rumsey
Corresponding author: S. Rumsey

From compliance to curation

DOI: 10.1177/0955749016657482
Year published: 2016
Venue: Alexandria: The Journal of National and International Library and Information Issues
Impact factor: 0
Authors: Lucie Burgess; Neil Jefferies; S. Rumsey; John Southall; D. Tomkins; James A. J. Wilson
Corresponding author: James A. J. Wilson