
Machines Reading Maps: Finding and Understanding Text on Maps

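Basic information

Grant reference:
AH/V009400/1
Principal investigator:
Katherine McDonough
Amount:
$257,800
Host institution:
The Alan Turing Institute
Host institution country:
United Kingdom
Project type:
Research Grant
Fiscal year:
2021
Funding country:
United Kingdom
Duration:
2021 to (no end date available)
Project status:
Completed

Source:
https://gtr.ukri.org/projects?ref=AH%2FV009400%2F1
Keywords:
Machines Reading Maps Finding Understanding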

Project abstract

'Machines Reading Maps' (MRM) aims to change the way that humanists and heritage professionals interact with map images. Maps constitute a significant body of global cultural heritage, and they are being scanned at a rapid pace in the US and UK. However, most critical investigation of maps continues on a small scale, through close 'readings' of a few maps. Individual maps communicate through visual grammars, supplemented by text. But text on maps is an almost entirely untapped source for understanding how knowledge of place is constructed. Investigating map content at scale can teach us what has been preserved in, and omitted from, the cartographic record. Such knowledge is a key starting point for understanding why using map text to enrich collection metadata may be advisable (when collection records lack geographic or locational information, or contain only the most superficial) or potentially harmful (when map text replicates colonial power structures). Additionally, the right maps can be hard to find. Large map collections are often among those rarely catalogued at item level: a single record is meant to capture the metadata for dozens, if not thousands, of sheets. This is a well-known research obstacle, one that has made historical maps, like serialized sources such as newspapers, challenging sources in the humanities. We envision a future where map collections can be searched based on their spatial content, much as digitised newspaper collections enable full-text search across scanned pages. This project contributes to reversing the fortunes of historical map collections at the moment when many of them are being made available online. MRM will enable researchers and cultural institutions to generate and analyze this data across collections and institutions, contributing to metadata creation and decolonization efforts and enhancing the accessibility and discoverability of uncatalogued or minimally catalogued sheets.
MRM builds on the project team's expertise in historical maps and map processing. Importantly, it refines an already robust tool for extracting text from maps (Strabo), developed by the US Co-I and colleagues on the Linked Maps project. Advancing software tools to handle new types of maps is essential to making text extraction a method that libraries and archives around the world can use. MRM generates data from scanned map collections and builds community among map and data curators, metadata and digital scholarship specialists, historians, and geographic information and data scientists. Working with partners at the National Library of Scotland (NLS), the British Library (BL) and the Library of Congress (LC), all of which hold extensive scanned map collections, this work unites research questions about the spatial experience of industrialization in nineteenth-century Great Britain and social change in twentieth-century US cities with GIScience expertise in using computational methods to process historical maps at scale. By predicting what type of content text on maps represents (roads, buildings, mountains, etc.) and linking it to gazetteers (indexes of places and related metadata, such as locations), we unlock the potential for users to find and interpret maps by the thousands. Cultural institutions can feed map text data back into their work to study the geographical coverage of their collections, or to investigate differences between existing metadata and the reported locations of map labels. On a sheet-by-sheet basis, for example, MARC fields for subjects and topics can be enriched with map text. After processing US and UK maps and linking them to historical gazetteers, we test linking UK map labels to Scottish Trade Directories and matching Sanborn map data to US census records, making a significant contribution to both British and American digital historical data. Such research test cases exemplify the versatility of map labels as primary sources.

Project outputs

Journal articles (10)

Monographs (0)

Research awards (0)

Conference papers (0)

Patents (0)

SpaBERT: A Pretrained Language Model from Geographic Data for Geo-Entity Representation

DOI:
10.48550/arxiv.2210.12213
Published:
2022-10
Journal:
arXiv
Impact factor:
0
Authors:
Zekun Li;Jina Kim;Yao-Yi Chiang;Muhao Chen
Corresponding author:
Zekun Li;Jina Kim;Yao-Yi Chiang;Muhao Chen

Incorporating spatial context for post-OCR in map images

DOI:
10.1145/3557918.3565864
Published:
2022
Journal:
Impact factor:
0
Authors:
Namgung M
Corresponding author:
Namgung M

Poster: Machines Reading Maps: Finding and Understanding Text on Maps

DOI:
Published:
2022
Journal:
Impact factor:
0
Authors:
McDonough K
Corresponding author:
McDonough K

Machines Reading Maps: Unlocking Historical Maps with Machine Learning and Semantic Web Technologies

DOI:
10.5281/zenodo.6802039
Published:
2022
Journal:
Impact factor:
0
Authors:
Vitale V
Corresponding author:
Vitale V

Synthetic Map Generation to Provide Unlimited Training Data for Historical Map Text Detection

DOI:
10.1145/3486635.3491070
Published:
2021-11
Journal:
Proceedings of the 4th ACM SIGSPATIAL International Workshop on AI for Geographic Knowledge Discovery
Impact factor:
0
Authors:
Zekun Li;Runyu Guan;Qianmu Yu;Yao-Yi Chiang;Craig A. Knoblock
Corresponding author:
Zekun Li;Runyu Guan;Qianmu Yu;Yao-Yi Chiang;Craig A. Knoblock

Other publications by Katherine McDonough

A Dataset for Toponym Resolution in Nineteenth-Century English Newspapers

DOI:
10.5334/johd.56
Published:
2022
Journal:
Journal of Open Humanities Data
Impact factor:
0
Authors:
Mariona Coll Ardanuy;D. Beavan;K. Beelen;Kasra Hosseini;J. Lawrence;Katherine McDonough;F. Nanni;Daniel Alexander van Strien;Daniel C. S. Wilson
Corresponding author:
Daniel C. S. Wilson

Classifying encyclopedia articles: Comparing machine and deep learning methods and exploring their predictions

DOI:
10.1016/j.datak.2022.102098
Published:
2022
Journal:
Data Knowl. Eng.
Impact factor:
0
Authors:
Alice Brenon;Ludovic Moncla;Katherine McDonough
Corresponding author:
Katherine McDonough

Resolving places, past and present: toponym resolution in historical British newspapers using multiple resources

DOI:
10.1145/3371140.3371143
Published:
2019
Journal:
Proceedings of the 13th Workshop on Geographic Information Retrieval
Impact factor:
0
Authors:
Mariona Coll Ardanuy;Katherine McDonough;A. Krause;Daniel C. S. Wilson;Kasra Hosseini;Daniel Alexander van Strien
Corresponding author:
Daniel Alexander van Strien