Machines Reading Maps: Finding and Understanding Text on Maps

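Basic information

  • Grant number:
    AH/V009400/1
  • Principal investigator:
  • Amount:
    $257,800
  • Host institution:
  • Host institution country:
    United Kingdom
  • Project type:
    Research Grant
  • Fiscal year:
    2021
  • Funding country:
    United Kingdom
  • Duration:
    2021 to (no data)
  • Project status:
    Completed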

Project abstract

'Machines Reading Maps' (MRM) aims to change the way that humanists and heritage professionals interact with map images. Maps constitute a significant body of global cultural heritage, and they are being scanned at a rapid pace in the US and UK. However, most critical investigation of maps still happens on a small scale, through close 'readings' of a few maps. Individual maps communicate through visual grammars, supplemented by text. But text on maps is an almost entirely untapped source for understanding how knowledge of place is constructed. Investigating map content at scale can teach us what has been preserved and what has been omitted in the cartographic record. Such knowledge is a key starting point for understanding why using map text to enrich collection metadata may be advisable (when collection records lack geographic or locational information entirely, or contain only the most superficial) or potentially harmful (when map text replicates colonial power structures). Additionally, the right maps can be hard to find. Large map collections are among those least often catalogued at item level: a single record is meant to capture the metadata for dozens, if not thousands, of sheets. This is a well-known research obstacle, one that has made historical maps, like serialized sources such as newspapers, difficult to work with in the humanities. We envision a future where map collections can be searched by their spatial content, much as digitised newspaper collections enable full-text searching across scanned pages. This project contributes to reversing the fortunes of historic map collections at the moment when many of them are being made available online. MRM will enable researchers and cultural institutions to generate and analyze this data across collections and institutions, contributing to metadata creation and decolonization efforts, and enhancing the accessibility and discoverability of uncatalogued or minimally catalogued sheets.
MRM builds on the project team's expertise in historical maps and map processing. Importantly, it refines an already robust tool for extracting text from maps (Strabo), developed by the US Co-I and colleagues on the Linked Maps project. Extending software tools to handle new types of maps is essential to making text extraction a method that libraries and archives around the world can use. MRM generates data from scanned map collections and builds community among map and data curators, metadata and digital scholarship specialists, historians, and geographic information and data scientists. Working with partners at the National Library of Scotland (NLS), the British Library (BL) and the Library of Congress (LC), all of which hold extensive scanned map collections, this work unites research questions about the spatial experience of industrialization in 19th-century Great Britain and social change in US cities during the 20th century with GIScience expertise in using computational methods to process historical maps at scale. By predicting what type of feature each piece of map text represents (roads, buildings, mountains, etc.) and linking it to gazetteers (indexes of places and related metadata, such as locations), we unlock the potential for users to find and interpret maps by the thousands. Cultural institutions can feed map text data back into their work to study the geographical coverage of their collections, or to investigate differences between existing metadata and the locations reported by map labels. On a sheet-by-sheet basis, for example, MARC subject and topic fields can be enriched with map text. After processing US and UK maps and linking them to historical gazetteers, we test linking UK map labels to Scottish Trade Directories and matching Sanborn map data to US census records, making a significant contribution to both British and American digital historical data. Such test cases exemplify the versatility of map labels as primary sources.
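As a rough illustration of what linking map labels to a gazetteer involves, the sketch below matches OCR'd label strings against a tiny gazetteer using fuzzy string similarity from Python's standard library, so that small recognition errors still resolve to the right place. The gazetteer entries, the `link_label` function and the 0.8 similarity threshold are hypothetical; this is not Strabo's or the project's own linking method, which the abstract does not specify.

```python
from difflib import SequenceMatcher

# Hypothetical gazetteer: place name -> (latitude, longitude).
GAZETTEER = {
    "Glasgow": (55.8642, -4.2518),
    "Edinburgh": (55.9533, -3.1883),
    "Paisley": (55.8456, -4.4239),
}


def similarity(a: str, b: str) -> float:
    """Case-insensitive similarity ratio in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def link_label(label: str, threshold: float = 0.8):
    """Return (name, coords, score) for the best gazetteer match, or None.

    A label only links if its best score reaches the (assumed) threshold,
    so generic feature words are left unlinked rather than forced onto a place.
    """
    best_name, best_score = None, 0.0
    for name in GAZETTEER:
        score = similarity(label, name)
        if score > best_score:
            best_name, best_score = name, score
    if best_name is None or best_score < threshold:
        return None
    return best_name, GAZETTEER[best_name], best_score


if __name__ == "__main__":
    # OCR-style errors ("b" for "h", "0" for "o") still resolve.
    for raw in ["Edinburgb", "Glasg0w", "Mountain"]:
        print(raw, "->", link_label(raw))
```

Here "Edinburgb" and "Glasg0w" link to Edinburgh and Glasgow, while a feature word such as "Mountain" falls below the threshold and returns `None`. A realistic pipeline would also need the predicted label type and the map sheet's extent to choose between places that share a name.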

Project outputs

Journal articles (10)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
SpaBERT: A Pretrained Language Model from Geographic Data for Geo-Entity Representation
  • DOI:
    10.48550/arxiv.2210.12213
  • Published:
    2022-10
  • Journal:
  • Impact factor:
    0
  • Authors:
    Zekun Li;Jina Kim;Yao-Yi Chiang;Muhao Chen
  • Corresponding author:
    Zekun Li;Jina Kim;Yao-Yi Chiang;Muhao Chen
Incorporating spatial context for post-OCR in map images
  • DOI:
    10.1145/3557918.3565864
  • Published:
    2022
  • Journal:
  • Impact factor:
    0
  • Authors:
    Namgung M
  • Corresponding author:
    Namgung M
Poster: Machines Reading Maps: Finding and Understanding Text on Maps
  • DOI:
  • Published:
    2022
  • Journal:
  • Impact factor:
    0
  • Authors:
    Mcdonough K
  • Corresponding author:
    Mcdonough K
Machines Reading Maps: Unlocking Historical Maps with Machine Learning and Semantic Web Technologies
  • DOI:
    10.5281/zenodo.6802039
  • Published:
    2022
  • Journal:
  • Impact factor:
    0
  • Authors:
    Vitale V
  • Corresponding author:
    Vitale V
Synthetic Map Generation to Provide Unlimited Training Data for Historical Map Text Detection

Other publications by Katherine McDonough

A Dataset for Toponym Resolution in Nineteenth-Century English Newspapers
  • DOI:
    10.5334/johd.56
  • Published:
    2022
  • Journal:
  • Impact factor:
    0
  • Authors:
    Mariona Coll Ardanuy;D. Beavan;K. Beelen;Kasra Hosseini;J. Lawrence;Katherine McDonough;F. Nanni;Daniel Alexander van Strien;Daniel C. S. Wilson
  • Corresponding author:
    Daniel C. S. Wilson
Classifying encyclopedia articles: Comparing machine and deep learning methods and exploring their predictions
  • DOI:
    10.1016/j.datak.2022.102098
  • Published:
    2022
  • Journal:
  • Impact factor:
    0
  • Authors:
    Alice Brenon;Ludovic Moncla;Katherine McDonough
  • Corresponding author:
    Katherine McDonough
Resolving places, past and present: toponym resolution in historical British newspapers using multiple resources
Mapping the Encyclopédie: Working Towards an Early Modern Digital Gazetteer
  • DOI:
  • Published:
    2017
  • Journal:
  • Impact factor:
    0
  • Authors:
    Katherine McDonough;M. V. D. Camp
  • Corresponding author:
    M. V. D. Camp


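Similar NSFC (National Natural Science Foundation of China) grants

Functional study of mRNA downstream open reading frames (dORFs) in spermatogenesis
  • Grant number:
  • Approval year:
    2022
  • Funding amount:
    540,000 yuan
  • Project type:
    General Program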

Similar overseas grants

HSI Implementation and Evaluation Project: Scaling and Extending Exploratory Reading Groups to Strengthen Computing Pathways
  • Grant number:
    2414332
  • Fiscal year:
    2024
  • Funding amount:
    $257,800
  • Project type:
    Continuing Grant
I-Corps: Artificially Intelligent Dialogic Reading Aid
  • Grant number:
    2349210
  • Fiscal year:
    2024
  • Funding amount:
    $257,800
  • Project type:
    Standard Grant
SBIR Phase II: Computer-based co-reading for students with reading disabilities
  • Grant number:
    2321439
  • Fiscal year:
    2024
  • Funding amount:
    $257,800
  • Project type:
    Cooperative Agreement
An Eye-Tracking Study: Exploring Integrated Reading Tasks in the New Format of the English Common Test for Japanese University Admissions
  • Grant number:
    24K04032
  • Fiscal year:
    2024
  • Funding amount:
    $257,800
  • Project type:
    Grant-in-Aid for Scientific Research (C)
International Institutional Awards Tranche 2 Reading
  • Grant number:
    BB/Z514615/1
  • Fiscal year:
    2024
  • Funding amount:
    $257,800
  • Project type:
    Research Grant
Elucidating the mechanisms of autonomous English learning related to reading
  • Grant number:
    24K04152
  • Fiscal year:
    2024
  • Funding amount:
    $257,800
  • Project type:
    Grant-in-Aid for Scientific Research (C)
Digital building blocks of elementary school foreign language reading motivation
  • Grant number:
    23K25344
  • Fiscal year:
    2024
  • Funding amount:
    $257,800
  • Project type:
    Grant-in-Aid for Scientific Research (B)
Affective Computing Models: from Facial Expression to Mind-Reading
  • Grant number:
    EP/Y03726X/1
  • Fiscal year:
    2024
  • Funding amount:
    $257,800
  • Project type:
    Research Grant
University of Reading and Starlight Xpress Limited KTP 23_24 R1
  • Grant number:
    10070106
  • Fiscal year:
    2024
  • Funding amount:
    $257,800
  • Project type:
    Knowledge Transfer Partnership
International Institutional Awards Tranche 1 Reading
  • Grant number:
    BB/Y514184/1
  • Fiscal year:
    2024
  • Funding amount:
    $257,800
  • Project type:
    Research Grant