Collaborative Research: RI: Small: Unsupervised Islamicate Manuscript Transcription via Lacunae Reconstruction
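Basic information
- Award number: 2200334
- Principal investigator:
- Amount: $297,800
- Host institution:
- Host institution country: United States
- Project type: Standard Grant
- Fiscal year: 2022
- Funding country: United States
- Project period: 2022-07-01 to 2025-06-30
- Project status: Ongoing
- Source:
- Keywords:
Project abstract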
This award tackles handwritten text recognition (HTR, the task of automatically transcribing images of handwritten manuscripts into symbolic text) for Islamicate manuscripts, a domain that encompasses Persian and Arabic written traditions originating in the premodern Islamic world (7th-19th centuries). HTR for modern text is itself a challenging problem that has received substantial attention from the fields of machine learning (ML) and artificial intelligence (AI). However, the predominance of modern text in HTR research is, to some extent, waning: current techniques are relatively robust on modern data, and contemporary written media production is already almost entirely digital. In contrast, historical manuscripts have received comparatively less attention from ML and AI, and at the same time represent both an exceptional opportunity for impact and a set of unique challenges for ML techniques. Specifically, the written traditions of the Islamicate world together form one of the largest -- if not the largest -- archives of human cultural production of the premodern world. Scanning and digitization efforts over the last decade have made images of Islamicate manuscripts in a large number of collections available to the public. However, this data remains 'locked' for most scholarly uses because it has not been transcribed into symbolic text, which is required for many types of analysis. In fact, the script styles used in Islamicate manuscripts -- 'scribal hands' -- vary so widely and differ so substantially from modern forms that even manual close reading of these texts requires expert training and is thus limited to a small subset of researchers. The primary outcome of this project will be new techniques that 'unlock' the Islamicate written tradition by accurately transcribing it.
As a result, this project has the potential to be transformative for humanities disciplines such as Islamic and Near Eastern Studies by enabling libraries to accurately transcribe entire collections and, further, by allowing individual researchers to accurately transcribe manuscripts outside the western canon. Finally, this research will also support interdisciplinary training of a diverse set of graduate students at the University of California San Diego and the University of Maryland.
Current techniques for HTR require large amounts of in-domain supervised training data in order to produce highly accurate transcriptions. The neural architectures behind these modern methods are capable of generalizing, to some degree, across modern handwriting styles when trained on larger and more diverse collections of transcribed data. However, their limitations make these techniques impractical for large-scale transcription of Islamicate texts for two reasons: (1) scribal hand variation across Islamicate manuscripts is much more pronounced than stylistic variation in modern handwriting; and (2) transcriptions of Islamicate manuscripts that can be used as supervised training data are extremely scarce because accurate manual transcription requires expert training. This project will develop a new unsupervised learning framework for Islamicate HTR centered around a novel pretraining task: lacuna reconstruction. The new approach trains a neural encoder for images of manuscript text lines by learning to reconstruct masked regions -- i.e., lacunae -- of unlabeled manuscript images. This completely unsupervised training criterion implicitly incentivizes the model to discover and encode discrete [...]
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
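The lacuna-reconstruction pretraining task described in the abstract can be illustrated with a minimal sketch. The abstract does not specify how regions are masked or how reconstruction is scored, so everything below is an assumption made for illustration: the function names `mask_lacunae` and `reconstruction_loss`, the choice of vertical column spans as the masked regions, the zero fill value, and the mean-squared-error criterion are all hypothetical, and the neural encoder itself is left out.

```python
import random


def mask_lacunae(line_image, num_spans=2, span_width=4, fill=0.0, seed=None):
    """Hide vertical column spans of a text-line image (a list of pixel rows).

    Returns (masked_image, mask); mask[c] is True for each hidden column.
    Span placement and fill value are illustrative assumptions.
    """
    rng = random.Random(seed)
    width = len(line_image[0])
    mask = [False] * width
    for _ in range(num_spans):
        # Pick a start column so the span fits inside the image when possible.
        start = rng.randrange(0, max(1, width - span_width + 1))
        for c in range(start, min(width, start + span_width)):
            mask[c] = True
    masked = [
        [fill if mask[c] else px for c, px in enumerate(row)]
        for row in line_image
    ]
    return masked, mask


def reconstruction_loss(predicted, target, mask):
    """Mean squared error computed over the hidden columns only."""
    total, count = 0.0, 0
    for pred_row, tgt_row in zip(predicted, target):
        for c, hidden in enumerate(mask):
            if hidden:
                total += (pred_row[c] - tgt_row[c]) ** 2
                count += 1
    return total / count if count else 0.0


# Toy 4x16 "line image" with a checkerboard pattern standing in for ink.
image = [[float((r + c) % 2) for c in range(16)] for r in range(4)]
masked, mask = mask_lacunae(image, seed=0)

# A model that reproduces the original exactly would score zero loss;
# returning the masked input unchanged leaves the hidden pixels unexplained.
loss_perfect = reconstruction_loss(image, image, mask)
loss_unfilled = reconstruction_loss(masked, image, mask)
```

In an actual training loop, `predicted` would come from a decoder fed by the encoder's representation of `masked`, and the loss would be minimized over many unlabeled line images; no transcriptions enter the computation at any point, which is what makes the criterion unsupervised.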
Project outcomes
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other publications by Matthew Miller
Calibration of a crop model to irrigated water use using a genetic algorithm
- DOI: 10.5194/hess-13-1467-2009
- Published: 2009
- Journal:
- Impact factor: 0
- Authors: Tom Bulatewicz; Wei Jin; S. Staggenborg; S. Lauwo; Matthew Miller; Sanjoy Das; Daniel Andresen; J. Peterson; D. R. Steward; S. Welch
- Corresponding author: S. Welch
Exploring Religion and Christianity as Points of Diversity Within Counseling Training Programs
- DOI: 10.1080/15507394.2007.10012397
- Published: 2007
- Journal:
- Impact factor: 0.6
- Authors: R. Steward; Matthew Miller; Amber Roberts; Rebecca Slavin; Alfiee Breeland; Douglas Neil
- Corresponding author: Douglas Neil
Reducing Suicide Without Affecting Underlying Mental Health: Theoretical Underpinnings and a Review of the Evidence Base Linking the Availability of Lethal Means and Suicide
- DOI:
- Published: 2016
- Journal:
- Impact factor: 0
- Authors: D. Azrael; Matthew Miller
- Corresponding author: Matthew Miller
Time‐resolved MR angiography of renal artery stenosis in a swine model at 3 Tesla using gadobutrol with digital subtraction angiography correlation
- DOI: 10.1002/jmri.23696
- Published: 2012
- Journal:
- Impact factor: 4.4
- Authors: J. Morelli; F. Ai; V. Runge; Wei Zhang; Xiaoming Li; P. Schmitt; G. McNeal; Henrick J. Michaely; S. Schoenberg; Matthew Miller; Clint M Gerdes; Spencer T. Sincleair; H. Spratt; U. Attenberger
- Corresponding author: U. Attenberger
Guns and Gun Threats at College
- DOI:
- Published: 2002
- Journal:
- Impact factor: 2.4
- Authors: Matthew Miller; D. Hemenway; H. Wechsler
- Corresponding author: H. Wechsler
Other grants held by Matthew Miller
Collaborative Research: CyberTraining: Implementation: Medium: CyberInfrastructure Training and Education for Synchrotron X-Ray Science (X-CITE)
- Award number: 2320374
- Fiscal year: 2023
- Amount: $297,800
- Project type: Standard Grant
SBIR Phase II: Redefining Air Conditioning: Commercializing Hyper-Efficient Rotary Heat Exchanger for Residential and Commercial HVAC Energy Reduction
- Award number: 2026074
- Fiscal year: 2020
- Amount: $297,800
- Project type: Cooperative Agreement
Collaborative Research: Evolution of Strain and Microstructures in the Presence of Solute Hydrogen - a Multiscale Experimental Investigation
- Award number: 1406978
- Fiscal year: 2014
- Amount: $297,800
- Project type: Continuing Grant
Workshop: Promote Use of High Energy X-Ray Diffraction Experiments & Detailed Computational Analyses for Understanding Multiscale Phenomena in Crystalline Materials, 13-15 Oct 2011
- Award number: 1132808
- Fiscal year: 2011
- Amount: $297,800
- Project type: Standard Grant
Fostering an Induction into Authentic Research in the Freshman/Sophomore Sequence
- Award number: 1044419
- Fiscal year: 2011
- Amount: $297,800
- Project type: Standard Grant
GOALI/Collaborative Research: Understanding Cracking and Defect Formation During AlN Crystal Growth
- Award number: 0928257
- Fiscal year: 2009
- Amount: $297,800
- Project type: Standard Grant
SBIR Phase I: A WebTurbine for Lightweight, Ubiquitous Internet Publishing
- Award number: 0232377
- Fiscal year: 2003
- Amount: $297,800
- Project type: Standard Grant
A Methodology for Designing Fatigue Resistant Materials
- Award number: 0301635
- Fiscal year: 2003
- Amount: $297,800
- Project type: Standard Grant
CAREER: Linking Processing Practice to the Performance of Materials in Design
- Award number: 9702017
- Fiscal year: 1997
- Amount: $297,800
- Project type: Continuing Grant
Mathematical Sciences: Applications of Linkage
- Award number: 8802888
- Fiscal year: 1988
- Amount: $297,800
- Project type: Continuing Grant
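Similar NSFC grants
Research on Quantum Field Theory without a Lagrangian Description
- Award number: 24ZR1403900
- Approval year: 2024
- Amount: CNY 0
- Project type: Provincial/municipal project
Cell Research
- Award number: 31224802
- Approval year: 2012
- Amount: CNY 240,000
- Project type: Special fund project
Cell Research
- Award number: 31024804
- Approval year: 2010
- Amount: CNY 240,000
- Project type: Special fund project
Cell Research (细胞研究)
- Award number: 30824808
- Approval year: 2008
- Amount: CNY 240,000
- Project type: Special fund project
Research on the Rapid Growth Mechanism of KDP Crystal
- Award number: 10774081
- Approval year: 2007
- Amount: CNY 450,000
- Project type: General Program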
Similar overseas grants
Collaborative Research: RI: Medium: Principles for Optimization, Generalization, and Transferability via Deep Neural Collapse
- Award number: 2312841
- Fiscal year: 2023
- Amount: $297,800
- Project type: Standard Grant
Collaborative Research: RI: Medium: Principles for Optimization, Generalization, and Transferability via Deep Neural Collapse
- Award number: 2312842
- Fiscal year: 2023
- Amount: $297,800
- Project type: Standard Grant
Collaborative Research: RI: Small: Foundations of Few-Round Active Learning
- Award number: 2313131
- Fiscal year: 2023
- Amount: $297,800
- Project type: Standard Grant
Collaborative Research: RI: Medium: Lie group representation learning for vision
- Award number: 2313151
- Fiscal year: 2023
- Amount: $297,800
- Project type: Continuing Grant
Collaborative Research: RI: Medium: Principles for Optimization, Generalization, and Transferability via Deep Neural Collapse
- Award number: 2312840
- Fiscal year: 2023
- Amount: $297,800
- Project type: Standard Grant
Collaborative Research: RI: Small: Deep Constrained Learning for Power Systems
- Award number: 2345528
- Fiscal year: 2023
- Amount: $297,800
- Project type: Standard Grant
Collaborative Research: RI: Small: Motion Fields Understanding for Enhanced Long-Range Imaging
- Award number: 2232298
- Fiscal year: 2023
- Amount: $297,800
- Project type: Standard Grant
Collaborative Research: RI: Small: End-to-end Learning of Fair and Explainable Schedules for Court Systems
- Award number: 2232055
- Fiscal year: 2023
- Amount: $297,800
- Project type: Standard Grant
Collaborative Research: RI: Medium: Lie group representation learning for vision
- Award number: 2313149
- Fiscal year: 2023
- Amount: $297,800
- Project type: Continuing Grant
Collaborative Research: CompCog: RI: Medium: Understanding human planning through AI-assisted analysis of a massive chess dataset
- Award number: 2312374
- Fiscal year: 2023
- Amount: $297,800
- Project type: Standard Grant