Robust and high-performance methods for layout analysis in OCR-D



Project abstract

The project aims to improve the quality and robustness of layout analysis for historical documents and thus ensure their suitability for mass digitization. To achieve this, existing approaches will be optimized and extended, and promising new methods will be integrated. First, a sample-based analysis of the bibliography of books printed in the German-speaking countries in the 16th, 17th and 18th centuries (VD) will identify and quantify those classes of documents for which existing layout analysis methods still give insufficient results. Likewise, suitable training data will be identified and harmonized, and their preparation and generation will be organized more efficiently. The main focus of the work is the further development of complementary methods for layout analysis. On the one hand, generic methods and models will be optimized to cover as many documents in the VD as possible. On the other hand, this will be complemented by approaches that specifically address identified weaknesses by significantly improving the adaptability of the methods and models to new materials and challenges. Furthermore, heuristics will be (further) developed to optimize the results of different deep learning methods in a rule-based manner. The developments will be accompanied by a detailed evaluation, for which standard scientific metrics for layout evaluation will be implemented and evaluation tools integrated in OCR-D. Finally, all procedures must be provided as modular components with OCR-D interfaces for individual processing steps. This will allow the procedures to be combined flexibly to achieve the best possible results, and will ensure adaptability and sustainability with regard to new developments.

Project outcomes

Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)


Other grants of Professor Dr. Achim Bonte

Cataloguing and digitization of manuscripts in Italian in the Saxon State and University Library Dresden (SLUB)
  • Grant number:
    274200166
  • Fiscal year:
    2015
  • Funding amount:
    --
  • Project category:
    Cataloguing and Digitisation (Scientific Library Services and Information Systems)
Digital Edition of the Correspondence of August Wilhelm Schlegel
  • Grant number:
    204094077
  • Fiscal year:
    2012
  • Funding amount:
    --
  • Project category:
    Research Grants
Further development of the Specialised Information Service for Slavic languages, literature and ethnology in close cooperation with the Slavists in Germany
  • Grant number:
    285797103
  • Fiscal year:
  • Funding amount:
    --
  • Project category:
    Acquisition and Provision (Scientific Library Services and Information Systems)
Regional economic development in a qualitative-quantitative dual perspective – the annual reports of German Chambers of Commerce of the long 19th century in open access
  • Grant number:
    529670445
  • Fiscal year:
  • Funding amount:
    --
  • Project category:
    Cataloguing and Digitisation (Scientific Library Services and Information Systems)
Pop-up 3D – Digitization and Interactive Visualization of Historical Movable Books in Science-Guided Practice
  • Grant number:
    527302005
  • Fiscal year:
  • Funding amount:
    --
  • Project category:
    Cataloguing and Digitisation (Scientific Library Services and Information Systems)
Cartography and Geo Data (Ordnance Survey)
  • Grant number:
    285727181
  • Fiscal year:
  • Funding amount:
    --
  • Project category:
    Acquisition and Provision (Scientific Library Services and Information Systems)
Qalamos: Establishment of a union catalog of Oriental manuscripts by setting cataloging standards, conversion of printed catalogs, and integration of existing databases
  • Grant number:
    430973116
  • Fiscal year:
  • Funding amount:
    --
  • Project category:
    Cataloguing and Digitisation (Scientific Library Services and Information Systems)
Specialised Information Service Asia (FID Asia) – Chinese language and culture region, Japan, Korea, Central Asia (Mongolia and Central Asian regions in the People's Republic of China), Southeast Asia
  • Grant number:
    471539089
  • Fiscal year:
  • Funding amount:
    --
  • Project category:
    Acquisition and Provision (Scientific Library Services and Information Systems)
Digitisation and indexing of the Leni Riefenstahl estate: Joint project of the Staatliche Museen zu Berlin (National Museums Berlin), the Staatsbibliothek zu Berlin (National Library Berlin) - Stiftung Preußischer Kulturbesitz and the Stiftung Deutsche Kine
  • Grant number:
    512444797
  • Fiscal year:
  • Funding amount:
    --
  • Project category:
    Cataloguing and Digitisation (Scientific Library Services and Information Systems)
“that you approach everything you do with love and passion”. Cataloguing and digitisation of Claudio Abbado's correspondence at the Staatsbibliothek zu Berlin – Stiftung Preußischer Kulturbesitz
  • Grant number:
    512473891
  • Fiscal year:
  • Funding amount:
    --
  • Project category:
    Cataloguing and Digitisation (Scientific Library Services and Information Systems)

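Similar grants from the National Natural Science Foundation of China

Structural characteristics and structure-property relationships of CuAgSe-based thermoelectric materials
  • Grant number:
    22375214
  • Approval year:
    2023
  • Funding amount:
    CNY 500,000
  • Project category:
    General Program
Performance and mechanism of biological fixation of CO2 from coal-fired flue gas by marine microalgae
  • Grant number:
    50806049
  • Approval year:
    2008
  • Funding amount:
    CNY 200,000
  • Project category:
    Young Scientists Fund
Strategies, models, and performance evaluation of Web quality of service (QoS) control
  • Grant number:
    60373013
  • Approval year:
    2003
  • Funding amount:
    CNY 200,000
  • Project category:
    General Program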

Similar overseas grants

Opportunistic Atherosclerotic Cardiovascular Disease Risk Estimation at Abdominal CTs with Robust and Unbiased Deep Learning
  • Grant number:
    10636536
  • Fiscal year:
    2023
  • Funding amount:
    --
  • Project category:
Gelbrane: Combined Gel and Membrane for Robust Western Blotting
  • Grant number:
    10759072
  • Fiscal year:
    2023
  • Funding amount:
    --
  • Project category:
Neural and computational mechanisms underlying robust object recognition
  • Grant number:
    10682285
  • Fiscal year:
    2023
  • Funding amount:
    --
  • Project category:
Robust and Efficient Learning of High-Resolution Brain MRI Reconstruction from Small Referenceless Data
  • Grant number:
    10584324
  • Fiscal year:
    2023
  • Funding amount:
    --
  • Project category:
Rapid Motion-Robust and Easy-to-Use Dynamic Contrast-Enhanced MRI for Liver Perfusion Quantification
  • Grant number:
    10831643
  • Fiscal year:
    2023
  • Funding amount:
    --
  • Project category:
Robust Mass Spectrometric Protein/Peptide Assays for Type 1 Diabetes Clinical Applications
  • Grant number:
    10730900
  • Fiscal year:
    2023
  • Funding amount:
    --
  • Project category:
A web-based platform for robust single-cell analysis, bulk data deconvolution and system-level analysis
  • Grant number:
    10766073
  • Fiscal year:
    2023
  • Funding amount:
    --
  • Project category:
Robust and highly selective proton MRSI on a clinical 3 T system using a second order gradient insert, for application in schizophrenia
  • Grant number:
    10741355
  • Fiscal year:
    2023
  • Funding amount:
    --
  • Project category:
An electrophysiology platform that enables robust, scalable and long-term intracellular recording of cardiomyocytes
  • Grant number:
    10500961
  • Fiscal year:
    2022
  • Funding amount:
    --
  • Project category:
Statistical methods for analyzing messy microbiome data: detection of hidden artifacts and robust modeling approaches
  • Grant number:
    10708908
  • Fiscal year:
    2022
  • Funding amount:
    --
  • Project category: