Data-Driven Approach to Understanding Ancient Manuscripts

Basic information

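  • Grant number:
    RGPIN-2014-04649
  • Principal investigator:
  • Amount:
    $33,500
  • Host institution:
  • Host institution country:
    Canada
  • Program type:
    Discovery Grants Program - Individual
  • Fiscal year:
    2014
  • Funding country:
    Canada
  • Start and end dates:
    2014-01-01 to 2015-12-31
  • Project status:
    Completed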

Project abstract

Ancient manuscripts are a primary carrier of cultural heritage worldwide, and they are being intensively digitized to ensure their preservation and, ultimately, wide access to their content. Critical to this effort are the legibility of the documents in image form and access to live text. Several state-of-the-art methods have been proposed and developed to address the challenges of processing these manuscripts. However, the huge amount of data involved, together with the high cost and scarcity of human expert feedback and reference data, calls for fundamental approaches that encompass all these aspects in an objective and tractable manner. In this research, we propose one such approach: a novel framework for the computational pattern analysis of ancient manuscripts that is data-driven, multilevel, self-sustaining and learning-based, and that takes advantage of the large quantities of unprocessed data available. Unlike many approaches, which skip ahead to the analysis of feature vectors, our proposed framework takes a new perspective on the task and starts from ground zero of the problem, the definition of objects. In addition, it leverages data-driven mining of relations among objects to discover hidden but persistent links between them.
The problem is addressed at three main levels. At the lowest level, that of images, we tackle the automatic, data-driven enhancement and restoration of document images using spatial, spectral, sparse and graph-based representations of visual objects, with a focus on spatial graphs of patch-based representations that encode spatial proximity and local similarities. For degradation modeling, our approach is to perform a cluster analysis on the data point manifolds. Transfer learning approaches will also be used to discover, model and correct unseen degradation holistically.
At the second level, that of transliteration, we use directed graphical models, HMMs, undirected random fields and spatial relation models of patch-based representations in an active learning framework to recognize the live text in manuscript images, in an effort to drastically reduce dependency on human experts and on reference data that are rarely available. In addition, cross-lingual adaptation approaches from spoken language translation, such as transform mapping at the state level, are considered in order to allow adaptation across writing styles and even across written languages.
Finally, at the highest level, that of network analysis of the relations among objects (from patches and words to manuscripts and writers), we search for 'social networks' linking manuscripts. Considering this data-driven approach under the heading of Visual Language Processing (VLP), we hope that it will pave the way for new research in Canada's upcoming data stewardship plan.
This research program will lead to novel paradigms for processing and understanding ancient manuscripts using coherent, data-driven frameworks with tractable solutions. The multilevel structure of the proposed approach enables researchers to collaboratively mine, model and interpret digitized manuscripts, all of which can be achieved thanks to data-driven approaches, which have been largely absent from the field up to now. Empowered by the concept of life-long learning, our research makes the methods and models developed transferable across collections. It will advance the paradigms of image processing, pattern recognition, machine learning and network science, with the potential to impact huge collections of manuscripts and to leverage different representation spaces and metrics, and their associated relationships and links.
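
The abstract's lowest level mentions spatial graphs over patch-based representations that combine spatial proximity with local similarity. The following is only a minimal illustrative sketch of that general idea, not the project's actual method: the function name `patch_graph`, the non-overlapping patch grid, the 4-neighbour connectivity, and the Gaussian weighting with parameter `sigma` are all assumptions made for the example.

```python
import math

def patch_graph(image, patch=8, sigma=0.5):
    """Build a 4-neighbour graph over non-overlapping square patches.

    image is a list of equal-length rows of floats in [0, 1].
    Returns (feats, edges): feats[i] is the flattened i-th patch, and each
    edge is (i, j, weight), where i and j are spatially adjacent patches and
    weight is a Gaussian of their mean squared intensity difference.
    (Hypothetical helper; names and parameters are illustrative only.)
    """
    rows, cols = len(image) // patch, len(image[0]) // patch
    feats = []
    for r in range(rows):
        for c in range(cols):
            # Flatten the patch whose top-left corner is (r*patch, c*patch).
            feats.append([image[r * patch + a][c * patch + b]
                          for a in range(patch) for b in range(patch)])
    edges = []
    for r in range(rows):
        for c in range(cols):
            i = r * cols + c
            for dr, dc in ((0, 1), (1, 0)):  # right and down neighbours only
                rr, cc = r + dr, c + dc
                if rr < rows and cc < cols:
                    j = rr * cols + cc
                    d2 = sum((x - y) ** 2
                             for x, y in zip(feats[i], feats[j])) / len(feats[i])
                    edges.append((i, j, math.exp(-d2 / (2 * sigma ** 2))))
    return feats, edges

# Toy 16x16 "page": dark left half, bright right half.
page = [[0.0] * 8 + [1.0] * 8 for _ in range(16)]
feats, edges = patch_graph(page)
```

In this sketch, spatial proximity decides which patches are connected and local similarity decides how strongly. Edges that cross the dark/bright boundary of the toy page receive lower weights than edges within a uniform region, which is the kind of structure a graph-based enhancement or restoration step could exploit.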

Project outputs

Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)

Other publications by Cheriet, Mohamed

Model selection for the LS-SVM. Application to handwriting recognition
  • DOI:
    10.1016/j.patcog.2008.10.023
  • Publication date:
    2009-12-01
  • Journal:
  • Impact factor:
    8
  • Authors:
    Adankon, Mathias M.;Cheriet, Mohamed
  • Corresponding author:
    Cheriet, Mohamed
Deep Learning-Based Resource Allocation for 5G Broadband TV Service
  • DOI:
    10.1109/tbc.2020.2968730
  • Publication date:
    2020-12-01
  • Journal:
  • Impact factor:
    4.5
  • Authors:
    Yu, Peng;Zhou, Fanqin;Cheriet, Mohamed
  • Corresponding author:
    Cheriet, Mohamed
LSTM-based indoor air temperature prediction framework for HVAC systems in smart buildings
  • DOI:
    10.1007/s00521-020-04926-3
  • Publication date:
    2020-05-04
  • Journal:
  • Impact factor:
    6
  • Authors:
    Mtibaa, Fatma;Nguyen, Kim-Khoa;Cheriet, Mohamed
  • Corresponding author:
    Cheriet, Mohamed
Arabic word descriptor for handwritten word indexing and lexicon reduction
  • DOI:
    10.1016/j.patcog.2014.04.025
  • Publication date:
    2014-10-01
  • Journal:
  • Impact factor:
    8
  • Authors:
    Chherawala, Youssouf;Cheriet, Mohamed
  • Corresponding author:
    Cheriet, Mohamed
Automatic evaluation of vessel diameter variation from 2D X-ray angiography

Other grants held by Cheriet, Mohamed

Data-driven modeling for understanding ancient documents from multimodal images
  • Grant number:
    RGPIN-2019-05230
  • Fiscal year:
    2022
  • Funding amount:
    $33,500
  • Program type:
    Discovery Grants Program - Individual
Data-driven modeling for understanding ancient documents from multimodal images
  • Grant number:
    RGPIN-2019-05230
  • Fiscal year:
    2021
  • Funding amount:
    $33,500
  • Program type:
    Discovery Grants Program - Individual
Data-driven modeling for understanding ancient documents from multimodal images
  • Grant number:
    RGPIN-2019-05230
  • Fiscal year:
    2020
  • Funding amount:
    $33,500
  • Program type:
    Discovery Grants Program - Individual
Sustainable Smart Eco-Cloud
  • Grant number:
    1000229052-2012
  • Fiscal year:
    2019
  • Funding amount:
    $33,500
  • Program type:
    Canada Research Chairs
Data-driven modeling for understanding ancient documents from multimodal images
  • Grant number:
    RGPIN-2019-05230
  • Fiscal year:
    2019
  • Funding amount:
    $33,500
  • Program type:
    Discovery Grants Program - Individual
Sustainable Smart Eco-Cloud
  • Grant number:
    1000229052-2012
  • Fiscal year:
    2018
  • Funding amount:
    $33,500
  • Program type:
    Canada Research Chairs
Sustainable cloud-based M2M smart home
  • Grant number:
    469977-2014
  • Fiscal year:
    2017
  • Funding amount:
    $33,500
  • Program type:
    Collaborative Research and Development Grants
Data-Driven Approach to Understanding Ancient Manuscripts
  • Grant number:
    RGPIN-2014-04649
  • Fiscal year:
    2017
  • Funding amount:
    $33,500
  • Program type:
    Discovery Grants Program - Individual
Smart virtual wan hypervisor for environmental-aware inter-data network
  • Grant number:
    461084-2013
  • Fiscal year:
    2017
  • Funding amount:
    $33,500
  • Program type:
    Collaborative Research and Development Grants
Sustainable Smart Eco-Cloud
  • Grant number:
    1000229052-2012
  • Fiscal year:
    2017
  • Funding amount:
    $33,500
  • Program type:
    Canada Research Chairs

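Similar NSFC (National Natural Science Foundation of China) grants

Data-driven Recommendation System Construction of an Online Medical Platform Based on the Fusion of Information
  • Grant number:
  • Approval year:
    2024
  • Funding amount:
    (not listed; unit: 10,000 CNY)
  • Program type:
    Research Fund for International Young Scientists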

Similar overseas grants

A data-driven modeling approach for augmenting climate model simulations and its application to Pacific-Atlantic interbasin interactions
  • Grant number:
    23K25946
  • Fiscal year:
    2024
  • Funding amount:
    $33,500
  • Program type:
    Grant-in-Aid for Scientific Research (B)
A new data-driven approach to bring humanity into virtual worlds with computer vision
  • Grant number:
    23K28129
  • Fiscal year:
    2024
  • Funding amount:
    $33,500
  • Program type:
    Grant-in-Aid for Scientific Research (B)
A data-driven modeling approach for augmenting climate model simulations and its application to Pacific-Atlantic interbasin interactions
  • Grant number:
    23H01250
  • Fiscal year:
    2023
  • Funding amount:
    $33,500
  • Program type:
    Grant-in-Aid for Scientific Research (B)
Data-driven design of Next Generation Cross-Coupling catalysts by Ligand Parameterisation: A Combined Experimental and Computational Approach.
  • Grant number:
    2896325
  • Fiscal year:
    2023
  • Funding amount:
    $33,500
  • Program type:
    Studentship
EAGER: Development of a Hybrid Knowledge- and Data-Driven Approach to Guide the Design of Immunotherapeutic Cells
  • Grant number:
    2324742
  • Fiscal year:
    2023
  • Funding amount:
    $33,500
  • Program type:
    Continuing Grant
Study on Heavy Rainfall Mechanism by Mathematical and Data-Driven Approach Using Large Ensemble
  • Grant number:
    23KF0161
  • Fiscal year:
    2023
  • Funding amount:
    $33,500
  • Program type:
    Grant-in-Aid for JSPS Fellows
Enabling the mortgage industry to drive net zero retrofitting through a data-driven portfolio approach
  • Grant number:
    10092176
  • Fiscal year:
    2023
  • Funding amount:
    $33,500
  • Program type:
    Collaborative R&D
Moving Beyond the Individual- A Data-driven Approach to Improving the Evidence on the Role of Community and Societal Determinants of HIV among Adolescent Girls and Young Women in Sub-Saharan Africa
  • Grant number:
    10619319
  • Fiscal year:
    2023
  • Funding amount:
    $33,500
  • Program type:
Data-driven approach for advanced control of immune system by using low-temperature plasma
  • Grant number:
    23H01404
  • Fiscal year:
    2023
  • Funding amount:
    $33,500
  • Program type:
    Grant-in-Aid for Scientific Research (B)
CAREER: A Holistic Developer-Centered Approach to Enhance Privacy for Data-Driven Applications
  • Grant number:
    2238047
  • Fiscal year:
    2023
  • Funding amount:
    $33,500
  • Program type:
    Continuing Grant