Study on the Digitization Support System for Historical Documents
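Basic information
- Grant number: 14310166
- Principal investigator:
- Amount: $62,100
- Host institution:
- Host institution country: Japan
- Project category: Grant-in-Aid for Scientific Research (B)
- Fiscal year: 2002
- Funding country: Japan
- Project period: 2002 to 2004
- Project status: Completed
- Source:
- Keywords:
Project abstract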
Classical papers often suffer from wormholes and discoloration due to aging, and seals and annotations sometimes overlap the characters. These defects make character extraction and recognition difficult. Moreover, characters in classical texts are often cursive, so segmenting a cursive string into individual characters is important. Because of these problems in preprocessing historical documents, new means of character segmentation are examined.
The proposed method begins with filtering: a color filter that extracts candidate character pixels according to their color, several noise-reduction filters, and converters that produce a grayscale image and then a binarized image. Layout information, namely whether the text is written vertically or horizontally and the average character size on the page, is obtained by analyzing a peripherally projected histogram. Each character is then built up gradually from pixels. Finally, a cursive string is segmented, in principle, along the line connecting the nearest concavities on the same contour. The strength of this method is that it needs neither language-specific knowledge of character styles nor layout information.
The drawback of this procedure is that its results are strongly affected by the local shape of the contours. To compensate, a kind of multi-resolution analysis is introduced. The basic idea is to blur the original image $I$ by convolving it with a Gaussian function $G$, giving $G * I$, and then to apply the Laplacian operator $\nabla^2$, yielding an output $O$ from which edges are obtained: $\nabla^2 (G * I) = (\nabla^2 G) * I = O$. The Gaussian acts as a low-pass filter that wipes out structures at scales smaller than the parameter $\sigma$ (its standard deviation), so the image becomes coarser as $\sigma$ grows. A concavity that survives in a coarser image indicates that the shape changes substantially at that location.
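The Laplacian-of-Gaussian step described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the project's implementation: the function names (`log_kernel`, `convolve`, `has_sign_change`), the toy step-edge image, the kernel radius of three standard deviations, and the zero-sum correction are all choices made here for illustration. The sketch samples the kernel $\nabla^2 G(x, y) = \frac{x^2 + y^2 - 2\sigma^2}{\sigma^4} G(x, y)$, applies it to the image in a single pass (which, by the identity in the abstract, is equivalent to blurring first and then taking the Laplacian), and checks that the response changes sign where the edge lies.

```python
import math


def log_kernel(sigma, radius=None):
    """Sampled Laplacian-of-Gaussian kernel, shifted so that it sums to zero."""
    if radius is None:
        radius = int(math.ceil(3 * sigma))  # assumption: truncate at 3 sigma
    size = 2 * radius + 1
    kernel = [[0.0] * size for _ in range(size)]
    for y in range(-radius, radius + 1):
        for x in range(-radius, radius + 1):
            r2 = x * x + y * y
            g = math.exp(-r2 / (2 * sigma ** 2)) / (2 * math.pi * sigma ** 2)
            kernel[y + radius][x + radius] = (r2 - 2 * sigma ** 2) / sigma ** 4 * g
    # Sampling and truncation leave a small non-zero sum; remove it so that
    # perfectly flat regions produce a zero response.
    mean = sum(map(sum, kernel)) / (size * size)
    return [[v - mean for v in row] for row in kernel]


def convolve(image, kernel):
    """Same-size correlation with replicated borders.

    The kernel is symmetric, so correlation and convolution coincide here.
    """
    h, w = len(image), len(image[0])
    r = len(kernel) // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for ky in range(-r, r + 1):
                yy = min(max(y + ky, 0), h - 1)
                for kx in range(-r, r + 1):
                    xx = min(max(x + kx, 0), w - 1)
                    acc += kernel[ky + r][kx + r] * image[yy][xx]
            out[y][x] = acc
    return out


def has_sign_change(row, eps=1e-6):
    """True if the row contains adjacent significant values of opposite sign."""
    signs = [1 if v > eps else -1 for v in row if abs(v) > eps]
    return any(a != b for a, b in zip(signs, signs[1:]))


# Toy image: dark left half, bright right half, i.e. one vertical step edge.
image = [[0.0] * 10 + [1.0] * 10 for _ in range(20)]
response = convolve(image, log_kernel(sigma=1.5))
middle = response[10]  # one row through the middle of the image
print(has_sign_change(middle))
```

The response is positive on the dark side of the step and negative on the bright side, so the edge sits at the zero crossing between columns 9 and 10. Repeating the call with a larger `sigma` widens the kernel and suppresses finer structures, which is the behaviour the abstract relies on.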
The important point is that a large change of shape found in a coarser image is also preserved in the detailed image; that is, the separation lines found in a coarser image must also exist in the detailed image. Experiments showed that this method is more robust than the one described above at choosing appropriate lines for segmenting a cursive string. Multi-resolution analysis by wavelets is introduced to facilitate this procedure.
In addition, title extraction using page layout information and handwritten character recognition using n-grams were carried out as preliminary examinations for full character recognition.
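The coarse-to-fine selection of separation points can be illustrated on a one-dimensional toy problem. The sketch below is a deliberate simplification and not the method used in the project: it replaces the two-dimensional character contour with a hypothetical height profile sampled along the writing direction, treats local minima of that profile as concavities, and uses plain Gaussian smoothing in place of the wavelet analysis. The names `gaussian_smooth`, `local_minima`, and `coarse_to_fine_cuts`, the value of `coarse_sigma`, and the matching `window` are all assumptions made for illustration.

```python
import math


def gaussian_smooth(signal, sigma):
    """Smooth a 1-D signal with a normalized Gaussian, replicating the ends."""
    radius = int(math.ceil(3 * sigma))
    offsets = range(-radius, radius + 1)
    weights = [math.exp(-(k * k) / (2 * sigma * sigma)) for k in offsets]
    total = sum(weights)
    n = len(signal)
    out = []
    for i in range(n):
        acc = 0.0
        for k, w in zip(offsets, weights):
            j = min(max(i + k, 0), n - 1)
            acc += w * signal[j]
        out.append(acc / total)
    return out


def local_minima(signal):
    """Interior indices lower than the left neighbour and not above the right."""
    return [i for i in range(1, len(signal) - 1)
            if signal[i] < signal[i - 1] and signal[i] <= signal[i + 1]]


def coarse_to_fine_cuts(profile, coarse_sigma, window):
    """Keep only fine-scale concavities that also appear at the coarse scale."""
    fine = local_minima(profile)
    coarse = local_minima(gaussian_smooth(profile, coarse_sigma))
    cuts = []
    for c in coarse:
        # Map each coarse concavity back to the nearest fine one, if any.
        near = [f for f in fine if abs(f - c) <= window]
        if near:
            cuts.append(min(near, key=lambda f: (abs(f - c), f)))
    return sorted(set(cuts))


# Hypothetical profile: three strokes with a shallow dip inside each one,
# separated by two deep valleys.
profile = [0, 6, 9, 8, 9, 6, 1, 6, 9, 8, 9, 6, 1, 6, 9, 8, 9, 6, 0]
cuts = coarse_to_fine_cuts(profile, coarse_sigma=1.5, window=2)
print(local_minima(profile))  # all fine-scale concavities
print(cuts)                   # concavities that persist at the coarse scale
```

In this toy profile the shallow dips at indices 3, 9, and 15 disappear after smoothing, while the deep valleys at indices 6 and 12 persist and are kept as cut candidates. That mirrors the claim that separation lines found in the coarser image must also exist in the detailed one.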
Project outcomes
Journal articles (15)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
原 正一郎: "古文書OCRのための文字切り出し"情報処理学会研究報告2002-CH-55. Vol.2002,No.73. 51-56 (2002)
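Shoichiro Hara: "Character Segmentation for OCR of Historical Documents," Information Processing Society of Japan Research Report 2002-CH-55, Vol. 2002, No. 73, pp. 51-56 (2002).
- DOI:
- Publication year:
- Journal:
- Impact factor: 0
- Authors:
- Corresponding author: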
Segmentation of Cursive Character for Classical Literal OCR
- DOI:
- Publication year: 2002
- Journal:
- Impact factor: 0
- Authors: Umata; Ayako; Shoichiro HARA
- Corresponding author: Shoichiro HARA
OCR for Japanese Classical Documents
- DOI:
- Publication year: 2003
- Journal:
- Impact factor: 0
- Authors: Shoichiro Hara; Mamoru Shibayama
- Corresponding author: Mamoru Shibayama
古文書OCRのための文字切り出し
Character Segmentation for OCR of Historical Documents
- DOI:
- Publication year: 2002
- Journal:
- Impact factor: 0
- Authors: Shoichiro Hara; Mamoru Shibayama; 石川 禎浩; 白川琢磨; 原 正一郎
- Corresponding author: 原 正一郎
OCR for Japanese Classical Documents - Segmentation of Cursive Characters -
- DOI:
- Publication year: 2002
- Journal:
- Impact factor: 0
- Authors: Mani; A.; 中西裕二; 和崎 春日; Shoichiro Hara
- Corresponding author: Shoichiro Hara
Other publications by HARA Shoichiro
Inter-institutional Database Unification by Metadata--Standardization for Humanities Data Sharing--
- DOI:
- Publication year: 2003
- Journal:
- Impact factor: 0
- Authors: HARA Shoichiro; SHIBAYAMA Mamoru; YASUNAGA Hisashi
- Corresponding author: YASUNAGA Hisashi
Other grants held by HARA Shoichiro
Development of quantitative analysis method of "knowledge of the area" that focuses on community health activities as an index - Case Study in Northeast Thailand -
- Grant number: 23241080
- Fiscal year: 2011
- Funding amount: $62,100
- Project category: Grant-in-Aid for Scientific Research (A)
Development of Area Health Informatics : A Quantitative Comparative Area Studies regarding Disease Structure
- Grant number: 19201051
- Fiscal year: 2007
- Funding amount: $62,100
- Project category: Grant-in-Aid for Scientific Research (A)
On the Study of Information Model for Literary Data Unification
- Grant number: 12680427
- Fiscal year: 2000
- Funding amount: $62,100
- Project category: Grant-in-Aid for Scientific Research (C)
"On the effective construction of catalog databases for Japanese classical literature based on the new model -Case study of the reconstruction of microfilm catalog data to multimedia data-"
- Grant number: 06610417
- Fiscal year: 1994
- Funding amount: $62,100
- Project category: Grant-in-Aid for Scientific Research (C)
ON THE OCR APPROACH TO CREATING FULL-TEXT DATA BASE OF JAPANESE CLASSICAL LITERATURE
- Grant number: 04610271
- Fiscal year: 1992
- Funding amount: $62,100
- Project category: Grant-in-Aid for General Scientific Research (C)
Similar overseas grants
Development of a real-time evaluation method using an AI-based image processing system in liver tumor ablation
- Grant number: 23K11923
- Fiscal year: 2023
- Funding amount: $62,100
- Project category: Grant-in-Aid for Scientific Research (C)
Construction of big data analysis platform for fish behavior in the sea by image processing, change detection, and machine learning techniques
- Grant number: 23K14005
- Fiscal year: 2023
- Funding amount: $62,100
- Project category: Grant-in-Aid for Early-Career Scientists
Developing next-generation, AI-enabled, medical image processing for multiple sclerosis clinical trials and routine care.
- Grant number: 2877679
- Fiscal year: 2023
- Funding amount: $62,100
- Project category: Studentship
CRII: SHF: RUI: Custom Hardware Accelerators for Privacy-Preserving Image Processing
- Grant number: 2347253
- Fiscal year: 2023
- Funding amount: $62,100
- Project category: Standard Grant
Creating a Technological Infrastructure for the Digital Restoration of Cultural Properties Using Statistical Image Processing and Deep Learning
- Grant number: 22H00748
- Fiscal year: 2022
- Funding amount: $62,100
- Project category: Grant-in-Aid for Scientific Research (B)
I-Corps: Image processing platform to identify photosynthetic pigment density to measure nitrogen content and manage fertilizer (Smart Sustainable Fertilizer Manager)
- Grant number: 2227256
- Fiscal year: 2022
- Funding amount: $62,100
- Project category: Standard Grant
Deep graphical models and methods for multi-modal biomedical image processing, analysis, and interpretation
- Grant number: RGPIN-2018-03966
- Fiscal year: 2022
- Funding amount: $62,100
- Project category: Discovery Grants Program - Individual
Development of an Image Processing Algorithm to Customize Cochlear Implant Procedures
- Grant number: 548003-2020
- Fiscal year: 2022
- Funding amount: $62,100
- Project category: Alexander Graham Bell Canada Graduate Scholarships - Doctoral
Image Processing Techniques for the Segmentation, Fusion, and Registration of Ultrasound Images from Intracavitary Gynecologic Brachytherapy Procedures
- Grant number: 557687-2021
- Fiscal year: 2022
- Funding amount: $62,100
- Project category: Postdoctoral Fellowships
Development and utilization of digital / analog teaching materials using 3D image processing and 3D printers for biological education
- Grant number: 22K02988
- Fiscal year: 2022
- Funding amount: $62,100
- Project category: Grant-in-Aid for Scientific Research (C)













