ON THE OCR APPROACH TO CREATING FULL-TEXT DATA BASE OF JAPANESE CLASSICAL LITERATURE

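Basic information

  • Grant number:
    04610271
  • Principal investigator:
  • Amount:
    $12,800
  • Host institution:
  • Host institution country:
    Japan
  • Project category:
    Grant-in-Aid for General Scientific Research (C)
  • Fiscal year:
    1992
  • Funding country:
    Japan
  • Period:
    1992 to (no data)
  • Project status:
    Completed

Project abstract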

A new approach to reducing the image noise that disturbs optical character recognition (OCR) was studied. The distinctive feature of the study is its use of color information to better separate "true" letters from image noise such as red letters, the paper itself, and pseudo-letters, that is, writing on the reverse side of translucent paper that shows through. Original Japanese classical books written in black Chinese ink on white traditional Japanese paper were selected as the research samples. The results are as follows.

(1) Characteristics of color distribution: The original images were digitized with a color image scanner (100 dpi, 256 gray levels for each of R, G, and B), and each pixel was represented as a three-dimensional vector in RGB chromaticity coordinates and then analyzed. The color distribution has the following characteristics: (a) many pixels are distributed along the line R=G=B; (b) red letters have a color distribution different from (a); (c) the brightness histograms of the R, G, and B channels are almost bimodal.

(2) Classification of images: (a) Characteristics (a) and (b) in (1) are useful for distinguishing red letters from other image content. (b) The discriminant threshold selection method (Otsu's method) was applied to each brightness histogram to determine thresholds between black letters and paper segments. This method separates the two segments sharply, but it tends to slice off the peripheral pixels of the "true" black letters. (c) Cluster analysis was introduced to classify "true" black letters and paper segments more precisely, and it gave better results.

This study verifies the usefulness of color information for eliminating image noise.
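As a rough illustration of how characteristics (1)(a) and (1)(b) might be used to pick out red letters, the sketch below measures how far each pixel lies from the achromatic line R=G=B. The abstract does not give the study's actual decision rule, so the function names, the `min_distance` cutoff of 40, and the sample pixel values are all assumptions made for this example.

```python
import math


def chroma_distance(r, g, b):
    """Euclidean distance of an RGB point from the achromatic line R = G = B."""
    mean = (r + g + b) / 3.0  # closest point on the line is (mean, mean, mean)
    return math.sqrt((r - mean) ** 2 + (g - mean) ** 2 + (b - mean) ** 2)


def is_red_lettering(pixel, min_distance=40.0):
    """Flag a pixel as red lettering (hypothetical rule, not the study's).

    A pixel qualifies when it lies far from the gray axis and red is its
    dominant channel. The cutoff is an illustrative value only.
    """
    r, g, b = pixel
    return chroma_distance(r, g, b) > min_distance and r > g and r > b


# Made-up samples: black ink, off-white paper, red ink, mid gray.
pixels = [(30, 28, 31), (225, 220, 210), (190, 40, 35), (128, 128, 128)]
labels = [is_red_lettering(p) for p in pixels]
print(labels)
```

Ink and paper both sit close to the gray axis, so only the red sample is flagged; in practice the cutoff would have to be tuned on real scans.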
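The discriminant threshold selection in (2)(b), commonly known as Otsu's method, can be sketched as follows: for one channel's brightness histogram, choose the level that maximizes the between-class variance of the two resulting groups. This is a minimal pure-Python version applied to each of R, G, and B in turn; the helper names and the synthetic ink and paper pixels are assumptions for illustration and do not come from the study.

```python
def channel_histogram(pixels, channel, levels=256):
    """Count how many pixels fall on each brightness level of one channel."""
    hist = [0] * levels
    for p in pixels:
        hist[p[channel]] += 1
    return hist


def otsu_threshold(histogram):
    """Return the level t that maximizes between-class variance.

    Levels <= t form the dark class (ink), levels > t the bright class (paper).
    """
    total = sum(histogram)
    sum_all = sum(level * count for level, count in enumerate(histogram))
    weight_dark = 0
    sum_dark = 0.0
    best_t, best_var = 0, -1.0
    for t, count in enumerate(histogram):
        weight_dark += count
        sum_dark += t * count
        if weight_dark == 0:
            continue  # no dark pixels yet, nothing to compare
        weight_bright = total - weight_dark
        if weight_bright == 0:
            break  # every pixel is already in the dark class
        mean_dark = sum_dark / weight_dark
        mean_bright = (sum_all - sum_dark) / weight_bright
        var_between = weight_dark * weight_bright * (mean_dark - mean_bright) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


# Synthetic bimodal data: dark ink pixels and bright paper pixels.
ink = [(30, 28, 31)] * 50 + [(40, 40, 42)] * 100
paper = [(200, 195, 185)] * 300 + [(210, 205, 190)] * 100
pixels = ink + paper

# One threshold per channel, as the abstract describes.
thresholds = [otsu_threshold(channel_histogram(pixels, c)) for c in range(3)]
print(thresholds)
```

A hard threshold like this assigns the lighter edge pixels of a stroke to the paper class, which is consistent with the abstract's remark that the method slices off the periphery of the letters and that cluster analysis gave better results.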

Project outputs

Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)

Other publications by HARA Shoichiro

Inter-institutional Database Unification by Metadata--Standardization for Humanities Data Sharing--

Other grants held by HARA Shoichiro

Development of quantitative analysis method of "knowledge of the area" that focuses on community health activities as an index - Case Study in Northeast Thailand -
  • Grant number:
    23241080
  • Fiscal year:
    2011
  • Funding amount:
    $12,800
  • Project category:
    Grant-in-Aid for Scientific Research (A)
Development of Area Health Informatics : A Quantitative Comparative Area Studies regarding Disease Structure
  • Grant number:
    19201051
  • Fiscal year:
    2007
  • Funding amount:
    $12,800
  • Project category:
    Grant-in-Aid for Scientific Research (A)
Study on the Digitization Support System for Historical Documents
  • Grant number:
    14310166
  • Fiscal year:
    2002
  • Funding amount:
    $12,800
  • Project category:
    Grant-in-Aid for Scientific Research (B)
On the Study of Information Model for Literary Data Unification
  • Grant number:
    12680427
  • Fiscal year:
    2000
  • Funding amount:
    $12,800
  • Project category:
    Grant-in-Aid for Scientific Research (C)
"On the effective construction of catalog databases for Japanese classical literature based on the new model -Case study of the reconstruction of microfilm catalog data to multimedia data-"
  • Grant number:
    06610417
  • Fiscal year:
    1994
  • Funding amount:
    $12,800
  • Project category:
    Grant-in-Aid for Scientific Research (C)

Similar overseas grants

Construction of a Digital Literary Map and its Application to the Study and Education of Japanese Classical Literature.
  • Grant number:
    21H00506
  • Fiscal year:
    2021
  • Funding amount:
    $12,800
  • Project category:
    Grant-in-Aid for Scientific Research (B)
Global circulation of Japanese classical literature: Tracing Hojoki's Western reception
  • Grant number:
    19K23065
  • Fiscal year:
    2019
  • Funding amount:
    $12,800
  • Project category:
    Grant-in-Aid for Research Activity Start-up
Natural disasters, Man-made disasters and Children in Japanese classical literature. Stories of children who overcome a disaster as teaching material.
  • Grant number:
    18K02637
  • Fiscal year:
    2018
  • Funding amount:
    $12,800
  • Project category:
    Grant-in-Aid for Scientific Research (C)
Research on the genealogy of annotative research of early modern Japanese classical literature
  • Grant number:
    16K02404
  • Fiscal year:
    2016
  • Funding amount:
    $12,800
  • Project category:
    Grant-in-Aid for Scientific Research (C)
Regions and Children in Japanese classical literature: Research to compile "Stories of children who live in symbiosis with the regional environments" as teaching material
  • Grant number:
    15K04508
  • Fiscal year:
    2015
  • Funding amount:
    $12,800
  • Project category:
    Grant-in-Aid for Scientific Research (C)
A Comparative Literature Study on Modern Chinese Translations of Japanese Classical Literature
  • Grant number:
    24520405
  • Fiscal year:
    2012
  • Funding amount:
    $12,800
  • Project category:
    Grant-in-Aid for Scientific Research (C)
Children and Environment in Japanese classical literature: Clinical research to compile "Stories of children who overcome hardships" as teaching material
  • Grant number:
    24531225
  • Fiscal year:
    2012
  • Funding amount:
    $12,800
  • Project category:
    Grant-in-Aid for Scientific Research (C)
Research on the influence of Kinso (Qing Cao) and its tales to Japanese classical literature
  • Grant number:
    23520267
  • Fiscal year:
    2011
  • Funding amount:
    $12,800
  • Project category:
    Grant-in-Aid for Scientific Research (C)
Researches on the sentences in classical Japanese written by the persons of letters who studied Japanese classical literature and culture
  • Grant number:
    22520171
  • Fiscal year:
    2010
  • Funding amount:
    $12,800
  • Project category:
    Grant-in-Aid for Scientific Research (C)
Comprehensive Study of the previous fiscal year study of Japanese classical literature in the early modern period
  • Grant number:
    22320130
  • Fiscal year:
    2010
  • Funding amount:
    $12,800
  • Project category:
    Grant-in-Aid for Scientific Research (B)