Development of Algorithms for Extracting Schema Information from Semistructured Data

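Basic information

Grant number:
12780317
Principal investigator:
鈴木伸崇
Amount:
$15,400
Host institution:
Okayama Prefectural University
Host institution country:
Japan
Project category:
Grant-in-Aid for Encouragement of Young Scientists (A)
Fiscal year:
2000
Funding country:
Japan
Project period:
2000 to 2001
Project status:
Completed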

Source:
https://kaken.nii.ac.jp/en/grant/KAKENHI-PROJECT-12780317/
Keywords:
semistructured data; schema extraction problem; algorithm; NP-hardness

Project abstract

This study considers the optimization problem of extracting, from semistructured data, a database schema (hereafter "schema") that satisfies the following condition: the density of every class is at least a given threshold, and the number of classes is minimal. Here, class density is a measure of the similarity between the type of a class and the types of the objects belonging to it; a larger class density indicates a higher similarity. The main research results for this fiscal year are as follows.

1. Computational complexity of the schema extraction problem. The applicants had previously shown that the above optimization problem is strongly NP-hard. This year, we examined the complexity of the same problem when an additional condition is imposed: the type of each extracted class must be optimal, that is, the attribute sequence representing the type must be as short as possible. We showed that in this case the problem is strongly NP-hard and belongs to the class Δ_2^P.

2. Improvement of the schema extraction algorithm. Given the results above, it is difficult to extract an exact optimal solution (schema) efficiently. Last year, the applicants therefore proposed a kind of class called a rooted class and used it to construct a polynomial-time algorithm that extracts schemas efficiently. This year, we improved the algorithm so that it can extract schemas of smaller size (fewer classes) by extending rooted classes to classes with a more general structure (bounded classes). Specifically, a rooted class is subject to the restriction that every basic class belonging to it is a subclass of a single class (the root of the rooted class), whereas the extended classes are allowed to have several classes that play the role of such a root. We showed that the schema extraction algorithm still runs in polynomial time under this extension.
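The optimization problem in the abstract (every class at least as dense as a threshold, with as few classes as possible) can be illustrated with a small Python sketch. Everything concrete here is an assumption made for illustration, since the abstract gives no formal definitions: an object's type is modelled as its set of attribute names, the class type as the union of its members' attributes, and density as the mean Jaccard similarity between the class type and each member's type. The greedy grouping is a simple heuristic. It is not the rooted-class or bounded-class algorithm developed in the project, and it does not guarantee a minimal number of classes.

```python
from typing import FrozenSet, List

# Assumption: an object's type is modelled as its set of attribute names.
ObjType = FrozenSet[str]


def class_type(members: List[ObjType]) -> ObjType:
    """Hypothetical class type: the union of the members' attribute sets."""
    return frozenset().union(*members)


def density(members: List[ObjType]) -> float:
    """Hypothetical density: mean Jaccard similarity between the
    class type and the type of each member object."""
    ctype = class_type(members)
    if not ctype:
        return 1.0  # no attributes anywhere, so every member matches exactly
    return sum(len(m & ctype) / len(m | ctype) for m in members) / len(members)


def greedy_schema(objects: List[ObjType], threshold: float) -> List[List[ObjType]]:
    """Heuristic grouping: put each object into the first class whose
    density stays at or above the threshold, else open a new class."""
    classes: List[List[ObjType]] = []
    for obj in objects:
        for members in classes:
            if density(members + [obj]) >= threshold:
                members.append(obj)
                break
        else:
            classes.append([obj])  # no existing class can absorb the object
    return classes


objs = [
    frozenset({"name", "email"}),
    frozenset({"name", "email", "phone"}),
    frozenset({"title", "year"}),
    frozenset({"title", "year", "publisher"}),
]
schema = greedy_schema(objs, threshold=0.8)
print(len(schema))  # 2
```

Raising the threshold forces tighter classes: with `threshold=0.9` the same four objects can no longer be merged pairwise, so the sketch returns four singleton classes. This is the trade-off between class density and schema size that the problem statement describes.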