
Autonomous Data Mining System based on Constructive Learning

Basic Information

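Grant number: 09680359
Principal investigator: SUZUKI Einoshin
Amount: $21,100
Host institution: Yokohama National University
Host institution country: Japan
Project category: Grant-in-Aid for Scientific Research (C)
Fiscal year: 1997
Funding country: Japan
Period: 1997 to 1998
Project status: Completed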

Source: https://kaken.nii.ac.jp/grant/KAKENHI-PROJECT-09680359/
Keywords: rule discovery, machine learning, data mining, constructive induction, dynamic bias, data-driven

Project Abstract

This research presents an autonomous method for discovering prediction rules with dynamic bias selection. A prototype system has been developed, and its effectiveness was demonstrated by experiments.

In current data mining systems, a user is involved in both the pre-processing of a data set and knowledge discovery. To reduce the user's burden of choosing and adjusting multiple mining algorithms, we propose a knowledge discovery system that autonomously selects learning methods based on constructive induction. Our task is prediction rule discovery. A prediction rule, which aims to predict the class of an unseen example, deserves special attention because of its usefulness in domains such as exploratory data analysis and the automatic construction of knowledge bases.

Our method consists of two phases: (1) pre-processing of a data set by autonomous discretization; (2) knowledge discovery by autonomous decision of the knowledge representation and autonomous adjustment of the evaluation criterion. Based on novel data-driven criteria and constraints, our method selects appropriate biases, each of which is a component of a learning algorithm. The available biases are an equal-frequency method and a minimum-entropy method for discretization; a conjunction rule and an M-of-N rule for knowledge representation; and J-measure and predictiveness for the evaluation criterion.

Our approach has been validated on 47 discovery tasks using real-world data sets such as retail sales data. We have discussed quantitative evaluation criteria for prediction rule discovery and proposed J-measure with cross-validation. Compared with the best combination of biases, our method achieved more than 90% of its cross-validated J-measure in 30 of the 47 tasks. Careful analysis revealed that our approach is effective unless the provided data set is extremely small. We have also considered large-scale data sets and developed a parallel system running on multiple personal computers.
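The abstract names two discretization biases but does not describe them in detail. The sketch below shows one common reading of each, assuming numeric attribute values and a nominal class: equal-frequency binning into a chosen number of bins, and a single binary cut chosen to minimize the class entropy of the two sides (in the style of Fayyad and Irani, but without their recursive MDL stopping rule). The function names and toy data are hypothetical; this is not the project's own code.

```python
import math
from collections import Counter
from typing import List, Optional, Sequence


def equal_frequency_cuts(values: Sequence[float], n_bins: int) -> List[float]:
    """Cut points that put roughly len(values) / n_bins sorted values in each bin."""
    xs = sorted(values)
    if not 1 <= n_bins <= len(xs):
        raise ValueError("n_bins must be between 1 and the number of values")
    cuts: List[float] = []
    for k in range(1, n_bins):
        i = k * len(xs) // n_bins  # index of the first value in bin k
        cut = (xs[i - 1] + xs[i]) / 2  # midpoint between neighbouring values
        if not cuts or cut > cuts[-1]:  # skip duplicate cuts caused by ties
            cuts.append(cut)
    return cuts


def class_entropy(labels: Sequence[str]) -> float:
    """Shannon entropy (in bits) of the class distribution in labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def min_entropy_cut(values: Sequence[float], labels: Sequence[str]) -> Optional[float]:
    """Binary cut point minimising the size-weighted class entropy of both sides."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best_cut, best_e = None, math.inf
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot cut between equal values
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        e = (len(left) * class_entropy(left) + len(right) * class_entropy(right)) / n
        if e < best_e:
            best_cut, best_e = (pairs[i - 1][0] + pairs[i][0]) / 2, e
    return best_cut


# Hypothetical toy attribute with a class boundary between 3 and 4.
values = [1, 2, 3, 4, 5, 6, 7, 8]
labels = ["a", "a", "a", "b", "b", "b", "b", "b"]
print(equal_frequency_cuts(values, 4))  # [2.5, 4.5, 6.5]
print(min_entropy_cut(values, labels))  # 3.5
```

Equal-frequency cuts ignore the class entirely, while the entropy-based cut lands exactly on the class boundary; a value can then be mapped to its bin index with `bisect.bisect_right(cuts, x)`.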

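To illustrate the knowledge-representation and evaluation biases named in the abstract, the sketch below scores a conjunctive rule and an M-of-N rule built from the same three conditions with the J-measure of Smyth and Goodman, J = p(y) [p(x|y) log2(p(x|y) / p(x)) + (1 - p(x|y)) log2((1 - p(x|y)) / (1 - p(x)))], where y is the rule's condition and x its conclusion. The toy data set, attribute names, and helper functions are hypothetical. The J-measure is computed on the training examples only, because the abstract does not specify how its cross-validated variant or the predictiveness criterion is computed.

```python
import math
from typing import Callable, Dict, List, Sequence

Example = Dict[str, str]
Rule = Callable[[Example], bool]


def conjunction_rule(conds: Sequence[Rule]) -> Rule:
    """Fires only when every condition holds (equivalent to an N-of-N rule)."""
    return lambda ex: all(c(ex) for c in conds)


def m_of_n_rule(m: int, conds: Sequence[Rule]) -> Rule:
    """Fires when at least m of the n conditions hold."""
    return lambda ex: sum(c(ex) for c in conds) >= m


def _term(p: float, q: float) -> float:
    # p * log2(p / q), using the convention 0 * log(0 / q) = 0
    return 0.0 if p == 0 else p * math.log2(p / q)


def j_measure(rule: Rule, examples: List[Example], attr: str, value: str) -> float:
    """J-measure of the rule 'if rule(ex) then ex[attr] == value'."""
    covered = [ex for ex in examples if rule(ex)]
    if not covered:
        return 0.0
    p_y = len(covered) / len(examples)  # probability that the rule fires
    p_x = sum(ex[attr] == value for ex in examples) / len(examples)  # class prior
    p_x_given_y = sum(ex[attr] == value for ex in covered) / len(covered)
    return p_y * (_term(p_x_given_y, p_x) + _term(1 - p_x_given_y, 1 - p_x))


# Hypothetical toy data: three binary attributes and a class label.
data = [
    {"a": "1", "b": "1", "c": "1", "cls": "yes"},
    {"a": "1", "b": "1", "c": "0", "cls": "yes"},
    {"a": "1", "b": "0", "c": "0", "cls": "no"},
    {"a": "0", "b": "0", "c": "0", "cls": "no"},
]
conds = [lambda ex, k=k: ex[k] == "1" for k in ("a", "b", "c")]

print(j_measure(conjunction_rule(conds), data, "cls", "yes"))  # 0.25
print(j_measure(m_of_n_rule(2, conds), data, "cls", "yes"))    # 0.5
```

On this data the 2-of-3 rule covers both "yes" examples and scores 0.5, whereas the conjunction covers only one and scores 0.25. A data-driven selector could use differences of this kind to choose between the two representations, although the project's actual selection criteria and constraints are not given in the abstract.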