Distributed Data Mining Systems for Structured Web Data

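Basic Information

  • Grant number:
    14580423
  • Principal investigator:
  • Amount:
    $19,200
  • Host institution:
  • Host institution country:
    Japan
  • Project category:
    Grant-in-Aid for Scientific Research (C)
  • Fiscal year:
    2002
  • Funding country:
    Japan
  • Period:
    2002 to 2004
  • Project status:
    Completed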

Project Abstract

In this research, we studied knowledge discovery from semistructured Web documents such as HTML/XML files. Graph- and tree-based data mining, and in particular the discovery of frequent structures in graph- or tree-structured data, has been studied extensively. Our target of discovery is neither a simply frequent pattern nor a pattern that is maximally frequent with respect to a syntactic size measure such as the number of vertices. To extract useful information from heterogeneous semistructured Web documents, we instead target a semantically maximal tree-structured pattern that represents a common characteristic of the documents. As a representation of such patterns, we proposed an ordered tree pattern called a term tree: a rooted tree pattern consisting of ordered children and internal structured variables.

A term tree differs from other representations of tree-structured patterns in that its structured variables can be substituted by arbitrary trees. First, we studied in depth the learnability of classes of term tree languages and identified fundamental classes that are polynomial time learnable. We proved that several classes of term tree languages are polynomial time inductively inferable from positive data, including the class of linear term tree languages with multiple child-port variables, the class of linear term tree languages with contractible variables that are adjacent to leaves, and the class of linear term tree languages with height-constrained variables and no variable chain. Moreover, we showed that some classes of linear term tree languages are exactly learnable in polynomial time using queries.

Finally, we presented a metasearch system that uses our efficient learning algorithms for term trees. We implemented this system and showed that it provides effective unified access to multiple existing search sites.
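The idea of a tree pattern whose variables stand for arbitrary trees can be illustrated with a small Python sketch. It is a deliberate simplification and not the project's algorithm: here a variable is a leaf placeholder that matches exactly one subtree, whereas the term trees described above have internal structured variables. The names `Node`, `var`, `matches`, `generalize`, and `page` are hypothetical, and `generalize` is plain anti-unification of two positive examples, used only to suggest what learning a common pattern from positive data looks like.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """Ordered tree node; label=None marks a variable (assumption of this sketch)."""
    label: Optional[str]
    children: List["Node"] = field(default_factory=list)


def var() -> Node:
    """A fresh variable. Every variable is distinct, as in a linear pattern."""
    return Node(None)


def matches(pattern: Node, tree: Node) -> bool:
    """True if `tree` results from replacing each variable in `pattern` by some subtree."""
    if pattern.label is None:
        return True  # a variable accepts any subtree
    if pattern.label != tree.label or len(pattern.children) != len(tree.children):
        return False
    # Children are ordered, so they are compared position by position.
    return all(matches(p, t) for p, t in zip(pattern.children, tree.children))


def generalize(a: Node, b: Node) -> Node:
    """Most specific pattern of this simplified form that matches both trees."""
    if a.label is None or b.label is None:
        return var()
    if a.label != b.label or len(a.children) != len(b.children):
        return var()  # the trees disagree here, so abstract the whole subtree
    return Node(a.label, [generalize(x, y) for x, y in zip(a.children, b.children)])


def page(body_children: List[Node]) -> Node:
    """Toy HTML-like document tree with a fixed head."""
    return Node("html", [Node("head", [Node("title")]), Node("body", body_children)])


doc_a = page([Node("h1", [Node("text")]), Node("p")])
doc_b = page([Node("h1", [Node("text")]), Node("table", [Node("tr")])])
doc_c = page([Node("h2", [Node("text")]), Node("p")])

common = generalize(doc_a, doc_b)
print(matches(common, doc_a), matches(common, doc_b), matches(common, doc_c))
# prints: True True False
```

In this toy setting the learned pattern keeps the shared `h1` heading and abstracts the differing second child of `body` into a variable, so it accepts both example documents and rejects the one that uses `h2`.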

Project Outcomes

Journal papers (68)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Learning of Ordered Tree Languages with Height-Bounded Variables Using Queries
Y. Suzuki, T. Shoudai, T. Miyahara, T. Uchida: "Ordered Term Tree Languages Which Are Polynomial Time Inductively Inferable from Positive Data" Proc. Algorithmic Learning Theory 2002, Lecture Notes in Artificial Intelligence 2533, 188-202 (2002)
  • DOI:
  • Publication date:
  • Journal:
  • Impact factor:
    0
  • Authors:
  • Corresponding author:
Ordered term tree languages which are polynomial time inductively inferable from positive data
  • DOI:
    10.1016/j.tcs.2005.10.022
  • Publication date:
    2002-11
  • Journal:
  • Impact factor:
    2.8
  • Authors:
    Yusuke Suzuki;Takayoshi Shoudai;Tomoyuki Uchida;T. Miyahara
  • Corresponding author:
    Yusuke Suzuki;Takayoshi Shoudai;Tomoyuki Uchida;T. Miyahara
Discovery of Maximally Frequent Tag Tree Patterns with Contractible Variables from Semistructured Documents.
Efficient learning of ordered and unordered tree patterns with contractible variables

Other publications by SHOUDAI Takayoshi

Exact Learning of Primitive Formal Systems Defining Labeled Ordered Tree Languages via Queries
  • DOI:
    10.1587/transinf.2018fcp0011
  • Publication date:
    2019
  • Journal:
  • Impact factor:
    0.7
  • Authors:
    UCHIDA Tomoyuki;MATSUMOTO Satoshi;SHOUDAI Takayoshi;SUZUKI Yusuke;MIYAHARA Tetsuhiro
  • Corresponding author:
    MIYAHARA Tetsuhiro
An Efficient Pattern Matching Algorithm for Unordered Term Tree Patterns of Bounded Dimension

Other grants by SHOUDAI Takayoshi

Design and Analysis of Efficient Class-oriented Graph Mining Systems
  • Grant number:
    23500182
  • Fiscal year:
    2011
  • Funding amount:
    $19,200
  • Project category:
    Grant-in-Aid for Scientific Research (C)
Machine learning theory for graph pattern languages and its applications to graph mining
  • Grant number:
    20500016
  • Fiscal year:
    2008
  • Funding amount:
    $19,200
  • Project category:
    Grant-in-Aid for Scientific Research (C)
Polynomial Time Algorithms for Learning Graph Structured Pattern Languages and its Applications
  • Grant number:
    17500009
  • Fiscal year:
    2005
  • Funding amount:
    $19,200
  • Project category:
    Grant-in-Aid for Scientific Research (C)

Similar National Natural Science Foundation of China (NSFC) grants

Understanding structural evolution of galaxies with machine learning
  • Grant number:
  • Approval year:
    2022
  • Funding amount:
    CNY 100,000
  • Project category:
    Provincial/municipal-level project

Similar overseas grants

TRUST2 - Improving TRUST in artificial intelligence and machine learning for critical building management
  • Grant number:
    10093095
  • Fiscal year:
    2024
  • Funding amount:
    $19,200
  • Project category:
    Collaborative R&D
Quantum Machine Learning for Financial Data Streams
  • Grant number:
    10073285
  • Fiscal year:
    2024
  • Funding amount:
    $19,200
  • Project category:
    Feasibility Studies
Explainable machine learning for electrification of everything
  • Grant number:
    LP230100439
  • Fiscal year:
    2024
  • Funding amount:
    $19,200
  • Project category:
    Linkage Projects
DMS-EPSRC: Asymptotic Analysis of Online Training Algorithms in Machine Learning: Recurrent, Graphical, and Deep Neural Networks
  • Grant number:
    EP/Y029089/1
  • Fiscal year:
    2024
  • Funding amount:
    $19,200
  • Project category:
    Research Grant
Machine Learning for Computational Water Treatment
  • Grant number:
    EP/X033244/1
  • Fiscal year:
    2024
  • Funding amount:
    $19,200
  • Project category:
    Research Grant
Postdoctoral Fellowship: OPP-PRF: Leveraging Community Structure Data and Machine Learning Techniques to Improve Microbial Functional Diversity in an Arctic Ocean Ecosystem Model
  • Grant number:
    2317681
  • Fiscal year:
    2024
  • Funding amount:
    $19,200
  • Project category:
    Standard Grant
RII Track-4:NSF: Physics-Informed Machine Learning with Organ-on-a-Chip Data for an In-Depth Understanding of Disease Progression and Drug Delivery Dynamics
  • Grant number:
    2327473
  • Fiscal year:
    2024
  • Funding amount:
    $19,200
  • Project category:
    Standard Grant
CAREER: Blessing of Nonconvexity in Machine Learning - Landscape Analysis and Efficient Algorithms
  • Grant number:
    2337776
  • Fiscal year:
    2024
  • Funding amount:
    $19,200
  • Project category:
    Continuing Grant
CC* Campus Compute: UTEP Cyberinfrastructure for Scientific and Machine Learning Applications
  • Grant number:
    2346717
  • Fiscal year:
    2024
  • Funding amount:
    $19,200
  • Project category:
    Standard Grant
Learning to create Intelligent Solutions with Machine Learning and Computer Vision: A Pathway to AI Careers for Diverse High School Students
  • Grant number:
    2342574
  • Fiscal year:
    2024
  • Funding amount:
    $19,200
  • Project category:
    Standard Grant