Distributed Data Mining Systems for Structured Web Data
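Basic information
- Grant number: 14580423
- Principal investigator:
- Amount: $19,200
- Host institution:
- Host institution country: Japan
- Project category: Grant-in-Aid for Scientific Research (C)
- Fiscal year: 2002
- Funding country: Japan
- Project period: 2002 to 2004
- Project status: Completed
- Source:
- Keywords:
Project abstract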
In this research, we studied knowledge discovery from semistructured Web documents such as HTML/XML files. Graph- and tree-based data mining, and the discovery of frequent structures in graph- or tree-structured data, have been studied extensively. Our target of discovery is neither a simply frequent pattern nor a pattern that is maximally frequent with respect to syntactic size, such as the number of vertices. To extract useful information from heterogeneous semistructured Web documents, we instead target a semantically maximal tree-structured pattern that represents a common characteristic of the documents. As a representation of tree-structured patterns, we proposed an ordered tree pattern called a term tree: a rooted tree pattern with ordered children and internal structured variables. A term tree differs from other representations of tree-structured patterns in that its structured variables can be substituted by arbitrary trees. First, we studied in depth the learnability of classes of term tree languages and identified fundamental classes that are polynomial time learnable. We proved that several classes of term tree languages are polynomial time inductively inferable from positive data, including the class of linear term tree languages with multiple child-port variables, the class of linear term tree languages with contractible variables adjacent to leaves, and the class of linear term tree languages with height-constrained variables and no variable chain. Moreover, we showed that some classes of linear term tree languages are exactly learnable in polynomial time using queries. Finally, we presented a metasearch system that uses our efficient learning algorithms for term trees. We implemented this system and showed that it provides effective unified access to multiple existing search sites.
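The idea of a tree pattern whose variables can be replaced by arbitrary trees can be illustrated with a minimal sketch. This is a deliberate simplification and not the project's algorithm: here a variable is a leaf placeholder that matches exactly one whole subtree, whereas the term trees described above have internal structured variables. The names `Node`, `Var`, and `match` are hypothetical and chosen only for this illustration.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Union


@dataclass(frozen=True)
class Node:
    """A labeled node of an ordered tree (children are kept in order)."""
    label: str
    children: tuple = ()


@dataclass(frozen=True)
class Var:
    """Simplified variable: a leaf placeholder standing for any one subtree."""
    name: str


Pattern = Union[Node, Var]


def match(pattern: Pattern, tree: Node,
          bindings: Optional[Dict[str, Node]] = None) -> Optional[Dict[str, Node]]:
    """Return variable bindings if `tree` is an instance of `pattern`, else None.

    Assumes a linear pattern (each variable name occurs at most once),
    so a variable can simply be bound without a consistency check.
    """
    if bindings is None:
        bindings = {}
    if isinstance(pattern, Var):
        bindings[pattern.name] = tree
        return bindings
    # Ordered matching: labels must agree and children must align one to one.
    if pattern.label != tree.label or len(pattern.children) != len(tree.children):
        return None
    for sub_pattern, sub_tree in zip(pattern.children, tree.children):
        if match(sub_pattern, sub_tree, bindings) is None:
            return None
    return bindings


# A pattern shared by two differently shaped records.
pattern = Node("item", (Node("title", (Var("x"),)), Var("y")))
doc = Node("item", (Node("title", (Node("Term trees"),)),
                    Node("price", (Node("10"),))))
result = match(pattern, doc)
```

In this sketch, `result` binds `x` to the title text node and `y` to the whole `price` subtree, which is the sense in which one pattern can describe a common structure across heterogeneous documents.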
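Project outputs
Number of journal papers (68)
Number of monographs (0)
Number of research awards (0)
Number of conference papers (0)
Number of patents (0)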
Y. Suzuki, T. Shoudai, T. Miyahara, T. Uchida: "Ordered Term Tree Languages Which Are Polynomial Time Inductively Inferable from Positive Data", Proc. Algorithmic Learning Theory 2002, Lecture Notes in Artificial Intelligence 2533, 188-202 (2002)
- DOI:
- Publication date:
- Journal:
- Impact factor: 0
- Authors:
- Corresponding author:
Learning of Ordered Tree Languages with Height-Bounded Variables Using Queries
- DOI:
- Publication date: 2004
- Journal:
- Impact factor: 0
- Authors: Daisuke Ibuki; Souya Michitsuji; Norihiko Ono; Isao Ono; Satoshi Matsumoto
- Corresponding author: Satoshi Matsumoto
Ordered term tree languages which are polynomial time inductively inferable from positive data
- DOI: 10.1016/j.tcs.2005.10.022
- Publication date: 2002-11
- Journal:
- Impact factor: 2.8
- Authors: Yusuke Suzuki; Takayoshi Shoudai; Tomoyuki Uchida; T. Miyahara
- Corresponding authors: Yusuke Suzuki; Takayoshi Shoudai; Tomoyuki Uchida; T. Miyahara
Discovery of Maximally Frequent Tag Tree Patterns with Contractible Variables from Semistructured Documents.
- DOI:
- Publication date: 2004
- Journal:
- Impact factor: 0
- Authors: T. Miyahara; Y. Suzuki; T. Shoudai; T. Uchida; K. Takahashi; H. Ueda
- Corresponding author: H. Ueda
Efficient learning of ordered and unordered tree patterns with contractible variables
- DOI:
- Publication date: 2003
- Journal:
- Impact factor: 0
- Authors: Yusuke Suzuki
- Corresponding author: Yusuke Suzuki
Other publications by SHOUDAI Takayoshi
Exact Learning of Primitive Formal Systems Defining Labeled Ordered Tree Languages via Queries
- DOI: 10.1587/transinf.2018fcp0011
- Publication date: 2019
- Journal:
- Impact factor: 0.7
- Authors: UCHIDA Tomoyuki; MATSUMOTO Satoshi; SHOUDAI Takayoshi; SUZUKI Yusuke; MIYAHARA Tetsuhiro
- Corresponding author: MIYAHARA Tetsuhiro
An Efficient Pattern Matching Algorithm for Unordered Term Tree Patterns of Bounded Dimension
- DOI: 10.1587/transfun.e101.a.1344
- Publication date: 2018
- Journal:
- Impact factor: 0
- Authors: SHOUDAI Takayoshi; MIYAHARA Tetsuhiro; UCHIDA Tomoyuki; MATSUMOTO Satoshi; SUZUKI Yusuke
- Corresponding author: SUZUKI Yusuke
Other grants of SHOUDAI Takayoshi
Design and Analysis of Efficient Class-oriented Graph Mining Systems
- Grant number: 23500182
- Fiscal year: 2011
- Funding amount: $19,200
- Project category: Grant-in-Aid for Scientific Research (C)
Machine learning theory for graph pattern languages and its applications to graph mining
- Grant number: 20500016
- Fiscal year: 2008
- Funding amount: $19,200
- Project category: Grant-in-Aid for Scientific Research (C)
Polynomial Time Algorithms for Learning Graph Structured Pattern Languages and its Applications
- Grant number: 17500009
- Fiscal year: 2005
- Funding amount: $19,200
- Project category: Grant-in-Aid for Scientific Research (C)
Similar grants from the National Natural Science Foundation of China (NSFC)
Understanding structural evolution of galaxies with machine learning
- Grant number:
- Approval year: 2022
- Funding amount: CNY 100,000
- Project category: Provincial/municipal-level project
Similar overseas grants
CAREER: Blessing of Nonconvexity in Machine Learning - Landscape Analysis and Efficient Algorithms
- Grant number: 2337776
- Fiscal year: 2024
- Funding amount: $19,200
- Project category: Continuing Grant
RII Track-4:NSF: Physics-Informed Machine Learning with Organ-on-a-Chip Data for an In-Depth Understanding of Disease Progression and Drug Delivery Dynamics
- Grant number: 2327473
- Fiscal year: 2024
- Funding amount: $19,200
- Project category: Standard Grant
CC* Campus Compute: UTEP Cyberinfrastructure for Scientific and Machine Learning Applications
- Grant number: 2346717
- Fiscal year: 2024
- Funding amount: $19,200
- Project category: Standard Grant
Learning to create Intelligent Solutions with Machine Learning and Computer Vision: A Pathway to AI Careers for Diverse High School Students
- Grant number: 2342574
- Fiscal year: 2024
- Funding amount: $19,200
- Project category: Standard Grant
Collaborative Research: Conference: DESC: Type III: Eco Edge - Advancing Sustainable Machine Learning at the Edge
- Grant number: 2342498
- Fiscal year: 2024
- Funding amount: $19,200
- Project category: Standard Grant
Excellence in Research:Towards Data and Machine Learning Fairness in Smart Mobility
- Grant number: 2401655
- Fiscal year: 2024
- Funding amount: $19,200
- Project category: Standard Grant
I-Corps: Translation potential of using machine learning to predict oxaliplatin chemotherapy benefit in early colon cancer
- Grant number: 2425300
- Fiscal year: 2024
- Funding amount: $19,200
- Project category: Standard Grant
CAREER: Mitigating the Lack of Labeled Training Data in Machine Learning Based on Multi-level Optimization
- Grant number: 2339216
- Fiscal year: 2024
- Funding amount: $19,200
- Project category: Continuing Grant
Postdoctoral Fellowship: OPP-PRF: Leveraging Community Structure Data and Machine Learning Techniques to Improve Microbial Functional Diversity in an Arctic Ocean Ecosystem Model
- Grant number: 2317681
- Fiscal year: 2024
- Funding amount: $19,200
- Project category: Standard Grant
Accelerated discovery of ultra-fast ionic conductors with machine learning
- Grant number: 24K08582
- Fiscal year: 2024
- Funding amount: $19,200
- Project category: Grant-in-Aid for Scientific Research (C)