Study of High-speed Data Mining Algorithms from Massive Data Streams
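Basic information
- Grant number: 15300036
- Principal investigator:
- Amount: $99,800
- Host institution:
- Host institution country: Japan
- Project category: Grant-in-Aid for Scientific Research (B)
- Fiscal year: 2003
- Funding country: Japan
- Project period: 2003 to 2005
- Project status: Completed
- Source:
- Keywords: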
Project abstract
In this research, we investigated a high-speed online knowledge discovery system for extracting useful information from massive semi-structured data streams. This year in particular, on the theoretical side, we further extended the theory of efficient pattern matching and pattern discovery methods for online streams. On the application side, we ran a series of experiments on collecting and analyzing network data from real high-speed networks in a large organization. We also published the results obtained over the three-year research period. Specifically, we pursued the following topics:
(1) Survey on semi-structured data: We summarized the stream data mining research carried out in this project over the last three years and published it as a survey in an academic journal.
(2) Study on streaming pattern matching technology for semi-structured data: We developed an efficient bit-parallel method for tree pattern matching with horizontal wildcards, which could substantially speed up the XPath and XQuery pattern matching languages on huge XML data.
(3) Study on sequential and streaming pattern discovery technology for semi-structured data: We developed efficient algorithms for finding interesting patterns in massive data streams for various classes of complex patterns and motifs. This year we also published the pattern discovery algorithms developed last year, one of which received the 2004 JSAI SIG Award.
(4) Empirical study on knowledge discovery from real massive network data: As an application, we carried out a series of studies on data collection and online analysis of the high-speed, large-scale network of a medium-sized organization at Kyushu University. These experiments should give insights for future research on efficient pattern matching and pattern discovery algorithms for high-speed streaming data.
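To make the bit-parallel idea in topic (2) concrete, here is a minimal sketch of the classical Shift-And technique for plain strings with single-character wildcards, processing a stream one symbol at a time. It is not the project's tree pattern matching method, which the abstract does not describe in detail; the function names `build_masks` and `stream_matches` and the `?` wildcard symbol are assumptions made for illustration only.

```python
def build_masks(pattern, wildcard="?"):
    """Map each symbol to a bitmask of the pattern positions it can match."""
    masks = {}
    wild = 0  # positions holding a wildcard match any symbol
    for i, ch in enumerate(pattern):
        if ch == wildcard:
            wild |= 1 << i
        else:
            masks[ch] = masks.get(ch, 0) | (1 << i)
    return masks, wild


def stream_matches(stream, pattern, wildcard="?"):
    """Yield the end index of every occurrence of pattern in stream.

    Bit i of `state` is set when pattern[0..i] matches the text ending
    at the current position, so each symbol costs one shift and one AND.
    """
    if not pattern:
        raise ValueError("pattern must be non-empty")
    masks, wild = build_masks(pattern, wildcard)
    accept = 1 << (len(pattern) - 1)
    state = 0
    for pos, ch in enumerate(stream):
        # Extend every partial match by one symbol and start a new one.
        state = ((state << 1) | 1) & (masks.get(ch, 0) | wild)
        if state & accept:
            yield pos


if __name__ == "__main__":
    print(list(stream_matches("abcaxcac", "a?c")))
```

Because the matcher keeps only one integer of state and reads the input once, it can consume an unbounded stream (any iterable of symbols) without buffering it, which is the property that makes this family of techniques attractive for online data.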
Project results
Journal papers (40)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Kensuke Baba et al.: "On the Length of the Minimum Solution of Word Equations in One Variable" Lecture Notes in Computer Science. 2747. 189-197 (2003)
- DOI:
- Publication year:
- Journal:
- Impact factor: 0
- Authors:
- Corresponding author:
Shunsuke Inenaga et al.: "Linear-time off-line text compression by longest-first substitution" Lecture Notes in Computer Science. 2857. 137-152 (2003)
- DOI:
- Publication year:
- Journal:
- Impact factor: 0
- Authors:
- Corresponding author:
Special Issue on Algorithmic Learning Theory
- DOI:
- Publication year: 2005
- Journal:
- Impact factor: 0
- Authors: Sanjay Jain; Hiroki Arimura
- Corresponding author: Hiroki Arimura
Faster Pattern Matching Algorithm for Arc-Annotated Sequences
- DOI:
- Publication year: 2006
- Journal:
- Impact factor: 0
- Authors: Heikki Hyyro; 2 others; 遠藤基郎; Takuya Kida
- Corresponding author: Takuya Kida
A Polynomial Space and Polynomial Delay Algorithm for Enumeration of Maximal Motifs in a Sequence
- DOI:
- Publication year: 2005
- Journal:
- Impact factor: 0
- Authors: H. Arimura; T. Uno
- Corresponding author: T. Uno
Other grants by IKEDA Daisuke
Hierarchical Discovery of Sub-structures and Rare Patterns of Them in Large Text Data
- Grant number: 24300059
- Fiscal year: 2012
- Funding amount: $99,800
- Project category: Grant-in-Aid for Scientific Research (B)
Test of Radar Echo Detection using Electron Beam for Future Large Air Shower Observatory
- Grant number: 23654078
- Fiscal year: 2011
- Funding amount: $99,800
- Project category: Grant-in-Aid for Challenging Exploratory Research
Evolution of fast skeletal myosin heavy chain genes of fish
- Grant number: 23780214
- Fiscal year: 2011
- Funding amount: $99,800
- Project category: Grant-in-Aid for Young Scientists (B)
Research on statistical discovery of a wide variety of patterns with low frequencies and its applications
- Grant number: 21650031
- Fiscal year: 2009
- Funding amount: $99,800
- Project category: Grant-in-Aid for Challenging Exploratory Research
The origin and purpose of fast skeletal muscle myosin heavy chain gene cluster of vertebrates
- Grant number: 21780198
- Fiscal year: 2009
- Funding amount: $99,800
- Project category: Grant-in-Aid for Young Scientists (B)
Pattern Discovery from Large Text Data Based on the Property of Languages Being Scale-Free
- Grant number: 19700150
- Fiscal year: 2007
- Funding amount: $99,800
- Project category: Grant-in-Aid for Young Scientists (B)
Similar overseas grants
FightAMR: Novel global One Health surveillance approach to fight AMR using Artificial Intelligence and big data mining
- Grant number: MR/Y034422/1
- Fiscal year: 2024
- Funding amount: $99,800
- Project category: Research Grant
Travel: Student Support for the 2023 ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD 2023)
- Grant number: 2323492
- Fiscal year: 2023
- Funding amount: $99,800
- Project category: Standard Grant
BIGDATA: IA: Collaborative Research: Asynchronous Distributed Machine Learning Framework for Multi-Site Collaborative Brain Big Data Mining
- Grant number: 2348159
- Fiscal year: 2023
- Funding amount: $99,800
- Project category: Standard Grant
Travel: III: Student Travel Support for 2023 ACM International Conference on Web Search and Data Mining (WSDM)
- Grant number: 2245056
- Fiscal year: 2023
- Funding amount: $99,800
- Project category: Standard Grant
Collaborative Research: Quantifying the Global Electric Circuit by Data Mining of Electric Field and Radar Observations from Ground Based, Airborne and Satellite Platforms
- Grant number: 2328464
- Fiscal year: 2023
- Funding amount: $99,800
- Project category: Standard Grant
Learning Precision Medicine for Rare Diseases Empowered by Knowledge-driven Data Mining
- Grant number: 10732934
- Fiscal year: 2023
- Funding amount: $99,800
- Project category:
Screening of transition metal oxide electrocatalysts in alkaline media based on data mining and theoretical analysis
- Grant number: 23K13599
- Fiscal year: 2023
- Funding amount: $99,800
- Project category: Grant-in-Aid for Early-Career Scientists
Research Initiation Award: Thermal Decomposition of Four-membered Heterocyclic Peroxides, Data Mining in Nonadiabatic Trajectories, and Chemiexcitation Efficiency
- Grant number: 2300321
- Fiscal year: 2023
- Funding amount: $99,800
- Project category: Standard Grant
Travel: NSF Student Travel Support for the 2023 IEEE International Conference on Data Mining (IEEE ICDM 2023)
- Grant number: 2324784
- Fiscal year: 2023
- Funding amount: $99,800
- Project category: Standard Grant
Deep Pattern Mining for Brain Graph Analysis: A Data Mining Perspective
- Grant number: LP210301259
- Fiscal year: 2023
- Funding amount: $99,800
- Project category: Linkage Projects