Online Mining of Big Data Streams Using Cloud Computing

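Basic information

  • Grant number:
    RGPIN-2014-06565
  • Principal investigator:
  • Amount:
    $39,300
  • Host institution:
  • Host institution country:
    Canada
  • Project category:
    Discovery Grants Program - Individual
  • Fiscal year:
    2017
  • Funding country:
    Canada
  • Duration:
    2017-01-01 to 2018-12-31
  • Project status:
    Completed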

Project abstract

In a world where data are growing at extraordinary rates, there is huge demand for fast and effective analysis of big data to discover information useful for business decisions. This research program tackles the problem of discovering useful information from big data streams. Big data streams, characterized by high volume and high velocity, have become ubiquitous as many sources (such as social networks, sensor networks and financial markets) produce data continuously and rapidly. Discovering patterns effectively and efficiently from such massive and fast-evolving data will allow businesses to react quickly to their dynamically changing environment, for example to perform fraud detection at a point of sale, determine which ad to show, or detect spam in comments on news, where trends change quickly over time.
Many challenges exist in discovering useful information from big data streams. To handle very fast data, systems have to process the data as fast as they arrive. However, most existing data stream mining methods are sequential algorithms that run on a single machine and are limited by the memory and speed of that machine. To mine massive data, parallel and distributed computing over a cloud of computers has become a mainstream solution for achieving low latency and high scalability, and MapReduce has become a popular programming paradigm for easily writing applications that process massive data in parallel in a fault-tolerant manner. However, converting a stream mining algorithm into an online parallel MapReduce-style algorithm poses challenges. Most learning algorithms are highly sequential. Parallelizing such algorithms takes considerable effort and may require the design of new algorithms. In addition, in stream environments, data flow into the system at a rate over which we have no control. The processing system must keep up with the data rate or degrade gracefully. Resource-adaptive online learning with bounded approximation is therefore much needed, but it has not been addressed adequately in the MapReduce-style data processing model.
To address the above challenges, we will develop parallel versions of stream-mining algorithms using MapReduce-style distributed stream-processing platforms. We will build on our previous and ongoing research in data mining and parallelize the stream-mining algorithms that we have developed recently, which include, but are not limited to, classification rule learning, high utility pattern mining, and Monte Carlo based learning algorithms. In addition, we will develop resource-adaptive techniques for learning from big data streams. We will develop adaptive data structures and anytime learning algorithms that can produce the best possible answers under resource constraints and can use extra time and memory, if given, to increase the quality of the answers. Moreover, we will identify the pros and cons of developing parallel stream mining algorithms on state-of-the-art MapReduce-style stream processing platforms and provide feedback to the community on what these platforms still need in order to better serve online learning from big data streams in the cloud.
Mining big and fast data streams using cloud computing is still in its infancy. The proposed research will advance the field by proposing novel solutions to its open challenges and will have a wide range of applications in various fields that produce massive data streams.
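
The abstract does not name a specific algorithm, so the following is only an illustrative sketch of what "bounded approximation" under fixed memory can look like in a MapReduce-style setting. It uses the Misra-Gries frequent-items summary as a stand-in technique (an assumption, not the project's method): each partition of the stream is summarised independently with at most k - 1 counters, and the summaries are then merged pairwise. The function names, the toy partitions and the choice of k are all hypothetical. Under the standard analysis, every reported count is at most the true count and undercounts it by no more than n/k, where n is the total number of items seen.

```python
from collections import Counter
from functools import reduce


def misra_gries(items, k):
    """Map step: summarise one partition using at most k - 1 counters."""
    counters = {}
    for item in items:
        if item in counters:
            counters[item] += 1
        elif len(counters) < k - 1:
            counters[item] = 1
        else:
            # No free counter: decrement every counter, drop those at zero.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters


def merge(a, b, k):
    """Reduce step: add two summaries, then shrink back to k - 1 counters."""
    merged = Counter(a)
    merged.update(b)
    if len(merged) >= k:
        # Subtract the k-th largest count and keep only positive counters.
        kth = sorted(merged.values(), reverse=True)[k - 1]
        merged = Counter({key: c - kth for key, c in merged.items() if c > kth})
    return dict(merged)


def summarise(partitions, k):
    """Summarise each partition independently, then merge the results."""
    summaries = [misra_gries(p, k) for p in partitions]
    return reduce(lambda a, b: merge(a, b, k), summaries, {})


# Hypothetical toy stream split into two partitions (illustrative data only).
partitions = [["a", "b", "a", "c", "a"], ["a", "d", "a", "b"]]
summary = summarise(partitions, k=3)
print(summary)
```

Memory per worker is fixed by k regardless of stream length, and a larger k trades memory for tighter counts, which is one simple way to read the resource-adaptive goal described above.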

Project outputs

Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)

Other publications by An, Aijun

Combining integrated sampling with SVM ensembles for learning from imbalanced datasets
  • DOI:
    10.1016/j.ipm.2010.11.007
  • Publication date:
    2011-07-01
  • Journal:
  • Impact factor:
    8.6
  • Authors:
    Liu, Yang; Yu, Xiaohui; An, Aijun
  • Corresponding author:
    An, Aijun
Finding molecular complexes through multiple layer clustering of protein interaction networks.
Clustering by common friends finds locally significant proteins mediating modules
  • DOI:
    10.1093/bioinformatics/btm064
  • Publication date:
    2007-05-01
  • Journal:
  • Impact factor:
    5.8
  • Authors:
    Andreopoulos, Bill; An, Aijun; Schroeder, Michael
  • Corresponding author:
    Schroeder, Michael
Detection of malicious and non-malicious website visitors using unsupervised neural network learning
  • DOI:
    10.1016/j.asoc.2012.08.028
  • Publication date:
    2013-01-01
  • Journal:
  • Impact factor:
    8.7
  • Authors:
    Stevanovic, Dusan; Vlajic, Natalija; An, Aijun
  • Corresponding author:
    An, Aijun
Memory-adaptive high utility sequential pattern mining over data streams
  • DOI:
    10.1007/s10994-016-5617-1
  • Publication date:
    2017-06-01
  • Journal:
  • Impact factor:
    7.5
  • Authors:
    Zihayat, Morteza; Chen, Yan; An, Aijun
  • Corresponding author:
    An, Aijun

Other grants held by An, Aijun

Adaptive Online Mining of Big Data Streams
  • Grant number:
    RGPIN-2019-06799
  • Fiscal year:
    2022
  • Funding amount:
    $39,300
  • Project category:
    Discovery Grants Program - Individual
Adaptive Online Mining of Big Data Streams
  • Grant number:
    RGPIN-2019-06799
  • Fiscal year:
    2021
  • Funding amount:
    $39,300
  • Project category:
    Discovery Grants Program - Individual
Knowledge based neural question generation from text
  • Grant number:
    560815-2020
  • Fiscal year:
    2021
  • Funding amount:
    $39,300
  • Project category:
    Alliance Grants
Adaptive Online Mining of Big Data Streams
  • Grant number:
    RGPAS-2019-00082
  • Fiscal year:
    2020
  • Funding amount:
    $39,300
  • Project category:
    Discovery Grants Program - Accelerator Supplements
Adaptive Online Mining of Big Data Streams
  • Grant number:
    RGPIN-2019-06799
  • Fiscal year:
    2020
  • Funding amount:
    $39,300
  • Project category:
    Discovery Grants Program - Individual
Adaptive Online Mining of Big Data Streams
  • Grant number:
    RGPIN-2019-06799
  • Fiscal year:
    2019
  • Funding amount:
    $39,300
  • Project category:
    Discovery Grants Program - Individual
Adaptive Online Mining of Big Data Streams
  • Grant number:
    RGPAS-2019-00082
  • Fiscal year:
    2019
  • Funding amount:
    $39,300
  • Project category:
    Discovery Grants Program - Accelerator Supplements
Data and visual analytics for decision making in next generation media properties
  • Grant number:
    461898-2013
  • Fiscal year:
    2019
  • Funding amount:
    $39,300
  • Project category:
    Collaborative Research and Development Grants
Applications of IBM Platform Computing solutions for solving graphic analytics and 3D scalable video cloud transcoder problems
  • Grant number:
    461882-2013
  • Fiscal year:
    2018
  • Funding amount:
    $39,300
  • Project category:
    Collaborative Research and Development Grants
An online integrated health risk assessment tool
  • Grant number:
    461870-2013
  • Fiscal year:
    2018
  • Funding amount:
    $39,300
  • Project category:
    Collaborative Research and Development Grants

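Similar grants from the National Natural Science Foundation of China

Genome mining-based study of secondary metabolites that inhibit biofilm formation in Staphylococcus epidermidis
  • Grant number:
    21242003
  • Approval year:
    2012
  • Funding amount:
    CNY 100,000
  • Project category:
    Special Fund Project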

Similar overseas grants

Adaptive Online Mining of Big Data Streams
  • Grant number:
    RGPIN-2019-06799
  • Fiscal year:
    2022
  • Funding amount:
    $39,300
  • Project category:
    Discovery Grants Program - Individual
Adaptive Online Mining of Big Data Streams
  • Grant number:
    RGPIN-2019-06799
  • Fiscal year:
    2021
  • Funding amount:
    $39,300
  • Project category:
    Discovery Grants Program - Individual
Adaptive Online Mining of Big Data Streams
  • Grant number:
    RGPAS-2019-00082
  • Fiscal year:
    2020
  • Funding amount:
    $39,300
  • Project category:
    Discovery Grants Program - Accelerator Supplements
Adaptive Online Mining of Big Data Streams
  • Grant number:
    RGPIN-2019-06799
  • Fiscal year:
    2020
  • Funding amount:
    $39,300
  • Project category:
    Discovery Grants Program - Individual
Adaptive Online Mining of Big Data Streams
  • Grant number:
    RGPIN-2019-06799
  • Fiscal year:
    2019
  • Funding amount:
    $39,300
  • Project category:
    Discovery Grants Program - Individual
Adaptive Online Mining of Big Data Streams
  • Grant number:
    RGPAS-2019-00082
  • Fiscal year:
    2019
  • Funding amount:
    $39,300
  • Project category:
    Discovery Grants Program - Accelerator Supplements
Online Mining of Big Data Streams Using Cloud Computing
  • Grant number:
    RGPIN-2014-06565
  • Fiscal year:
    2018
  • Funding amount:
    $39,300
  • Project category:
    Discovery Grants Program - Individual
Online Mining of Big Data Streams Using Cloud Computing
  • Grant number:
    462308-2014
  • Fiscal year:
    2016
  • Funding amount:
    $39,300
  • Project category:
    Discovery Grants Program - Accelerator Supplements
Online Mining of Big Data Streams Using Cloud Computing
  • Grant number:
    RGPIN-2014-06565
  • Fiscal year:
    2016
  • Funding amount:
    $39,300
  • Project category:
    Discovery Grants Program - Individual
CIF: Small: Online Algorithms for Streaming Structured Big-Data Mining
  • Grant number:
    1526870
  • Fiscal year:
    2015
  • Funding amount:
    $39,300
  • Project category:
    Standard Grant