Adaptive Online Mining of Big Data Streams
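Basic information
- Grant number: RGPIN-2019-06799
- Principal investigator:
- Amount: $35,000
- Host institution:
- Host institution country: Canada
- Project category: Discovery Grants Program - Individual
- Fiscal year: 2021
- Funding country: Canada
- Duration: 2021-01-01 to 2022-12-31
- Project status: Completed
- Source:
- Keywords:
Project abstract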
Big data streams are continuous flows of data that arrive at high volume and high velocity. Such streams have become ubiquitous because many sources, such as sensor networks, business transactions, and surveillance cameras, produce data continuously and rapidly. There is a growing demand for real-time online analysis of, and learning from, big data streams, so that patterns and models can be learned in a timely manner to support fast decision making in a dynamic environment, for example, fraud detection at a point of sale. However, although machine learning techniques have become very effective in many applications, such as computer vision and speech recognition, learning a complex model (such as a deep neural network) from big data can be very time-consuming, which makes it impractical to do online.
The objective of this research program is to accelerate machine learning programs and make them work in an online fashion. We will develop novel techniques for parallelizing machine learning models to speed up the learning process. We will address the following challenges in online learning from data streams. First, online algorithms are often constrained by space and time. Not all data can be stored, and algorithms often need to process the data in a single pass. However, many machine learning algorithms require a large number of passes over the data to find good solutions. How to adapt such learning algorithms to work online is an open challenge. Second, streaming data evolve over time with unknown dynamics, a phenomenon known as concept drift. Online learning from data streams should keep the model up to date and quickly adapt it to concept drift. Although methods have been proposed to train drift-adaptive online models, little has been done on adaptively learning complex nonlinear functions. Third, in streaming environments, data flow into the system at an unpredictable rate, and the available resources may change dynamically because they are shared with other computing jobs. The processing system must keep up with changes in both the data rate and the available resources, so resource-adaptive online learning is needed.
We will develop novel strategies for handling concept drift and resource constraints when learning complex functions online. Parallel and distributed solutions over multiple processors or machines will be developed to speed up learning. Strategies that trade off resource consumption against the accuracy of the learned model according to resource conditions will be investigated. Anytime learning algorithms will be designed that can produce the best possible answer under real-time constraints.
Distributed online mining of big and fast data streams is still far from mature. The proposed research will advance the field by proposing novel solutions to its open challenges and will have a wide range of applications, e.g., real-time fraud detection and image recognition in a dynamic environment.
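To make the first two challenges concrete, the following is a minimal sketch of single-pass (test-then-train) learning with a crude reaction to concept drift. It is not the project's method: the class `OnlineLogisticRegression`, the functions `prequential_run` and `make_stream`, the sliding-window reset rule, and all parameter values are hypothetical choices made only for illustration.

```python
import math
import random
from collections import deque
from itertools import chain


class OnlineLogisticRegression:
    """Logistic regression updated by SGD, one example at a time."""

    def __init__(self, n_features, lr=0.1):
        self.w = [0.0] * n_features
        self.b = 0.0
        self.lr = lr

    def predict_proba(self, x):
        z = self.b + sum(wi * xi for wi, xi in zip(self.w, x))
        z = max(-30.0, min(30.0, z))  # clamp to keep exp() well-behaved
        return 1.0 / (1.0 + math.exp(-z))

    def learn_one(self, x, y):
        g = self.predict_proba(x) - y  # gradient of the log loss w.r.t. z
        self.w = [wi - self.lr * g * xi for wi, xi in zip(self.w, x)]
        self.b -= self.lr * g


def prequential_run(stream, n_features, window=200, drift_margin=0.25):
    """Process the stream once: predict on each example, then train on it.

    Assumed drift rule (illustrative only): if the error rate over the last
    `window` examples exceeds the lowest windowed error seen so far by
    `drift_margin`, discard the model and start a fresh one.
    """
    model = OnlineLogisticRegression(n_features)
    recent = deque(maxlen=window)  # only O(window) examples' outcomes are kept
    best_err = 1.0
    resets = []
    for t, (x, y) in enumerate(stream):
        pred = 1 if model.predict_proba(x) >= 0.5 else 0
        recent.append(int(pred != y))
        if len(recent) == window:
            err = sum(recent) / window
            best_err = min(best_err, err)
            if err > best_err + drift_margin:
                model = OnlineLogisticRegression(n_features)
                recent.clear()
                best_err = 1.0
                resets.append(t)
        model.learn_one(x, y)
    return model, resets


def make_stream(n, flipped, rng):
    """Synthetic stream: label is 1 when x[0] > x[1]; `flipped` inverts it."""
    for _ in range(n):
        x = [rng.uniform(-1, 1), rng.uniform(-1, 1)]
        y = int(x[0] > x[1])
        yield x, (1 - y if flipped else y)


if __name__ == "__main__":
    rng = random.Random(0)
    # The concept changes abruptly after 2000 examples.
    stream = chain(make_stream(2000, False, rng), make_stream(2000, True, rng))
    model, resets = prequential_run(stream, n_features=2)
    print("model reset at stream positions:", resets)
```

Each example is seen exactly once and only a fixed-size window of recent outcomes is stored, which mirrors the space and time constraints described above. Resetting the whole model on drift is the bluntest possible response; the abstract's point is that adapting complex nonlinear models gracefully is much harder than this linear toy case suggests.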
Project outcomes
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other publications by An, Aijun
Combining integrated sampling with SVM ensembles for learning from imbalanced datasets
- DOI: 10.1016/j.ipm.2010.11.007
- Published: 2011-07-01
- Journal:
- Impact factor: 8.6
- Authors: Liu, Yang; Yu, Xiaohui; An, Aijun
- Corresponding author: An, Aijun
Finding molecular complexes through multiple layer clustering of protein interaction networks.
- DOI: 10.1504/ijbra.2007.011835
- Published: 2007-01-01
- Journal:
- Impact factor: 0
- Authors: Andreopoulos, Bill; An, Aijun; Wang, Xiaogang
- Corresponding author: Wang, Xiaogang
Clustering by common friends finds locally significant proteins mediating modules
- DOI: 10.1093/bioinformatics/btm064
- Published: 2007-05-01
- Journal:
- Impact factor: 5.8
- Authors: Andreopoulos, Bill; An, Aijun; Schroeder, Michael
- Corresponding author: Schroeder, Michael
Detection of malicious and non-malicious website visitors using unsupervised neural network learning
- DOI: 10.1016/j.asoc.2012.08.028
- Published: 2013-01-01
- Journal:
- Impact factor: 8.7
- Authors: Stevanovic, Dusan; Vlajic, Natalija; An, Aijun
- Corresponding author: An, Aijun
Memory-adaptive high utility sequential pattern mining over data streams
- DOI: 10.1007/s10994-016-5617-1
- Published: 2017-06-01
- Journal:
- Impact factor: 7.5
- Authors: Zihayat, Morteza; Chen, Yan; An, Aijun
- Corresponding author: An, Aijun
Other grants held by An, Aijun
Adaptive Online Mining of Big Data Streams
- Grant number: RGPIN-2019-06799
- Fiscal year: 2022
- Funding amount: $35,000
- Project category: Discovery Grants Program - Individual
Knowledge based neural question generation from text
- Grant number: 560815-2020
- Fiscal year: 2021
- Funding amount: $35,000
- Project category: Alliance Grants
Adaptive Online Mining of Big Data Streams
- Grant number: RGPAS-2019-00082
- Fiscal year: 2020
- Funding amount: $35,000
- Project category: Discovery Grants Program - Accelerator Supplements
Adaptive Online Mining of Big Data Streams
- Grant number: RGPIN-2019-06799
- Fiscal year: 2020
- Funding amount: $35,000
- Project category: Discovery Grants Program - Individual
Adaptive Online Mining of Big Data Streams
- Grant number: RGPIN-2019-06799
- Fiscal year: 2019
- Funding amount: $35,000
- Project category: Discovery Grants Program - Individual
Adaptive Online Mining of Big Data Streams
- Grant number: RGPAS-2019-00082
- Fiscal year: 2019
- Funding amount: $35,000
- Project category: Discovery Grants Program - Accelerator Supplements
Data and visual analytics for decision making in next generation media properties
- Grant number: 461898-2013
- Fiscal year: 2019
- Funding amount: $35,000
- Project category: Collaborative Research and Development Grants
Applications of IBM Platform Computing solutions for solving graphic analytics and 3D scalable video cloud transcoder problems
- Grant number: 461882-2013
- Fiscal year: 2018
- Funding amount: $35,000
- Project category: Collaborative Research and Development Grants
An online integrated health risk assessment tool
- Grant number: 461870-2013
- Fiscal year: 2018
- Funding amount: $35,000
- Project category: Collaborative Research and Development Grants
Online Mining of Big Data Streams Using Cloud Computing
- Grant number: RGPIN-2014-06565
- Fiscal year: 2018
- Funding amount: $35,000
- Project category: Discovery Grants Program - Individual
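Similar grants from the National Natural Science Foundation of China (NSFC)
Scalable Learning and Optimization: High-dimensional Models and Online Decision-Making Strategies for Big Data Analysis
- Grant number:
- Approval year: 2024
- Funding amount:
- Project category: Collaborative Innovation Research Team
Data-driven Recommendation System Construction of an Online Medical Platform Based on the Fusion of Information
- Grant number:
- Approval year: 2024
- Funding amount:
- Project category: Research Fund for International Young Scientists
A new online SPE/HPLC-ICP-MS multi-element speciation analysis method for studying the migration and transformation of chromium, arsenic, cadmium, mercury, and lead in lotus ponds
- Grant number: 21976048
- Approval year: 2019
- Funding amount: CNY 650,000
- Project category: General Program
Cross-chain decision optimization for new energy vehicle enterprises based on online reviews under the dual-credit policy
- Grant number: 71964023
- Approval year: 2019
- Funding amount: CNY 275,000
- Project category: Regional Science Fund Program
Big data fusion and applications for Online-to-Offline intelligent commerce
- Grant number: 91646204
- Approval year: 2016
- Funding amount: CNY 2,010,000
- Project category: Major Research Plan
Mining the lifestyle patterns of "check-in" users in the Online-to-Offline commerce environment
- Grant number: 71172046
- Approval year: 2011
- Funding amount: CNY 410,000
- Project category: General Program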
Similar overseas grants
Adaptive Online Mining of Big Data Streams
- Grant number: RGPIN-2019-06799
- Fiscal year: 2022
- Funding amount: $35,000
- Project category: Discovery Grants Program - Individual
Adaptive Online Mining of Big Data Streams
- Grant number: RGPAS-2019-00082
- Fiscal year: 2020
- Funding amount: $35,000
- Project category: Discovery Grants Program - Accelerator Supplements
Adaptive Online Mining of Big Data Streams
- Grant number: RGPIN-2019-06799
- Fiscal year: 2020
- Funding amount: $35,000
- Project category: Discovery Grants Program - Individual
Adaptive Online Mining of Big Data Streams
- Grant number: RGPIN-2019-06799
- Fiscal year: 2019
- Funding amount: $35,000
- Project category: Discovery Grants Program - Individual
Adaptive Online Mining of Big Data Streams
- Grant number: RGPAS-2019-00082
- Fiscal year: 2019
- Funding amount: $35,000
- Project category: Discovery Grants Program - Accelerator Supplements
Online Mining of Big Data Streams Using Cloud Computing
- Grant number: RGPIN-2014-06565
- Fiscal year: 2018
- Funding amount: $35,000
- Project category: Discovery Grants Program - Individual
Mining Online Social Networks and Hidden Web Data Sources by Sampling
- Grant number: RGPIN-2014-04463
- Fiscal year: 2018
- Funding amount: $35,000
- Project category: Discovery Grants Program - Individual
Online Mining of Big Data Streams Using Cloud Computing
- Grant number: RGPIN-2014-06565
- Fiscal year: 2017
- Funding amount: $35,000
- Project category: Discovery Grants Program - Individual
Mining Online Social Networks and Hidden Web Data Sources by Sampling
- Grant number: RGPIN-2014-04463
- Fiscal year: 2017
- Funding amount: $35,000
- Project category: Discovery Grants Program - Individual
Realtime Data Mining for Online Activities
- Grant number: 16K12430
- Fiscal year: 2016
- Funding amount: $35,000
- Project category: Grant-in-Aid for Challenging Exploratory Research