Outlier Detection in High-Dimensional Big Data using Bio-Inspired Methods for Emerging Applications in Engineering, Healthcare, and Business
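Basic Information
- Grant number: RGPIN-2017-04192
- Principal investigator: Raahemi, Bijan
- Amount: $17,500
- Host institution:
- Host institution country: Canada
- Program category: Discovery Grants Program - Individual
- Fiscal year: 2020
- Funding country: Canada
- Period: 2020-01-01 to 2021-12-31
- Project status: Completed
- Source:
- Keywords: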
Project Abstract
In this research program, I will explore, design, and analyze innovative bio-inspired algorithms for outlier detection in high-dimensional big data. I will then apply the new methods to emerging applications in three fields. In engineering, the application is intrusion detection in computer networks. In business, it is fraud detection in corporate financial statements. In healthcare, it is detecting abnormalities in patients' vital signs.
Big data is characterized by four (and sometimes more) V's: Volume, Velocity, Variety, and Veracity. It is defined as a collection of data sets so large, dynamic, and complex that it becomes difficult to process using traditional data analytics techniques. In high-dimensional spaces, distances between points become relatively uniform, so the notion of a data point's nearest neighbors becomes meaningless. High-dimensional data also has a combinatorial number of subspaces (2^d - 1 non-empty feature subsets for d dimensions), far too many to examine exhaustively in practice. Processing such large-scale, high-dimensional data is therefore computationally complex and expensive.
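The distance-concentration effect described above can be observed with a small simulation. This sketch is illustrative only and is not part of the funded work. It assumes points drawn uniformly from the unit hypercube. It measures how distinguishable near and far neighbors are using the relative contrast (D_max - D_min) / D_min of the distances from one query point. `relative_contrast` and `num_subspaces` are hypothetical helper names.

```python
import math
import random


def relative_contrast(n_points: int, dim: int, seed: int = 0) -> float:
    """Return (D_max - D_min) / D_min for the distances from one random query point
    to n_points other random points, all drawn uniformly from [0, 1]^dim."""
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    dists = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(dim)]
        dists.append(math.dist(query, point))  # Euclidean distance
    d_min, d_max = min(dists), max(dists)
    return (d_max - d_min) / d_min


def num_subspaces(dim: int) -> int:
    """Number of non-empty feature subsets (candidate subspaces) of a dim-dimensional space."""
    return 2 ** dim - 1


if __name__ == "__main__":
    # The contrast between the nearest and farthest neighbor shrinks as dim grows.
    for d in (2, 10, 100, 1000):
        print(f"dim={d:5d}  relative contrast={relative_contrast(500, d):.3f}")
    print(num_subspaces(50))
```

As the dimension grows, the printed contrast shrinks toward zero, which is why plain nearest-neighbor distances become unreliable. Meanwhile, `num_subspaces(50)` already exceeds 10^15 candidate subspaces.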
Detecting outliers (objects considerably dissimilar from, and inconsistent with, the majority of the data) in big data is an important research problem. This is especially true for high-dimensional data and in the presence of noise. The problem has drawn considerable attention in the research community because of the scientific challenges it poses. It also supports a wide range of real-world applications, including in engineering, healthcare, business, the environment, and public security.
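One common way to make "considerably dissimilar from the majority of the data" concrete is a distance-based score: the distance from each point to its k-th nearest neighbor. The sketch below is an assumed, brute-force illustration of that classic baseline, not the method proposed in this program. The function name and toy data are hypothetical.

```python
import math
from typing import List, Sequence


def knn_outlier_scores(points: Sequence[Sequence[float]], k: int) -> List[float]:
    """Score each point by the Euclidean distance to its k-th nearest neighbor.

    Larger scores mean the point lies farther from the bulk of the data.
    This is a brute-force O(n^2) loop, fine for illustration but not for big data.
    """
    if not 1 <= k < len(points):
        raise ValueError("k must satisfy 1 <= k < number of points")
    scores = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(dists[k - 1])  # distance to the k-th nearest neighbor
    return scores


if __name__ == "__main__":
    # Four tightly clustered points and one far-away point (hypothetical data).
    data = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (5.0, 5.0)]
    scores = knn_outlier_scores(data, k=2)
    print(max(range(len(data)), key=scores.__getitem__))  # prints 4, the isolated point
```

The quadratic loop is what becomes impractical at big-data scale. As noted above, these distances also lose contrast in high dimensions, which motivates subspace-based and distributed approaches.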
In this research program, I will explore novel techniques for dimension reduction, data summarization, and feature transformation running on distributed platforms. These will be combined with ensembles of models to detect outliers quickly and accurately. I will also explore bio-inspired algorithms to search the large space of subspace combinations, using fitness functions that minimize the sparsity of the samples in the selected subspaces.
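The program does not specify its fitness function or search operators. The sketch below shows one assumed, concrete instance of a bio-inspired subspace search: a small genetic algorithm that uses the sparsity coefficient of Aggarwal and Yu as its fitness.

- Each feature is split into φ equi-depth ranges.
- A candidate is a k-dimensional cube: k features, with one range chosen for each.
- The fitness is S = (n(cube) - N·f^k) / sqrt(N·f^k·(1 - f^k)), where f = 1/φ. S is most negative for cubes that contain far fewer points than independence would predict.

All function names and GA settings (population size, truncation selection, mutation rate) are hypothetical choices for illustration.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple

# A cube is a sorted tuple of (feature index, range index) conditions.
Cube = Tuple[Tuple[int, int], ...]


def equi_depth_bins(data: Sequence[Sequence[float]], phi: int) -> List[List[int]]:
    """Assign every value to one of phi equi-depth ranges (0..phi-1), feature by feature."""
    n, d = len(data), len(data[0])
    bins = [[0] * d for _ in range(n)]
    for j in range(d):
        order = sorted(range(n), key=lambda i: data[i][j])
        for rank, i in enumerate(order):
            bins[i][j] = rank * phi // n
    return bins


def sparsity(cube: Cube, bins: List[List[int]], phi: int) -> float:
    """Aggarwal-Yu sparsity coefficient; negative means fewer points than expected."""
    n, k = len(bins), len(cube)
    count = sum(all(row[j] == r for j, r in cube) for row in bins)
    p = (1.0 / phi) ** k  # chance a point lands in the cube if features were independent
    expected = n * p
    return (count - expected) / math.sqrt(expected * (1 - p))


def random_cube(d: int, k: int, phi: int, rng: random.Random) -> Cube:
    return tuple(sorted((j, rng.randrange(phi)) for j in rng.sample(range(d), k)))


def crossover(a: Cube, b: Cube, k: int, rng: random.Random) -> Cube:
    """Child keeps k distinct features drawn from the union of both parents' conditions."""
    pool: Dict[int, int] = dict(a)
    for j, r in b:
        if j not in pool or rng.random() < 0.5:
            pool[j] = r
    return tuple(sorted((j, pool[j]) for j in rng.sample(sorted(pool), k)))


def mutate(cube: Cube, d: int, phi: int, rng: random.Random) -> Cube:
    """Re-draw one condition's range, possibly moving it to an unused feature."""
    conds = dict(cube)
    j = rng.choice(list(conds))
    unused = [f for f in range(d) if f not in conds]
    if unused and rng.random() < 0.5:
        del conds[j]
        j = rng.choice(unused)
    conds[j] = rng.randrange(phi)
    return tuple(sorted(conds.items()))


def evolve_sparse_cubes(data: Sequence[Sequence[float]], k: int = 2, phi: int = 5,
                        pop_size: int = 30, generations: int = 40, seed: int = 0) -> List[Cube]:
    """Genetic search for the sparsest k-dimensional cubes; returns them best first."""
    rng = random.Random(seed)
    d = len(data[0])
    bins = equi_depth_bins(data, phi)
    cache: Dict[Cube, float] = {}

    def fitness(c: Cube) -> float:  # lower (more negative) is better
        if c not in cache:
            cache[c] = sparsity(c, bins, phi)
        return cache[c]

    pop = [random_cube(d, k, phi, rng) for _ in range(pop_size)]
    for _ in range(generations):
        parents = sorted(pop, key=fitness)[: pop_size // 2]  # elitist truncation selection
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            child = crossover(a, b, k, rng)
            if rng.random() < 0.3:
                child = mutate(child, d, phi, rng)
            children.append(child)
        pop = parents + children
    return sorted(set(pop), key=fitness)


if __name__ == "__main__":
    rng = random.Random(1)
    # Features 0 and 1 are strongly correlated; features 2 and 3 are independent noise.
    data = []
    for _ in range(200):
        x = rng.random()
        data.append((x, x + rng.gauss(0, 0.01), rng.random(), rng.random()))
    print(evolve_sparse_cubes(data)[0])
```

The points inside the returned cubes are the outlier candidates. The evolutionary search avoids enumerating all C(d, k)·φ^k cubes, a number that grows quickly with d.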
The analysis of big data relies on scalable distributed platforms. Examples are Hadoop, which supports the MapReduce programming model for analyzing large data in parallel, and Spark, a fast in-memory engine for large-scale data processing. In my Knowledge Discovery and Data Mining Lab, we have experimented with processing tasks in parallel using Hadoop and Spark. Building on this experience, we will design and implement our novel solutions on distributed platforms.
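Counting how many points fall in each grid cell is the expensive inner step of density- or sparsity-based subspace methods. It fits the MapReduce pattern naturally: each partition counts locally, and the partial counts are then summed by key.

The sketch below simulates that pattern in plain Python. It assumes rows that are already discretized into range indices, and it does not use the Hadoop or Spark APIs. On Spark, the same structure would roughly correspond to `RDD.mapPartitions` followed by `RDD.reduceByKey`. The function names are hypothetical.

```python
from collections import Counter
from functools import reduce
from itertools import combinations
from typing import Iterable, List, Sequence

# Key: ((feature i, feature j), (range of i, range of j)), i.e. one 2-D grid cell.


def map_partition(rows: Iterable[Sequence[int]]) -> Counter:
    """Map step with a local combiner: count the 2-D cells each row falls into."""
    counts: Counter = Counter()
    for row in rows:
        for (i, ri), (j, rj) in combinations(enumerate(row), 2):
            counts[((i, j), (ri, rj))] += 1
    return counts


def reduce_counts(left: Counter, right: Counter) -> Counter:
    """Reduce step: merge partial counts by key (Counter addition sums per key)."""
    return left + right


def cell_counts(partitions: List[List[Sequence[int]]]) -> Counter:
    """Run the map step on every partition, then fold the partial results together."""
    return reduce(reduce_counts, map(map_partition, partitions), Counter())


if __name__ == "__main__":
    # Two partitions of rows already discretized into range indices (hypothetical data).
    parts = [[(0, 1, 2), (0, 1, 0)], [(0, 1, 2)]]
    counts = cell_counts(parts)
    print(counts[((0, 1), (0, 1))])  # prints 3: every row has feature 0 in range 0, feature 1 in range 1
```

The map step only needs its own partition, and the reduce step is associative. So the built-in `map` could be replaced by a process pool (for example `concurrent.futures.ProcessPoolExecutor.map`) or by a cluster framework without changing the logic. Counting every feature pair is itself quadratic in d, so in practice only the cells of candidate subspaces would be counted.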
The solutions and algorithms developed in this research program will be applied to emerging applications in three areas:
- Engineering: analyzing large volumes of high-dimensional data generated by Internet traffic to detect network intrusions.
- Business: detecting fraudulent financial activities in a real dataset of more than 4,000 firms provided by Bloomberg and Compustat.
- Healthcare: analyzing vital signs collected from patients, including temperature, heart rate, blood pressure, and ECG signals, to detect anomalies.
Project Outputs
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other publications by Raahemi, Bijan
Exploiting unlabeled data to improve peer-to-peer traffic classification using incremental tri-training method
- DOI: 10.1007/s12083-008-0022-6
- Published: 2009-06-01
- Journal: Peer-to-Peer Networking and Applications
- Impact factor: 4.2
- Authors: Raahemi, Bijan; Zhong, Weicai; Liu, Jing
- Corresponding author: Liu, Jing
A new density-based subspace selection method using mutual information for high dimensional outlier detection
- DOI: 10.1016/j.knosys.2020.106733
- Published: 2021-01-23
- Journal: Knowledge-Based Systems
- Impact factor: 8.8
- Authors: Riahi-Madvar, Mahboobeh; Azirani, Ahmad Akbari; Raahemi, Bijan
- Corresponding author: Raahemi, Bijan
Identifying high-cost patients using data mining techniques and a small set of non-trivial attributes
- DOI: 10.1016/j.compbiomed.2014.07.005
- Published: 2014-10-01
- Journal: Computers in Biology and Medicine
- Impact factor: 7.7
- Authors: Shenas, Seyed Abdolmotalleb Izad; Raahemi, Bijan; Kuziemsky, Craig
- Corresponding author: Kuziemsky, Craig
Detecting financial restatements using data mining techniques
- DOI: 10.1016/j.eswa.2017.08.030
- Published: 2017-12-30
- Journal: Expert Systems with Applications
- Impact factor: 8.5
- Authors: Dutta, Ila; Dutta, Shantanu; Raahemi, Bijan
- Corresponding author: Raahemi, Bijan
Other grants held by Raahemi, Bijan
Outlier Detection in High-Dimensional Big Data using Bio-Inspired Methods for Emerging Applications in Engineering, Healthcare, and Business
- Grant number: RGPIN-2017-04192
- Fiscal year: 2021
- Amount: $17,500
- Program category: Discovery Grants Program - Individual
Outlier Detection in High-Dimensional Big Data using Bio-Inspired Methods for Emerging Applications in Engineering, Healthcare, and Business
- Grant number: RGPIN-2017-04192
- Fiscal year: 2019
- Amount: $17,500
- Program category: Discovery Grants Program - Individual
Outlier Detection in High-Dimensional Big Data using Bio-Inspired Methods for Emerging Applications in Engineering, Healthcare, and Business
- Grant number: RGPIN-2017-04192
- Fiscal year: 2018
- Amount: $17,500
- Program category: Discovery Grants Program - Individual
Outlier Detection in High-Dimensional Big Data using Bio-Inspired Methods for Emerging Applications in Engineering, Healthcare, and Business
- Grant number: RGPIN-2017-04192
- Fiscal year: 2017
- Amount: $17,500
- Program category: Discovery Grants Program - Individual
Estimating Bus Passengers' Origin Destination Travel Route using Data Analytics on Wi-Fi and Bluetooth Signals
- Grant number: 514854-2017
- Fiscal year: 2017
- Amount: $17,500
- Program category: Engage Grants Program
Feature Engineering using Bio-Inspired Methods for the Internet Data Analytics
- Grant number: 341811-2012
- Fiscal year: 2016
- Amount: $17,500
- Program category: Discovery Grants Program - Individual
Feature Engineering using Bio-Inspired Methods for the Internet Data Analytics
- Grant number: 341811-2012
- Fiscal year: 2015
- Amount: $17,500
- Program category: Discovery Grants Program - Individual
Capturing and analyzing data from SensorSuite's services using big data analytics techniques
- Grant number: 477440-2014
- Fiscal year: 2014
- Amount: $17,500
- Program category: Engage Grants Program
Capturing and analyzing data from Giatec's testing devices using web applications
- Grant number: 463717-2014
- Fiscal year: 2014
- Amount: $17,500
- Program category: Engage Grants Program
Feature Engineering using Bio-Inspired Methods for the Internet Data Analytics
- Grant number: 341811-2012
- Fiscal year: 2014
- Amount: $17,500
- Program category: Discovery Grants Program - Individual
Similar National Natural Science Foundation of China (NSFC) Grants
Graphon mean field games with partial observation and application to failure detection in distributed systems
- Grant number:
- Approval year: 2025
- Amount: CNY 0
- Program category: Provincial/municipal-level project
Similar Overseas Grants
Mixed-Dimensional 2D/0D Heterostructures for Infrared Detection
- Grant number: DP230101847
- Fiscal year: 2023
- Amount: $17,500
- Program category: Discovery Projects
ATD: Efficient and Effective Algorithms for Detection of Anomalies in High-dimensional Spatiotemporal Data with Large Amounts of Missing Data
- Grant number: 2318925
- Fiscal year: 2023
- Amount: $17,500
- Program category: Standard Grant
Repurposing low-dimensional hybrid perovskites for the detection of low-energy photons
- Grant number: 2313648
- Fiscal year: 2023
- Amount: $17,500
- Program category: Continuing Grant
Integration of a deep probabilistic model and an outlier detection method with an attention mechanism and its application to super-high dimensional time series data
- Grant number: 23H03357
- Fiscal year: 2023
- Amount: $17,500
- Program category: Grant-in-Aid for Scientific Research (B)
ATD: Diffusion and Transport on Graphs: Active Learning, Low-Dimensional Representations, and Anomaly Detection
- Grant number: 2318894
- Fiscal year: 2023
- Amount: $17,500
- Program category: Standard Grant
Bayesian Modeling and Inference for High-Dimensional Disease Mapping and Boundary Detection
- Grant number: 10568797
- Fiscal year: 2023
- Amount: $17,500
- Program category:
Thermal hotspots detection in nanoscale two-dimensional electronics
- Grant number: DE220100487
- Fiscal year: 2022
- Amount: $17,500
- Program category: Discovery Early Career Researcher Award
Optical generation and detection of GHz-THz acoustic waves in 1- and 2-dimensional nano-scale periodic structures and their application
- Grant number: 22K04938
- Fiscal year: 2022
- Amount: $17,500
- Program category: Grant-in-Aid for Scientific Research (C)
DImensional Attention MOdelling for Neglect Detection (DIAMOND): A novel application for brain injury
- Grant number: nhmrc: 2002362
- Fiscal year: 2021
- Amount: $17,500
- Program category: Ideas Grants
A comprehensive study of the three-dimensional left ventricular flow subject to aortic valve regurgitation: Toward a predictive model for earlier disease detection and treatment
- Grant number: 546163-2020
- Fiscal year: 2021
- Amount: $17,500
- Program category: Postdoctoral Fellowships