Dealing with Extreme Class Imbalance Learning in Defense and Security Applications
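Basic information
- Grant number: RGPIN-2014-04889
- Principal investigator:
- Amount: $18,900
- Host institution:
- Host institution country: Canada
- Program: Discovery Grants Program - Individual
- Fiscal year: 2014
- Funding country: Canada
- Period: 2014-01-01 to 2015-12-31
- Status: Completed
- Source:
- Keywords:
Project abstract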
Defense and security applications such as threat monitoring (for example, the detection of hazardous atmospheric emissions, underwater mines, or computer network attacks) must all deal with the same underlying problem: the extreme scarcity and disparity of data describing the event to be detected. This creates a significant challenge for machine learning applications. Sampling methods have aimed to increase the amount of event data, but in extreme cases in which only a handful of event instances are available, current methods of this kind have not yet been successful. The defense and security community's approach has typically been to generate simulated data, based on domain knowledge, to 'complement' the missing event data. However, given the usual simplicity of the simulation models, this type of response is generally unsatisfactory. Another machine learning response has been to use one-class learning (outlier detection) approaches to model background data, which is typically plentiful, and to send a signal when a 'suspected' outlier is encountered. However, current approaches of this kind are typically much less powerful than their binary-class counterparts.
The research program that I propose in this Discovery Grant application will effectively address the extreme class imbalance problem. In particular, I propose a new approach called Negative Learning. Negative Learning consists of recognizing that although we do not have enough instances of abnormal/threat data, we have many instances of normal/background data. Based on this fact, we consider the abnormal class to be any and all instances missing from the normal class, and we propose to appropriately sample from this “negative of the normal class”. The technique will be based on a generative approach composed of four steps. In the first step, a probability density function will be derived from the background data. In the second step, new event data will be artificially generated in the low-probability regions of that model, because data in those regions can be thought of as borderline instances of the abnormal class. This means that unlike simple methods such as the Synthetic Minority Over-sampling Technique (SMOTE), which only generates new samples within the convex hull described by the available data, I propose to generate data far beyond the convex hull of the available data. The third step will consist of refining the artificial data set produced by the first and second steps using domain knowledge and user guidance. In particular, I will use active learning and its derivatives as well as domain knowledge integration to augment the abnormal class with data not generated by the first two steps, and to trim it by eliminating redundant or implausible data. The last step of the process will be the application of binary classifiers to the newly generated data set and the evaluation of the overall system as compared to other techniques currently available (binary- and one-class based).
This research should have a significant impact on defense and security because it will make the application of machine learning techniques to the problems encountered in the field much more realistic. The techniques developed for this purpose will also find applications in other fields, such as the medical domain and text mining. This research will allow two Ph.D. students and three Master's students to study under my supervision and carry out their studies from beginning to end.
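The four steps described in the abstract can be illustrated with a minimal sketch. It is not the project's implementation: the Gaussian kernel density estimate, the rejection sampler, the `is_plausible` callback standing in for domain-knowledge refinement, the k-nearest-neighbour classifier, and every function name and parameter value below are assumptions chosen only to make the idea concrete on toy 2-D data.

```python
import math
import random


def make_background(n, seed=0):
    """Toy 2-D 'normal' data: a standard Gaussian blob (illustrative only)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)] for _ in range(n)]


def kde_log_density(point, data, bandwidth):
    """Log of a Gaussian kernel density estimate fitted on `data`, at `point`."""
    dim = len(point)
    log_norm = (-0.5 * dim * math.log(2.0 * math.pi * bandwidth ** 2)
                - math.log(len(data)))
    exponents = [
        -sum((p - x) ** 2 for p, x in zip(point, row)) / (2.0 * bandwidth ** 2)
        for row in data
    ]
    top = max(exponents)  # log-sum-exp trick for numerical stability
    return log_norm + top + math.log(sum(math.exp(e - top) for e in exponents))


def sample_negative(background, n_samples, bandwidth=0.5, quantile=0.05,
                    margin=3.0, is_plausible=None, seed=1):
    """Steps 1-3: draw synthetic 'abnormal' points where background density is low."""
    rng = random.Random(seed)
    # Step 1: density model of the background; the threshold is a low
    # quantile of the background's own log-density scores (an assumption).
    scores = sorted(kde_log_density(row, background, bandwidth)
                    for row in background)
    threshold = scores[int(quantile * (len(scores) - 1))]
    # Step 2: propose candidates in a box extending `margin` beyond the data,
    # so accepted samples can lie outside the convex hull of the background.
    dims = range(len(background[0]))
    lows = [min(row[j] for row in background) - margin for j in dims]
    highs = [max(row[j] for row in background) + margin for j in dims]
    synthetic = []
    while len(synthetic) < n_samples:
        candidate = [rng.uniform(lo, hi) for lo, hi in zip(lows, highs)]
        if kde_log_density(candidate, background, bandwidth) >= threshold:
            continue  # too typical of the background, reject
        # Step 3 (placeholder): a hypothetical domain-knowledge filter may
        # reject implausible candidates.
        if is_plausible is not None and not is_plausible(candidate):
            continue
        synthetic.append(candidate)
    return synthetic, threshold


def knn_predict(point, normal, abnormal, k=5):
    """Step 4: k-nearest-neighbour binary classifier; returns 1 for 'abnormal'."""
    labelled = [(math.dist(point, row), 0) for row in normal]
    labelled += [(math.dist(point, row), 1) for row in abnormal]
    nearest = sorted(labelled)[:k]
    return int(sum(label for _, label in nearest) * 2 > k)


if __name__ == "__main__":
    background = make_background(200)
    synthetic, _ = sample_negative(background, n_samples=200)
    print(knn_predict([0.0, 0.0], background, synthetic))  # near the blob centre
    print(knn_predict([5.0, 5.0], background, synthetic))  # far from the blob
```

Rejection sampling inside an enlarged bounding box is the simplest way to reach regions that interpolation-based oversampling such as SMOTE cannot, but it scales poorly with dimensionality; a real system would likely need a more targeted sampler and a stronger classifier.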
Project outputs
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other publications by Japkowicz, Nathalie
Threaded ensembles of autoencoders for stream learning
- DOI: 10.1111/coin.12146
- Published: 2018-02-01
- Journal:
- Impact factor: 2.8
- Authors: Dong, Yue; Japkowicz, Nathalie
- Corresponding author: Japkowicz, Nathalie
Anomaly Detection and Repair for Accurate Predictions in Geo-distributed Big Data
- DOI: 10.1016/j.bdr.2019.04.001
- Published: 2019-07-01
- Journal:
- Impact factor: 3.3
- Authors: Corizzo, Roberto; Ceci, Michelangelo; Japkowicz, Nathalie
- Corresponding author: Japkowicz, Nathalie
Warning: statistical benchmarking is addictive. Kicking the habit in machine learning
- DOI: 10.1080/09528130903010295
- Published: 2010-01-01
- Journal:
- Impact factor: 2.2
- Authors: Drummond, Chris; Japkowicz, Nathalie
- Corresponding author: Japkowicz, Nathalie
The class imbalance problem in deep learning
- DOI: 10.1007/s10994-022-06268-8
- Published: 2022-12-28
- Journal:
- Impact factor: 7.5
- Authors: Ghosh, Kushankur; Bellinger, Colin; Japkowicz, Nathalie
- Corresponding author: Japkowicz, Nathalie
Scalable auto-encoders for gravitational waves detection from time series data
- DOI: 10.1016/j.eswa.2020.113378
- Published: 2020-08-01
- Journal:
- Impact factor: 8.5
- Authors: Corizzo, Roberto; Ceci, Michelangelo; Japkowicz, Nathalie
- Corresponding author: Japkowicz, Nathalie
Other grants by Japkowicz, Nathalie
Dealing with Extreme Class Imbalance Learning in Defense and Security Applications
- Grant number: RGPIN-2014-04889
- Fiscal year: 2016
- Amount: $18,900
- Program: Discovery Grants Program - Individual
Dealing with Extreme Class Imbalance Learning in Defense and Security Applications
- Grant number: RGPIN-2014-04889
- Fiscal year: 2015
- Amount: $18,900
- Program: Discovery Grants Program - Individual
Predicting traffic safety based on weather events
- Grant number: 484326-2015
- Fiscal year: 2015
- Amount: $18,900
- Program: Engage Grants Program
Predicting network failures using anomaly detection methods
- Grant number: 485098-2015
- Fiscal year: 2015
- Amount: $18,900
- Program: Engage Grants Program
A visualization framework for machine learning evaluation
- Grant number: 228118-2009
- Fiscal year: 2013
- Amount: $18,900
- Program: Discovery Grants Program - Individual
A visualization framework for machine learning evaluation
- Grant number: 228118-2009
- Fiscal year: 2012
- Amount: $18,900
- Program: Discovery Grants Program - Individual
Track correlation and association using GMTI/AIS/ARPA
- Grant number: 442461-2012
- Fiscal year: 2012
- Amount: $18,900
- Program: Engage Grants Program
Developing advanced techniques for sampling online social networks
- Grant number: 431154-2012
- Fiscal year: 2012
- Amount: $18,900
- Program: Engage Grants Program
A visualization framework for machine learning evaluation
- Grant number: 228118-2009
- Fiscal year: 2011
- Amount: $18,900
- Program: Discovery Grants Program - Individual
A visualization framework for machine learning evaluation
- Grant number: 228118-2009
- Fiscal year: 2010
- Amount: $18,900
- Program: Discovery Grants Program - Individual
Similar overseas grants
The demographic consequences of extreme weather events in Australia
- Grant number: DP240102733
- Fiscal year: 2024
- Amount: $18,900
- Program: Discovery Projects
Attributable impacts from extreme weather events
- Grant number: NE/Z000203/1
- Fiscal year: 2024
- Amount: $18,900
- Program: Research Grant
MCA: Cellular Responses to Thermal Stress in Antarctic Fishes: Dynamic Re-structuring of the Proteome in Extreme Stenotherms
- Grant number: 2322117
- Fiscal year: 2024
- Amount: $18,900
- Program: Standard Grant
RII Track-4:NSF: Improving subseasonal-to-seasonal forecasts of Central Pacific extreme hydrometeorological events and their impacts in Hawaii
- Grant number: 2327232
- Fiscal year: 2024
- Amount: $18,900
- Program: Standard Grant
Collaborative Research: Extreme Mechanics of the Human Brain via Integrated In Vivo and Ex Vivo Mechanical Experiments
- Grant number: 2331294
- Fiscal year: 2024
- Amount: $18,900
- Program: Standard Grant
Rossbypalooza 2024: A Student-led Summer School on Climate and Extreme Events Conference; Chicago, Illinois; July 22-August 2, 2024
- Grant number: 2406927
- Fiscal year: 2024
- Amount: $18,900
- Program: Standard Grant
Collaborative Research: DMREF: Closed-Loop Design of Polymers with Adaptive Networks for Extreme Mechanics
- Grant number: 2413579
- Fiscal year: 2024
- Amount: $18,900
- Program: Standard Grant
REU Site: Research Experience for Undergraduates in Resilience Against Extreme Weather Events
- Grant number: 2349250
- Fiscal year: 2024
- Amount: $18,900
- Program: Standard Grant
Advancing understanding of interannual variability and extreme events in the thermal structure of large lakes under historical and future climate scenarios
- Grant number: 2319044
- Fiscal year: 2024
- Amount: $18,900
- Program: Standard Grant
New biocatalysts for selective chemical oxidations under extreme conditions
- Grant number: DP240101500
- Fiscal year: 2024
- Amount: $18,900
- Program: Discovery Projects