ATD: Anomaly Detection with Confidence and Precision

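Basic Information

Award number:
2027855
Principal investigator:
Minge Xie
Amount:
approximately $372,200
Institution:
Rutgers University New Brunswick
Institution country:
United States
Award type:
Standard Grant
Fiscal year:
2020
Funding country:
United States
Period:
2020-08-01 to 2024-07-31
Status:
Completed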

Source:
https://www.nsf.gov/awardsearch/showAward?AWD_ID=2027855&HistoricalAwards=false
Keywords:
ATD Anomaly Detection Confidence Precision

Project Abstract

Detection of threats is critical to national and global security. With tremendous amounts of surveillance and other types of data currently available, a systematic and formal quantitative approach to threat detection is needed. Threats are often preceded by abnormal behavior; early threat detection thus becomes detection of abnormal behavior. Statistically, detection of abnormal behavior is essentially detection of outliers from a "usual" distribution or a "usual" relationship. While missing a threat may have devastating impacts on society, a false detection also has negative impacts. This project aims to develop a new outlier detection framework under which the confidence level of detection is formulated in terms of the level of false positives and is precisely determined, to help decision makers plan resources on a more informed basis. The development provides novel statistical approaches for threat detection utilizing a wide range of data sources. This outlier detection method and the confidence level determination tool are general statistical methods that form a useful framework for many threat detection and risk assessment problems. They will enrich the theory and methodology of statistics, produce a new statistical analytical toolkit, and contribute to general data science, since outlier identification is a crucial stage of data cleaning for valid downstream analysis. The investigators will actively engage in activities related to education and research training of graduate and undergraduate students, especially attracting minority and women students into the fields of statistics and statistical applications, and introducing them to areas that are important to global and national security.

Although outlier detection algorithms have been extensively studied, most existing methods do not provide an uncertainty assessment and rely on an ad hoc rule to make judgment calls. The “Conformity Outlier Detection” (COD) framework under development in this project can overcome this shortcoming and provide detection with a theoretically guaranteed confidence level. This development is based on a state-of-the-art non-parametric predictive inference tool in machine learning and statistics, known as conformal prediction. It can provide accurate assessment of risk and uncertainty with few assumptions about the data and can be applied broadly. Under this new COD framework, the project explores two outlier detection procedures. The first is distribution-free and is suitable for any data set that provides pairwise similarity measures between subjects. It can be used for outlier detection in a broad class of unconventional data sets often encountered in counter-terrorism surveillance (e.g., text data with a word-use frequency similarity measure, communication pattern changes, voice similarities, network changes, and many others). The second is a model-based procedure to detect an abnormal deviation from a "usual" relationship or behavior. The detection method is robust against model misspecification under certain settings. In addition, since heterogeneity is commonly seen in large data sets, the project includes extension of the COD procedures to precision contextual outlier detection under the general individualized learning framework. Lastly, the project aims to demonstrate the approaches in a specific setting with sparsely observed spatial and temporal count processes that are commonly encountered in surveillance of remote areas. This project will support one graduate student per year.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
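
The abstract does not spell out the COD procedures, so the following is only a rough Python sketch of the general idea it builds on: conformal prediction used for distribution-free outlier detection from pairwise distances. It is a generic split-conformal construction, not the project's method. The function names (`knn_score`, `conformal_pvalues`), the k-nearest-neighbour nonconformity score, the Euclidean distance, and the simulated data are all assumptions made for illustration.

```python
import numpy as np

def knn_score(x, train, dist, k):
    """Nonconformity score (assumed choice): mean distance from x
    to its k nearest points in the training sample."""
    d = np.sort([dist(x, t) for t in train])
    return float(d[:k].mean())

def conformal_pvalues(train, calib, tests, dist, k=3):
    """Split-conformal p-values; small values suggest outliers.

    If a test point is exchangeable with the calibration points,
    its p-value is at most alpha with probability at most alpha,
    so flagging p <= alpha bounds the false positive rate by alpha.
    """
    calib_scores = np.array([knn_score(c, train, dist, k) for c in calib])
    pvals = []
    for x in tests:
        s = knn_score(x, train, dist, k)
        # Share of calibration scores at least as extreme as the test
        # score; the +1 terms count the test point itself.
        pvals.append((1 + np.sum(calib_scores >= s)) / (len(calib_scores) + 1))
    return np.array(pvals)

# Simulated "usual" behaviour (hypothetical data): 200 points from a
# standard bivariate normal, split into training and calibration halves.
rng = np.random.default_rng(0)
usual = rng.normal(size=(200, 2))
train, calib = usual[:100], usual[100:]

# One typical point and one point far from the bulk of the data.
tests = np.array([[0.1, -0.2], [8.0, 8.0]])

def euclid(a, b):
    """Any pairwise dissimilarity measure could be used instead."""
    return float(np.linalg.norm(a - b))

pvals = conformal_pvalues(train, calib, tests, euclid, k=3)
alpha = 0.05              # target false positive level
flags = pvals <= alpha    # True marks a detected outlier
print(pvals, flags)
```

Only the pairwise dissimilarity `dist` is needed, which is why this kind of procedure can in principle be applied to text, voice, or network data once a suitable similarity measure is available. With 100 calibration points the smallest attainable p-value is 1/101, so the calibration sample size limits how small a false positive level `alpha` can be used.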
