III: Medium: Collaborative Research: Algorithms and Cyberinfrastructure for High-Precision Automated Quality Control of Hydro-Meteo Sensor Networks

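Basic Information

Award Number:
1514550
Principal Investigator:
Thomas Dietterich
Amount:
$635,500
Institution:
Oregon State University
Institution Country:
United States
Award Type:
Continuing Grant
Fiscal Year:
2015
Funding Country:
United States
Duration:
2015-09-01 to 2019-08-31
Status:
Completed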

Source:
https://www.nsf.gov/awardsearch/showAward?AWD_ID=1514550&HistoricalAwards=false
Keywords:
III Medium Collaborative Research Algorithms

Project Abstract

Advances in sensor technology are greatly expanding the range of quantities that can be measured while simultaneously reducing the cost. However, deployed sensors drift out of calibration and fail, so every sensor network requires quality control (QC) procedures to promptly detect these failures. Existing QC methods rely on human experts to carefully examine the data, which means that when the number of sensors in a network doubles, the number of experts must double too. This project will develop algorithms and software to increase the level of automation in sensor QC so that a smaller number of experts can manage a much larger network of sensors. The methods will be tested on weather data from Oklahoma (the Oklahoma Mesonet), Oregon (the Andrews Long-Term Ecological Network site), the US (the Earth Networks "WeatherBug" network), and sub-Saharan Africa (the TAHMO project), and if the methods are found to work well, they will be deployed in these networks and at the CUAHSI Water Data Center. Accurate weather data could significantly increase the productivity of farms and improve food security, particularly in Africa.

The project will develop an open-source, standards-compliant system, SENSOR-DX, that implements automated data QC. Existing probabilistic QC methods assume that correct sensor readings are jointly Gaussian and that readings from broken sensors obey a uniform distribution. These assumptions lead to many QC mistakes. This project will develop a new approach in which novel nonparametric anomaly detection algorithms analyze the sensor data. Correct sensor readings receive low anomaly scores, while broken sensor readings receive high scores, and the scores in each group follow a parametric distribution. Probabilistic methods can therefore model the distribution of the resulting anomaly scores, instead of the joint distribution of the original sensor readings, and infer (probabilistically) whether each sensor is working correctly.
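To make the score-modeling idea concrete, here is a minimal Python sketch, not the project's actual model. It assumes (hypothetically) that the anomaly scores of working sensors and of broken sensors each follow a Gaussian, fits the resulting two-component mixture with expectation-maximization (EM), and returns the posterior probability that a given score came from a broken sensor. The names `fit_score_mixture` and `posterior_broken` and the example scores are illustrative only.

```python
import math

# Hypothetical sketch: the two-Gaussian score model is an assumption,
# not the distribution family used by the project.


def log_normal(x, mu, sigma):
    """Log density of a univariate Gaussian N(mu, sigma^2) at x."""
    return -0.5 * ((x - mu) / sigma) ** 2 - math.log(sigma * math.sqrt(2 * math.pi))


def posterior_broken(x, mu, sigma, pi):
    """P(broken | score x). Component 0 = working sensors, component 1 = broken."""
    log_ok = math.log(pi[0]) + log_normal(x, mu[0], sigma[0])
    log_bad = math.log(pi[1]) + log_normal(x, mu[1], sigma[1])
    d = log_ok - log_bad
    # Numerically stable 1 / (1 + exp(d)).
    if d > 0:
        return math.exp(-d) / (1.0 + math.exp(-d))
    return 1.0 / (1.0 + math.exp(d))


def fit_score_mixture(scores, iters=100, min_sigma=1e-3):
    """Fit the two-component mixture to anomaly scores with EM; returns (mu, sigma, pi)."""
    s = sorted(scores)
    n = len(s)
    # Start the 'working' component low and the 'broken' component high.
    mu = [s[n // 4], s[(3 * n) // 4]]
    sigma = [max(min_sigma, (s[-1] - s[0]) / 4)] * 2
    pi = [0.5, 0.5]
    for _ in range(iters):
        # E-step: responsibility of the 'broken' component for each score.
        resp = [posterior_broken(x, mu, sigma, pi) for x in scores]
        # M-step: re-estimate mixing weights, means and standard deviations.
        w_bad = max(sum(resp), 1e-9)
        w_ok = max(n - sum(resp), 1e-9)
        mu = [
            sum((1 - r) * x for r, x in zip(resp, scores)) / w_ok,
            sum(r * x for r, x in zip(resp, scores)) / w_bad,
        ]
        sigma = [
            max(min_sigma, math.sqrt(sum((1 - r) * (x - mu[0]) ** 2 for r, x in zip(resp, scores)) / w_ok)),
            max(min_sigma, math.sqrt(sum(r * (x - mu[1]) ** 2 for r, x in zip(resp, scores)) / w_bad)),
        ]
        pi = [max(w_ok / n, 1e-6), max(w_bad / n, 1e-6)]
    return mu, sigma, pi


# Hypothetical per-reading anomaly scores: most are low, three are high.
scores = [0.10, 0.20, 0.15, 0.12, 0.18, 0.22, 0.11, 0.14, 3.0, 3.2, 2.9]
mu, sigma, pi = fit_score_mixture(scores)
print([round(posterior_broken(x, mu, sigma, pi), 3) for x in scores])
```

Working in log space keeps the posterior finite even when a score lies many standard deviations from one component. A real deployment would still have to aggregate per-reading posteriors into a per-sensor decision (for example over a time window); that step is omitted here.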
To enhance the fault-detection capability of the anomaly detection algorithms, the raw sensor data will be detrended and assembled into multiple views that highlight various correlations among sensor values. The project will develop a novel View-Anomaly-Diagnosis (VAD) framework in which anomaly detection algorithms are applied to the tuples in each view, and the resulting anomaly scores are then combined via a probabilistic diagnostic model to infer which sensors are broken and which are functioning correctly. The project will also study how good the detrending models need to be in order to improve the accuracy of anomaly detection. The new anomaly detection algorithms are based on a new principle: "anomaly detection by overfitting". Existing methods fit a statistical model to "normal" behavior, then identify data points that do not fit well ("are underfit") and mark them as anomalies. The new principle instead measures how easy it is to "overfit" a statistical model that separates candidate anomalies from the rest of the data. The project will develop new algorithms based on this principle and study how they relate to existing methods of anomaly detection by underfitting. The VAD framework will be implemented in the SENSOR-DX system: a series of Kepler workflows that support connecting a new sensor network, training the detrending and anomaly detection models, performing real-time anomaly detection, and repairing bad sensor readings using predictive models. SENSOR-DX will also support semantic matching of new sensor data streams by extending the EnvThs controlled vocabulary thesaurus.

For further information, see the project web site at http://tahmo.org/sensor-dx
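One way to picture "anomaly detection by overfitting" is to count how many random axis-aligned cuts it takes to separate a tuple from the rest of a view: a tuple that a very simple separator can carve off is easy to "overfit" and is therefore suspicious. The Python sketch below uses this isolation-forest-style splitting as a stand-in; it is not the project's algorithm, and the names `isolation_depth` and `overfit_scores`, the trial count, and the toy two-sensor view are hypothetical. Detrending is omitted for brevity.

```python
import random


# Hypothetical sketch: isolation-style random splitting used as a stand-in for
# "anomaly detection by overfitting"; not the project's actual algorithm.
def isolation_depth(point, view, rng, max_depth=64):
    """Count random axis-aligned cuts needed to separate `point` from the other
    tuples in `view`. Fewer cuts = easier to "overfit" a separator = more anomalous."""
    current = list(view)
    depth = 0
    while len(current) > 1 and depth < max_depth:
        # (min, max) of each coordinate over the tuples still grouped with `point`.
        ranges = [(min(p[d] for p in current), max(p[d] for p in current))
                  for d in range(len(point))]
        dims = [d for d, (lo, hi) in enumerate(ranges) if lo < hi]
        if not dims:
            break  # everything left is an exact duplicate of `point`
        d = rng.choice(dims)
        lo, hi = ranges[d]
        cut = rng.uniform(lo, hi)
        # Keep only the tuples on the same side of the cut as `point`.
        if point[d] < cut:
            current = [p for p in current if p[d] < cut]
        else:
            current = [p for p in current if p[d] >= cut]
        depth += 1
    return depth


def overfit_scores(view, n_trials=200, seed=0):
    """Anomaly score in (0, 1] for each tuple; higher means easier to isolate."""
    rng = random.Random(seed)
    scores = []
    for point in view:
        mean_depth = sum(isolation_depth(point, view, rng) for _ in range(n_trials)) / n_trials
        scores.append(1.0 / (1.0 + mean_depth))
    return scores


# Hypothetical view: paired readings from two sensors that normally track
# each other; sensor B spikes at time step 10.
a = [float(t) for t in range(20)]
b = [t + 0.1 * ((t * 7) % 5 - 2) for t in a]
b[10] += 100.0
scores = overfit_scores(list(zip(a, b)))
print(max(range(len(scores)), key=scores.__getitem__))  # index of the most anomalous tuple
```

Because the two sensors share the same underlying signal, the spiked tuple can usually be carved off by a single high cut on sensor B, while an ordinary tuple needs several cuts, so the spike gets the highest score. In a VAD-style pipeline, per-tuple scores like these would then be passed to a diagnostic model.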

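The diagnosis step of VAD can likewise be sketched as a small Bayesian model. In this hypothetical Python example, each sensor is independently broken with a prior probability; a view's anomaly score is drawn from a "bad" Gaussian if any sensor in that view is broken and from an "ok" Gaussian otherwise; and exact enumeration over all broken/working assignments yields each sensor's posterior probability of being broken. The priors, score distributions, sensor names, and the `diagnose` function are illustrative assumptions, not the project's diagnostic model, and enumeration is only practical for a handful of sensors.

```python
import itertools
import math

# Hypothetical sketch of a probabilistic diagnostic model; all parameters are assumptions.


def log_normal(x, mu, sigma):
    """Log density of N(mu, sigma^2) at x."""
    return -0.5 * ((x - mu) / sigma) ** 2 - math.log(sigma * math.sqrt(2 * math.pi))


def diagnose(sensors, views, view_scores, prior=0.05, ok=(0.2, 0.1), bad=(0.8, 0.15)):
    """Posterior P(broken) for each sensor.

    views: list of tuples of sensor names covered by each view.
    view_scores: one anomaly score per view (higher = more anomalous).
    ok / bad: (mean, std) of a view's score when all of its sensors work /
    when at least one of them is broken.
    """
    log_weights = []
    for assignment in itertools.product((False, True), repeat=len(sensors)):
        broken = {s for s, is_broken in zip(sensors, assignment) if is_broken}
        # Independent prior over each sensor's state.
        lw = sum(math.log(prior if b else 1.0 - prior) for b in assignment)
        # Likelihood of every view score under this assignment.
        for view, score in zip(views, view_scores):
            mu, sigma = bad if broken.intersection(view) else ok
            lw += log_normal(score, mu, sigma)
        log_weights.append((assignment, lw))
    # Normalise in log space to avoid underflow.
    top = max(lw for _, lw in log_weights)
    weights = [(a, math.exp(lw - top)) for a, lw in log_weights]
    total = sum(w for _, w in weights)
    return {s: sum(w for a, w in weights if a[i]) / total for i, s in enumerate(sensors)}


# Hypothetical example: three co-located sensors, one pairwise view per pair.
sensors = ["A", "B", "C"]
views = [("A", "B"), ("B", "C"), ("A", "C")]
view_scores = [0.85, 0.80, 0.20]  # both views that contain B look anomalous
posterior = diagnose(sensors, views, view_scores)
print({s: round(p, 3) for s, p in posterior.items()})
```

The view (A, C) looks normal, which argues that A and C are working; the anomalous views involving B are then best explained by B alone, so B receives nearly all of the posterior mass for "broken".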