
III: Medium: Collaborative Research: Algorithms and Cyberinfrastructure for High-Precision Automated Quality Control of Hydro-Meteo Sensor Networks


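Basic information

Award number:
1513512
Principal investigator:
Michael Piasecki
Amount:
$363,500
Institution:
CUNY City College
Institution country:
United States
Award type:
Continuing Grant
Fiscal year:
2015
Funding country:
United States
Period:
2015-09-01 to 2019-08-31
Status:
Completed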

Source:
https://www.nsf.gov/awardsearch/showAward?AWD_ID=1513512&HistoricalAwards=false
Keywords:
III Medium Collaborative Research Algorithms

Project abstract

Advances in sensor technology are greatly expanding the range of quantities that can be measured while simultaneously reducing the cost. However, deployed sensors drift out of calibration and fail, so every sensor network requires quality control (QC) procedures to promptly detect these failures. Existing QC methods rely on human experts to carefully examine the data, which means that when the number of sensors in a network doubles, the number of experts must double too. This project will develop algorithms and software to increase the level of automation in sensor QC so that a smaller number of experts can manage a much larger network of sensors. The methods will be tested on weather data from Oklahoma (the Oklahoma Mesonet), Oregon (the Andrews Long-Term Ecological Network site), the US (the Earth Networks "WeatherBug" network), and sub-Saharan Africa (the TAHMO project), and if the methods are found to work well, they will be deployed in these networks and at the CUAHSI Water Data Center. Accurate weather data could significantly increase the productivity of farms and improve food security, particularly in Africa.
The project will develop an open-source standards-compliant system, SENSOR-DX, that implements automated data QC. Existing probabilistic QC methods assume that correct sensor readings are jointly Gaussian and that readings from broken sensors obey a uniform distribution. These assumptions lead to many QC mistakes. This project will develop a new approach in which novel nonparametric anomaly detection algorithms analyze the sensor data. Correct sensor readings have low anomaly scores, while broken sensor readings have high scores; both follow parametric distributions. Probabilistic methods can therefore model the distribution of the resulting anomaly scores instead of the joint distribution of the original sensor readings and infer (probabilistically) whether each sensor is working correctly.
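
The idea of modeling anomaly scores rather than raw readings can be illustrated with Bayes' rule. The following is a minimal sketch, not the project's method: the abstract does not say which parametric distributions are used, so the normal densities, their parameters, the prior, and the names `normal_pdf` and `posterior_broken` are all assumptions made for illustration, with scores assumed to lie roughly in [0, 1].

```python
import math


def normal_pdf(x, mean, std):
    """Density of a normal distribution with the given mean and std at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def posterior_broken(score, prior_broken=0.05,
                     working=(0.2, 0.1), broken=(0.8, 0.15)):
    """Posterior probability that a sensor is broken, given one anomaly score.

    `working` and `broken` are hypothetical (mean, std) pairs describing the
    score distribution in each state; `prior_broken` is a made-up failure rate.
    """
    like_broken = normal_pdf(score, *broken)
    like_working = normal_pdf(score, *working)
    numerator = prior_broken * like_broken
    denominator = numerator + (1.0 - prior_broken) * like_working
    return numerator / denominator


# A low score points to a working sensor, a high score to a broken one.
for s in (0.1, 0.5, 0.9):
    print(f"score={s:.1f}  P(broken | score)={posterior_broken(s):.4f}")
```

Under these assumed parameters the posterior rises with the score, which is the behavior the abstract relies on: the diagnostic model only needs the one-dimensional score distributions, not a joint model of the raw readings.
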
To enhance the fault-detection capability of the anomaly detection algorithms, the raw sensor data will be detrended and assembled into multiple views that highlight various correlations among sensor values. The project will develop a novel View-Anomaly-Diagnosis (VAD) framework in which anomaly detection algorithms are applied to the tuples in each view, and then the anomaly scores are combined via a probabilistic diagnostic model to infer which sensors are broken and which are functioning correctly. The project will study how good the detrending models need to be in order to enhance the accuracy of anomaly detection. The new anomaly detection algorithms are based on a new anomaly detection principle: "anomaly detection by overfitting". Existing methods fit a statistical model to "normal" behavior and then identify data points that do not fit well ("are underfit") and mark them as anomalies. The new principle measures how easy it is to "overfit" a statistical model that separates candidate anomalies from the rest of the data. The project will develop new algorithms based on this principle and understand how they relate to existing methods of anomaly detection by underfitting. The VAD framework will be implemented in the SENSOR-DX system: a series of Kepler workflows that provide support for connecting a new sensor network, training the detrending and anomaly detection models, performing real-time anomaly detection, and repairing bad sensor readings using predictive models. SENSOR-DX will also support semantic matching of new sensor data streams by extending the EnvThs controlled vocabulary thesaurus. For further information see the project web site at http://tahmo.org/sensor-dx
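
The View-Anomaly-Diagnosis steps described above (detrend, build views, score each view, combine scores probabilistically) can be sketched in miniature. This is a toy illustration under stated assumptions, not the SENSOR-DX implementation: the pairwise views, the absolute-difference score standing in for a nonparametric anomaly detector, the exponential score densities, the prior, and all function names and data values are hypothetical.

```python
import itertools
import math


def detrend(readings, trend):
    """Subtract a per-sensor trend estimate, leaving residuals."""
    return {name: readings[name] - trend[name] for name in readings}


def view_scores(residuals, scale=1.0):
    """Build one view per sensor pair and score it by residual disagreement.

    The absolute difference is a stand-in for a real anomaly detector.
    """
    pairs = itertools.combinations(sorted(residuals), 2)
    return {(a, b): abs(residuals[a] - residuals[b]) / scale for a, b in pairs}


def score_likelihood(score, any_broken):
    """Assumed exponential score density: scores tend to be small when every
    sensor in the view works and large when at least one is broken."""
    rate = 0.2 if any_broken else 2.0
    return rate * math.exp(-rate * score)


def diagnose(scores, sensors, prior_broken=0.1):
    """Posterior P(broken) per sensor, by enumerating all state assignments."""
    totals = {s: 0.0 for s in sensors}
    evidence = 0.0
    for states in itertools.product([False, True], repeat=len(sensors)):
        broken = dict(zip(sensors, states))
        weight = 1.0
        for s in sensors:  # prior over this assignment
            weight *= prior_broken if broken[s] else 1.0 - prior_broken
        for (a, b), score in scores.items():  # likelihood of each view score
            weight *= score_likelihood(score, broken[a] or broken[b])
        evidence += weight
        for s in sensors:
            if broken[s]:
                totals[s] += weight
    return {s: totals[s] / evidence for s in sensors}


readings = {"A": 21.3, "B": 20.9, "C": 35.0}  # made-up temperatures
trend = {"A": 21.0, "B": 21.0, "C": 21.0}     # made-up trend estimates
residuals = detrend(readings, trend)
posterior = diagnose(view_scores(residuals), sorted(residuals))
for name, p in posterior.items():
    print(f"sensor {name}: P(broken) = {p:.3f}")
```

Sensor C appears in both high-scoring views, so the diagnostic step assigns it most of the blame while A and B stay close to the prior. Brute-force enumeration is exponential in the number of sensors and is used here only to keep the sketch short.
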
