Imperfect data: accuracy, impacts and extraction of meaningful information

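Basic information

  • Grant number:
    EP/J020230/1
  • Principal investigator:
  • Amount:
    $ 88,200
  • Host institution:
  • Host institution country:
    United Kingdom
  • Project type:
    Research Grant
  • Fiscal year:
    2012
  • Funding country:
    United Kingdom
  • Duration:
    2012 to (no data)
  • Project status:
    Completed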

Project abstract

Meaningful information is a fundamental requirement for informed, logical and reasoned activity. Extracting meaningful information from data can, however, be a challenge, especially because data may, amongst other things, be inaccurate, incomplete and possibly contradictory, as they arise from a variety of sources of variable quality and trust level. Data imperfections are a generic problem in information extraction and decision making, and so the work is relevant in many disciplines. Imperfect data are, for example, evident in medical diagnosis (e.g. a patient's test results are typically only an imperfect indicator of a condition), in defining nature reserves for species conservation (e.g. the species distribution maps and models are often highly sensitive to 'absence' data - was the species actually present but not observed?) and in security and defence applications (e.g. sub-pixel target detection algorithms applied to surveillance imagery vary in performance and utility between environments). Some problems with imperfect data were recently highly apparent in the response to the Haiti earthquake of 2010, especially in damage mapping to inform relief activities. Vast amounts of well-intentioned assistance were provided by numerous professional and amateur bodies with unprecedented data rates, but the volumes of data and the problems with them were a concern. Key problems were that maps were inaccurate, inconsistent and sometimes contradictory. As such, a major mapping challenge arises in how to work with such data. One key issue is the need for information on the accuracy of data sources and for methods to help use imperfect data. This project seeks to contribute to this task.
It aims to illustrate the impacts of using imperfect data, to explore methods to characterise the quality of the data, and to explore methods to combine data sources to yield an enhanced product of known accuracy.
A range of methods will be used, but the core focus is on latent class modelling. This type of analysis is based on multiple observations or data from a variety of sources. The relationships between the observers/data sources are used to attempt to explain their quality and to suggest how the data could be interpreted to yield information. The approach is a form of statistical modelling and is highly attractive for this research proposal because, if a model can be formed that fits the observed data, then the model's parameters define the accuracy of the data sources and its outputs can be used to form new products of known accuracy. As such, the modelling analysis may add value to data by indicating its quality and combining it usefully for the extraction of information.
As the problems of imperfect data are generic, the proposal has broad potential impacts. For the specific DaISy call there are clear impacts in relation to security and defence. For example, methods that enable rapid and qualified information to be derived from sources of variable accuracy, completeness and trust level will increase effectiveness and the quality of decision making. Additionally, as a model-based approach it removes or reduces the need to acquire reference data for validation, which could otherwise require deployment of personnel to dangerous locations, and so is of considerable benefit to health and well-being.
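The latent class idea described in the abstract can be illustrated with a minimal sketch. It is not the project's implementation: it assumes only two classes (for example damaged / undamaged), binary labels from every source, and sources that err independently of one another given the true class. The function name fit_latent_class, the simulated data and all parameter values are hypothetical. The model is fitted by expectation-maximisation, and no reference data are used during fitting; the simulated truth is kept only to check the fused result afterwards.

```python
import random

EPS = 1e-6  # keeps probabilities away from exactly 0 or 1


def clamp(p):
    return min(max(p, EPS), 1.0 - EPS)


def fit_latent_class(votes, n_iter=100):
    """Fit a two-class latent class model to binary labels by EM.

    votes: one row per case, one 0/1 label per source.
    Returns (prevalence, sensitivities, specificities, posteriors).
    """
    n_cases, n_sources = len(votes), len(votes[0])
    # Start from the share of positive votes so class 1 means "positive".
    post = [clamp(sum(row) / n_sources) for row in votes]
    for _ in range(n_iter):
        # M-step: update prevalence and each source's accuracy.
        pos = max(sum(post), EPS)
        neg = max(n_cases - sum(post), EPS)
        prev = clamp(pos / n_cases)
        sens = [clamp(sum(p * row[j] for p, row in zip(post, votes)) / pos)
                for j in range(n_sources)]
        spec = [clamp(sum((1 - p) * (1 - row[j]) for p, row in zip(post, votes)) / neg)
                for j in range(n_sources)]
        # E-step: posterior probability that each case is truly positive.
        post = []
        for row in votes:
            like_pos, like_neg = prev, 1.0 - prev
            for j, v in enumerate(row):
                like_pos *= sens[j] if v else (1.0 - sens[j])
                like_neg *= (1.0 - spec[j]) if v else spec[j]
            post.append(like_pos / (like_pos + like_neg))
    return prev, sens, spec, post


# Simulated example (all values are illustrative assumptions).
random.seed(0)
TRUE_PREV = 0.3
TRUE_SENS = [0.95, 0.90, 0.85, 0.80, 0.70]
TRUE_SPEC = [0.95, 0.90, 0.85, 0.80, 0.70]

truth, votes = [], []
for _ in range(2000):
    t = 1 if random.random() < TRUE_PREV else 0
    row = []
    for se, sp in zip(TRUE_SENS, TRUE_SPEC):
        p_one = se if t else 1.0 - sp  # chance this source reports 1
        row.append(1 if random.random() < p_one else 0)
    truth.append(t)
    votes.append(row)

prev, sens, spec, post = fit_latent_class(votes)

# Fused product: label each case by its most probable class.
fused = [1 if p >= 0.5 else 0 for p in post]
fused_accuracy = sum(f == t for f, t in zip(fused, truth)) / len(truth)

print("estimated prevalence:", round(prev, 3))
print("estimated sensitivity per source:", [round(s, 3) for s in sens])
print("estimated specificity per source:", [round(s, 3) for s in spec])
print("accuracy of fused labels against simulated truth:", round(fused_accuracy, 3))
```

In this sketch the estimated sensitivity and specificity play the role of the "model's parameters" that describe the accuracy of each source, and the posterior probabilities are the model output from which a combined product can be formed. Real data may violate the independence assumption, in which case the estimates can be biased.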

Project outputs

Journal articles (6)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Assessing the Accuracy of Volunteered Geographic Information arising from Multiple Contributors to an Internet Based Collaborative Project
  • DOI:
    10.1111/tgis.12033
  • Published:
    2013-12-01
  • Journal:
  • Impact factor:
    2.4
  • Authors:
    Foody, G. M.; See, L.; Boyd, D. S.
  • Corresponding author:
    Boyd, D. S.
Exploring the accuracy of crowdsourced annotations of post-disaster building damage derived from fine spatial resolution satellite sensor data.
  • DOI:
  • Published:
    2014
  • Journal:
  • Impact factor:
    0
  • Authors:
    Foody G. M.
  • Corresponding author:
    Foody G. M.
Rating the quality of post-disaster damage maps: Mapping building damage after the 2010 Haiti earthquake
  • DOI:
    10.1109/igarss.2013.6721249
  • Published:
    2013
  • Journal:
  • Impact factor:
    0
  • Authors:
    Foody G
  • Corresponding author:
    Foody G
Increasing the Accuracy of Crowdsourced Information on Land Cover via a Voting Procedure Weighted by Information Inferred from the Contributed Data

Other publications by Giles Foody

DeepWaterFraction: A globally applicable, self-training deep learning approach for percent surface water area estimation from Landsat mission imagery
  • DOI:
  • Published:
    2024
  • Journal:
  • Impact factor:
    6.4
  • Authors:
    Zhen Hao; Giles Foody; Yong Ge; Xiaobin Cai; Yun Du; Feng Ling
  • Corresponding author:
    Feng Ling

Similar NSFC (National Natural Science Foundation of China) grants

Scalable Learning and Optimization: High-dimensional Models and Online Decision-Making Strategies for Big Data Analysis
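  • Grant number:
  • Year approved:
    2024
  • Funding amount:
    CNY (amount not stated)
  • Project type:
    Collaborative Innovation Research Team
Data-driven Recommendation System Construction of an Online Medical Platform Based on the Fusion of Information
  • Grant number:
  • Year approved:
    2024
  • Funding amount:
    CNY (amount not stated)
  • Project type:
    Research Fund for International Young Scientists
Development of a Linear Stochastic Model for Wind Field Reconstruction from Limited Measurement Data
  • Grant number:
  • Year approved:
    2020
  • Funding amount:
    CNY 400,000
  • Project type: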
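Estimation of high-dimensional volatility matrices based on high-frequency information, with applications
  • Grant number:
    71901118
  • Year approved:
    2019
  • Funding amount:
    CNY 180,000
  • Project type:
    Young Scientists Fund
Efficient estimation of semiparametric spatial autoregressive panel models and its applications
  • Grant number:
    71961011
  • Year approved:
    2019
  • Funding amount:
    CNY 160,000
  • Project type:
    Regional Science Fund
Statistical inference, forecasting and applications of volatility from high-frequency data
  • Grant number:
    71971118
  • Year approved:
    2019
  • Funding amount:
    CNY 500,000
  • Project type:
    General Program
Projective nonlinear non-negative tensor factorization based on individual analysis for pattern analysis of high-dimensional unstructured data
  • Grant number:
    61502059
  • Year approved:
    2015
  • Funding amount:
    CNY 190,000
  • Project type:
    Young Scientists Fund
Key technologies for semantic interoperability of Web services based on Linked Open Data
  • Grant number:
    61373035
  • Year approved:
    2013
  • Funding amount:
    CNY 770,000
  • Project type:
    General Program
New methods for volume data representation and rendering
  • Grant number:
    61170206
  • Year approved:
    2011
  • Funding amount:
    CNY 550,000
  • Project type:
    General Program
A new class of Regime-Switching models and their application to financial modelling
  • Grant number:
    11061041
  • Year approved:
    2010
  • Funding amount:
    CNY 240,000
  • Project type:
    Regional Science Fund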

Similar overseas grants

Cellular Phenotypes of Genetic Variants in Mucopolysaccharidosis
  • Grant number:
    10638709
  • Fiscal year:
    2023
  • Funding amount:
    $ 88,200
  • Project type:
iTEST: Introspective Accuracy as a Novel Target for Functioning in Psychotic Disorders
  • Grant number:
    10642405
  • Fiscal year:
    2023
  • Funding amount:
    $ 88,200
  • Project type:
Mobile Health and Oral Testing to Optimize Tuberculosis Contact Tracing in Colombia
  • Grant number:
    10667885
  • Fiscal year:
    2023
  • Funding amount:
    $ 88,200
  • Project type:
Cognitive and brain imaging correlates of apathy- components in asymptomatic middle aged individuals at high ADRD- risk
  • Grant number:
    10875019
  • Fiscal year:
    2023
  • Funding amount:
    $ 88,200
  • Project type:
Digital monitoring of autonomic activity to detect empathy loss in behavioral variant frontotemporal dementia
  • Grant number:
    10722938
  • Fiscal year:
    2023
  • Funding amount:
    $ 88,200
  • Project type:
Spatial Profiling of Melanocytic Tumors and Their Microenvironment
  • Grant number:
    10729434
  • Fiscal year:
    2023
  • Funding amount:
    $ 88,200
  • Project type:
Accurate and Reliable Diagnostics for Injured Children: Machine Learning for Ultrasound
  • Grant number:
    10572582
  • Fiscal year:
    2023
  • Funding amount:
    $ 88,200
  • Project type:
Accuracy and Feasibility of Non-Invasive Anemia Screening Assistant (ASIST) Device in Resource-Limited Settings
  • Grant number:
    10575222
  • Fiscal year:
    2023
  • Funding amount:
    $ 88,200
  • Project type:
Biomarker-Guided Evaluation of Glycated Testing Modalities for Dysglycemia among Persons Living with HIV (BEGET)
  • Grant number:
    10751444
  • Fiscal year:
    2023
  • Funding amount:
    $ 88,200
  • Project type:
CTSA RC2 Program at University of Utah: A Translational Platform for Rapid Genomic Medicine
  • Grant number:
    10622189
  • Fiscal year:
    2023
  • Funding amount:
    $ 88,200
  • Project type: