Data integration for large scale ecological models

Basic information

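Grant number:
NE/R005133/1
Principal investigator:
Nicholas Isaac
Amount:
$40,900
Host institution:
NERC CEH (Up to 30.11.2019)
Host institution country:
United Kingdom
Project type:
Research Grant
Fiscal year:
2017
Funding country:
United Kingdom
Start and end dates:
2017 to (no data)
Project status:
Completed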

Source:
https://gtr.ukri.org/projects?ref=NE%2FR005133%2F1
Keywords:
Data integration large scale ecological

Project abstract

Ecological models are becoming larger and more complicated, and are being used for an increasingly wide range of applications, from describing trends and mapping distributions to understanding mechanistic relationships and predicting the impact of future scenarios. In response, there has been a huge growth in statistical methods for large-scale ecological models. However, most such methods do not account for the fact that ecological data is inherently heterogeneous, and large datasets typically contain many forms of bias.

Recently, a set of hierarchical Bayesian models (HBMs) has emerged as a promising way of dealing with biased data, particularly for occurrence records and other unstructured data. Many millions of unstructured occurrence records exist, so the potential of these new methods is enormous. Not all data contain biases, though. A minority of biodiversity data is highly structured in terms of the sample locations, fixed protocols and regular sampling. Ideally, we'd like to retain the information about this in our models, but combine it with the much larger sample sizes of unstructured datasets.

Integrated models provide a way to do this. They are a subclass of HBM in which data heterogeneity is modelled explicitly, by treating datasets with different observation processes as independent realisations of the same underlying state. For example, casual observations on GBIF and the Breeding Bird Survey both contain information about whether the population of a particular species was extant at a particular point in space and time.

At present, these integrated models are the preserve of highly competent statisticians. They are hard to specify and difficult to fit and diagnose. One goal of this partnership is to build an extensible framework for fitting integrated models that will make them accessible to a broad community of ecological modellers. This framework, in the form of open source tools, will make it easier for ecologists to handle biased data when addressing large-scale questions about biodiversity.

Although integrated models are attractive from a conceptual standpoint, it is unclear whether their sophistication delivers real benefits over simple ones. In particular, there is an urgent need for some general principles about how to proceed when both structured and unstructured data sources are available. Critical questions include:

Q1. When and how should we combine datasets with different properties?
Q2. Under what circumstances is simple aggregation (i.e. ignoring the different observation processes) better than integration?
Q3. If we suspect the data contain biases, can we detect them and handle them adequately?
Q4. What are the most appropriate metrics for information content and model fit?

These general questions lie at the intersection of the research interests of PI Isaac, Co-I Henrys and Project Partner O'Hara. Each has made some progress towards addressing specific aspects of these questions. Working in partnership would add significant value to each, by taking existing research beyond the specific context and toward general answers to these big questions. It would permit a co-ordinated effort and build a work program of international significance. This pump-priming award would provide a platform for this partnership. The overall aim is to build a framework for inference in large-scale models of species' distribution, and to test it using computer simulations.
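The idea of treating datasets with different observation processes as independent realisations of the same underlying state can be illustrated with a small simulation. The sketch below is an assumption-laden toy, not the project's framework: every function and parameter name is hypothetical, the model is a plain site-occupancy model with no covariates or spatial structure, and a maximum-likelihood grid search stands in for the hierarchical Bayesian fitting the abstract describes. Each site has one latent occupancy state (probability `psi`), observed through a structured survey (detection probability `p_s`) and an unstructured source (detection probability `p_u`).

```python
import math
import random
from collections import Counter
from itertools import product


def simulate(n_sites, psi, p_structured, p_unstructured, visits, seed=0):
    """Simulate detection counts from two observation processes that
    share one latent occupancy state per site (hypothetical toy model)."""
    rng = random.Random(seed)
    data = []
    for _ in range(n_sites):
        occupied = rng.random() < psi  # shared underlying state
        if occupied:
            y_s = sum(rng.random() < p_structured for _ in range(visits))
            y_u = sum(rng.random() < p_unstructured for _ in range(visits))
        else:
            y_s, y_u = 0, 0  # an empty site can never yield a detection
        data.append((y_s, y_u))
    return data


def binom_pmf(y, n, p):
    """Binomial probability of y detections in n visits."""
    return math.comb(n, y) * p**y * (1 - p) ** (n - y)


def site_likelihood(y_s, y_u, visits, psi, p_s, p_u):
    """Marginal likelihood of one site's counts, summed over the latent state.
    Given occupancy, the two datasets are treated as independent."""
    if_occupied = psi * binom_pmf(y_s, visits, p_s) * binom_pmf(y_u, visits, p_u)
    if_empty = (1 - psi) if (y_s == 0 and y_u == 0) else 0.0
    return if_occupied + if_empty


def log_likelihood(counts, visits, psi, p_s, p_u):
    """Joint log-likelihood; counts maps (y_s, y_u) to the number of sites."""
    return sum(
        n * math.log(site_likelihood(y_s, y_u, visits, psi, p_s, p_u))
        for (y_s, y_u), n in counts.items()
    )


def fit_grid(data, visits, step=0.05):
    """Grid-search estimate of (psi, p_s, p_u); a stand-in for a real fitter."""
    counts = Counter(data)
    grid = [round(step * k, 10) for k in range(1, int(round(1 / step)))]
    return max(
        product(grid, repeat=3),
        key=lambda theta: log_likelihood(counts, visits, *theta),
    )


if __name__ == "__main__":
    data = simulate(n_sites=500, psi=0.6, p_structured=0.7,
                    p_unstructured=0.3, visits=3, seed=1)
    psi_hat, p_s_hat, p_u_hat = fit_grid(data, visits=3)
    print("psi:", psi_hat, "p_s:", p_s_hat, "p_u:", p_u_hat)
```

The key design point is in `site_likelihood`: both datasets contribute to a single likelihood through the shared state, rather than being pooled as if they came from one observation process. Simple aggregation (the alternative raised in Q2) would correspond to forcing `p_s` and `p_u` to be equal.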
