Hierarchical models for Large Geostatistical Datasets with Applications to Forestry and Ecology

Basic Information

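Award number:
0706870
Principal investigator:
Sudipto Banerjee
Amount:
$253,500
Awardee institution:
University of Minnesota-Twin Cities
Institution country:
United States
Award type:
Standard Grant
Fiscal year:
2007
Funding country:
United States
Period:
2007-06-15 to 2010-08-31
Project status:
Completed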

Source:
https://www.nsf.gov/awardsearch/showAward?AWD_ID=0706870&HistoricalAwards=false
Keywords:
Hierarchical models; Large Geostatistical Datasets

Project Abstract

This proposal lays out a comprehensive framework for statistical inference on point-referenced spatial data observed at a large number of locations. Statistical theory is used to develop mathematically formal but computationally feasible methods with a broad range of applications. Hierarchical models implemented through Markov chain Monte Carlo (MCMC) methods have become especially popular for spatial modelling, given their flexibility and power to fit models that would be infeasible with classical methods, as well as their avoidance of possibly inappropriate asymptotics. However, fitting hierarchical spatial models often involves expensive matrix decompositions whose computational cost grows cubically with the number of spatial locations, rendering such models infeasible for large spatial data sets. This computational burden is aggravated in multivariate settings with several spatially dependent response variables, and also when data are collected at frequent time points and spatiotemporal process models are used. The investigators propose a class of models based upon a stochastic process that results from projecting the original process onto a lower-dimensional subspace. The investigators term these predictive process models and propose to explore their theoretical properties. The long-term goal of the PI is to develop a full suite of statistical methods for estimating spatial models in a wide variety of experiments in forestry and ecology. A recurrent underlying theme that distinguishes the proposed methods from existing ones is that the modeler does not need to sacrifice richness in modeling as a compromise for large datasets. This resolves the statistical irony that large datasets are precisely where statistical estimates of rich association structures are permissible.
The emphasis is on models that can be executed even with moderately powerful computing tools and so would be accessible to a large number of researchers.
With the increasing popularity and availability of spatial referencing technologies such as Geographical Information Systems (GIS) and Global Positioning Systems (GPS), which can identify geographical coordinates with a simple hand-held device, scientists and researchers in a variety of disciplines today have access to large amounts of geocoded data. The broader impact of the proposed methods is best assessed by connecting the outcome of this research with the widely recognized impact of GIS on human society. From identifying spatial disparities in health standards to more precise weather predictions, GIS technology is used today in almost every sphere of society. By freeing investigators from ad hoc and qualitative methods that often yield spurious findings, the proposed methods can have far-reaching beneficial effects in environmental research that potentially touch unexpected corners of society. Consider a situation where an ecologist is unable to recognize critical symbiotic relationships between multiple species because of inadequate models. Mathematical formalism, for all its complexities, minimizes such errors, which arise from the qualitative techniques currently prevalent in forestry and ecological analysis. These and several other scientific problems require formal spatial analysis that harnesses the full power of the information that large datasets carry. The relevant fields include, but are not limited to, public and environmental health, meteorology, engineering, and the geosciences, where the fundamental goal is the same: to use new findings to help improve human society.
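
The abstract gives no formulas, so the following is only a minimal illustrative sketch of the projection idea it describes, not the investigators' implementation. It assumes a predictive process defined by conditioning the parent process on a small set of m "knots", which gives the covariance c(s)^T C*^{-1} c(s'), where C* is the m x m knot covariance matrix. The exponential covariance function, the knot grid, the parameter values, and all function names (`exp_cov`, `cholesky`, `forward_solve`, `predictive_process_cov`) are assumptions chosen for illustration.

```python
import math
import random


def exp_cov(a, b, sigma2=1.0, phi=3.0):
    """Exponential covariance between two 2-D points (assumed form and parameters)."""
    return sigma2 * math.exp(-phi * math.dist(a, b))


def cholesky(A):
    """Return lower-triangular L with A = L L^T for a small symmetric positive-definite A."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def forward_solve(L, b):
    """Solve L x = b for lower-triangular L."""
    x = []
    for i, row in enumerate(L):
        x.append((b[i] - sum(row[k] * x[k] for k in range(i))) / row[i])
    return x


def predictive_process_cov(s1, s2, knots, L_knots):
    """Covariance of the projected process: c(s1)^T C*^{-1} c(s2).

    With C* = L L^T this equals (L^{-1} c(s1)) . (L^{-1} c(s2)).
    """
    u = forward_solve(L_knots, [exp_cov(s1, k) for k in knots])
    v = forward_solve(L_knots, [exp_cov(s2, k) for k in knots])
    return sum(a * b for a, b in zip(u, v))


# Hypothetical setup: m = 25 knots on a 5 x 5 grid over the unit square.
knots = [(i / 4, j / 4) for i in range(5) for j in range(5)]
C_star = [[exp_cov(a, b) for b in knots] for a in knots]
L_knots = cholesky(C_star)  # the only cubic-cost step, in m rather than n

# n = 200 simulated observation sites (illustrative, not real data).
random.seed(0)
sites = [(random.random(), random.random()) for _ in range(200)]

# Compare the parent covariance with its low-rank approximation for one pair.
s1, s2 = sites[0], sites[1]
print("parent covariance:    ", exp_cov(s1, s2))
print("predictive covariance:", predictive_process_cov(s1, s2, knots, L_knots))
```

In this sketch the only factorization is of the 25 x 25 knot matrix, whereas the full model would factor the 200 x 200 covariance matrix of all sites. At the knots themselves the projected covariance reproduces the parent covariance, and elsewhere the projected variance does not exceed the parent variance.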
