Statistical Theory and Methods for D&R Analysis of Large Complex Data
Basic Information
- Award number: 1228348
- Amount: $315,000
- Institution country: United States
- Award type: Standard Grant
- Fiscal year: 2012
- Funding country: United States
- Period: 2012-09-01 to 2017-08-31
- Status: Completed
Abstract
In Divide and Recombine (D&R), the data analyst divides the data into subsets. These are the S computations, because they create the subsets. Statistical and visualization methods are then applied to each subset, with no communication among the computations. These are the W computations, because they are within subsets. Finally, the W computation outputs are recombined across subsets. These are the B computations, because they are between subsets. One goal of D&R is deep analysis: the ability to study the data in detail despite their size and complexity. A second goal is the ability to carry out the analysis wholly from within an interactive language for data analysis (ILDA) such as R. D&R achieves these goals through a simple parallelization, not of the analysis methods themselves, which would be very complex, but of the data. The result is "embarrassingly parallel" computation that a distributed computational environment such as Hadoop can carry out efficiently, and Hadoop can in turn be integrated with an ILDA. The investigators will research two areas of statistical theory and methods for D&R. The first is the development of D&R statistical division and recombination procedures. This area is very broad, because there are many analysis methods, and the procedures must change with the methods and with the data structures they address. The second is a foundational mathematical theory. In the current fundamental paradigm for statistics, an analysis method is applied directly to all of the data in one big computation. The S, W, and B computations also use all of the data, but their results are in general not the same as those of direct computation and have different statistical properties. This introduces a new fundamental paradigm for statistical accuracy and optimality.

In Divide and Recombine (D&R), large complex data are divided into subsets. Statistical and visualization methods are applied to each subset separately, and the results of each method are then recombined across subsets. This new analysis framework for large complex data can readily exploit current distributed computational environments because it leads to very simple parallel computation. The investigators will develop statistical procedures for division and recombination that give the analysis methods good statistical accuracy. Accuracy tends to be lower than that of direct computation on all of the data in one big computation, but that computation is impractically long or simply infeasible. D&R trades some accuracy for computational feasibility. As a result, almost any statistical or visualization method can be successfully applied to large complex data. This enables deep, detailed analysis that does not risk losing important information in the data, a kind of analysis that today is feasible only with small data.
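The S, W, and B computations can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the investigators' procedure: the data are simulated, the division is a random split into subsets of equal size (up to one row), the W computation is an ordinary least-squares line fit, and the B computation is a size-weighted average of the subset coefficients. The helper names `ols_fit`, `divide`, and `divide_and_recombine` are hypothetical. Comparing the recombined coefficients with a direct fit on all of the data illustrates the abstract's point that the D&R result is close to, but in general not identical to, the direct result.

```python
import random


def ols_fit(xs, ys):
    """Hypothetical W computation: least-squares fit of y = a + b*x."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = my - b * mx
    return a, b


def divide(n, r, seed=0):
    """S computation (assumed random division): r subsets of row indices."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::r] for i in range(r)]


def divide_and_recombine(xs, ys, r):
    subsets = divide(len(xs), r)
    # W computations: each subset is fitted on its own, with no communication.
    fits = [ols_fit([xs[i] for i in s], [ys[i] for i in s]) for s in subsets]
    # B computation: average the coefficients, weighted by subset size.
    n = len(xs)
    a = sum(len(s) * fa for s, (fa, _) in zip(subsets, fits)) / n
    b = sum(len(s) * fb for s, (_, fb) in zip(subsets, fits)) / n
    return a, b


if __name__ == "__main__":
    rng = random.Random(42)
    xs = [rng.uniform(0, 10) for _ in range(10_000)]
    ys = [2.0 + 0.5 * x + rng.gauss(0, 1) for x in xs]
    print("direct:", ols_fit(xs, ys))
    print("D&R   :", divide_and_recombine(xs, ys, r=20))
```

Averaging coefficients is only one possible recombination; with `r=1` the sketch reduces to the direct fit, and increasing `r` makes each W computation smaller at some cost in accuracy.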
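Because the W computations need no communication with one another, they can be dispatched to independent workers. The sketch below assumes a thread pool from Python's standard library as a stand-in for a distributed back end such as Hadoop (threads give no real CPU parallelism for this pure-Python work; a production system would spread subsets across machines) and assumes contiguous blocks as the division. It also contrasts two recombinations: a size-weighted average of subset means reproduces the direct mean up to floating-point rounding, whereas a median of subset medians only approximates the direct median. The function names are hypothetical.

```python
import random
import statistics
from concurrent.futures import ThreadPoolExecutor


def w_summary(subset):
    """Hypothetical W computation: sees only its own subset."""
    return len(subset), statistics.fmean(subset), statistics.median(subset)


def divide_blocks(data, r):
    """S computation (assumed): split data into at most r contiguous blocks."""
    size = -(-len(data) // r)  # ceiling division so no rows are dropped
    return [data[i:i + size] for i in range(0, len(data), size)]


def recombine(outputs):
    """B computation: exact weighted mean, approximate median of medians."""
    n = sum(k for k, _, _ in outputs)
    mean = sum(k * m for k, m, _ in outputs) / n
    median_of_medians = statistics.median([med for _, _, med in outputs])
    return mean, median_of_medians


def dr_mean_and_median(data, r, workers=4):
    subsets = divide_blocks(data, r)
    # W computations run independently; pool.map keeps outputs in subset order.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        outputs = list(pool.map(w_summary, subsets))
    return recombine(outputs)


if __name__ == "__main__":
    rng = random.Random(7)
    data = [rng.expovariate(1.0) for _ in range(10_000)]
    mean, med = dr_mean_and_median(data, r=10)
    print("mean   direct:", statistics.fmean(data), " D&R:", mean)
    print("median direct:", statistics.median(data), " D&R:", med)
```

Which statistics recombine exactly and which only approximately is exactly the kind of question the abstract's proposed theory of D&R accuracy addresses.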
Project Outputs
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other Publications by William Cleveland
251 Low Social Support Is Associated with Increased All-Cause Mortality in African Americans with Hypertensive CKD
- DOI: 10.1053/j.ajkd.2011.02.254
- Publication date: 2011-04-01
- Authors: Anna Porter; Amishi Patel; Anca Zegrean; Gloria No; Deborah Brooks; Marino Bruce; Jeanne Charleston; William Cleveland; Donna Dowie; Marquetta Faulkner; Jennifer Gassman; Tom Greene; Leena Hiremath; Cindy Kendrick; John W. Kusek; Denyse Thornley-Brown; Xuelei Wang; Keith Norris; Michael Fischer; James Lash
- Corresponding author: James Lash
237: Factors Associated With Quality of Life in African Americans With CKD
- DOI: 10.1053/j.ajkd.2010.02.244
- Publication date: 2010-04-01
- Authors: Anna Porter; Michael Fischer; Deborah Brooks; Marino Bruce; Jeanne Charleston; William Cleveland; Tonya Corbin; Donna Dowie; Marquetta Faulkner; Jennifer Gassman; Tom Greene; Leena Hiremath; Cindy Kendrick; John Kusek; Denyse Thornley-Brown; Xulei Wang; Keith Norris; Mark Unruh; James Lash; AASK Study Group
- Corresponding author: AASK Study Group
Other Grants by William Cleveland
Scalable Visualization and Model Building
- Award number: 0937123
- Fiscal year: 2009
- Amount: $315,000
- Award type: Standard Grant
Data Mining, Statistical Learning, and Data Visualization for Complex Data
- Award number: 0532217
- Fiscal year: 2005
- Amount: $315,000
- Award type: Standard Grant
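Similar National Natural Science Foundation of China (NSFC) Grants
Research on Quantum Field Theory without a Lagrangian Description
- Award number: 24ZR1403900
- Year approved: 2024
- Amount: CNY 0
- Category: Provincial/municipal project
Microscopic dynamical mechanisms of physical quantities in dusty plasmas, studied on the basis of isomorph theory
- Award number: 12247163
- Year approved: 2022
- Amount: CNY 180,000
- Category: Special project
Toward a general theory of intermittent aeolian and fluvial nonsuspended sediment transport
- Year approved: 2022
- Amount: CNY 550,000
Translation of the English monograph "FRACTIONAL INTEGRALS AND DERIVATIVES: Theory and Applications"
- Award number: 12126512
- Year approved: 2021
- Amount: CNY 120,000
- Category: Tianyuan Fund for Mathematics project
Research on and applications of a fuzzy semantics theory of natural language based on Restriction-Centered Theory
- Award number: 61671064
- Year approved: 2016
- Amount: CNY 650,000
- Category: General Program project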
Similar Overseas Grants
CAREER: Statistical Inference in Observational Studies -- Theory, Methods, and Beyond
- Award number: 2338760
- Fiscal year: 2024
- Amount: $315,000
- Award type: Continuing Grant
Collaborative Research: DMS/NIGMS 2: New statistical methods, theory, and software for microbiome data
- Award number: 10797410
- Fiscal year: 2023
- Amount: $315,000
Statistical Methods and Theory for Predictive Biomarker Study in Clinical Trials via Modeling and Analysis of Covariate Interactions
- Award number: RGPIN-2018-04462
- Fiscal year: 2022
- Amount: $315,000
- Award type: Discovery Grants Program - Individual
Nonparametric statistical methods based on graph theory
- Award number: RGPIN-2022-03264
- Fiscal year: 2022
- Amount: $315,000
- Award type: Discovery Grants Program - Individual
Approximations of computationally intensive statistical learning algorithms: theory and methods
- Award number: RGPIN-2019-06487
- Fiscal year: 2022
- Amount: $315,000
- Award type: Discovery Grants Program - Individual
Statistical theory and methods for high-dimensional data
- Award number: RGPIN-2016-03890
- Fiscal year: 2022
- Amount: $315,000
- Award type: Discovery Grants Program - Individual
Statistical Methods and Theory for Clinical Trials in the Era of Patient-Oriented Research and Personalized Medicine
- Award number: RGPIN-2022-03788
- Fiscal year: 2022
- Amount: $315,000
- Award type: Discovery Grants Program - Individual
CAREER: Fast and Accurate Statistical Learning and Inference from Large-Scale Data: Theory, Methods, and Algorithms
- Award number: 2046874
- Fiscal year: 2021
- Amount: $315,000
- Award type: Continuing Grant
Statistical Methods and Theory for Predictive Biomarker Study in Clinical Trials via Modeling and Analysis of Covariate Interactions
- Award number: RGPIN-2018-04462
- Fiscal year: 2021
- Amount: $315,000
- Award type: Discovery Grants Program - Individual
FRG: Collaborative Research: Dynamic Tensors: Statistical Methods, Theory, and Applications
- Award number: 2052949
- Fiscal year: 2021
- Amount: $315,000
- Award type: Standard Grant