Statistical Design, Sampling, and Analysis for Large Scale Experiments

Basic Information

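  • Award number:
    1916467
  • Principal investigator:
  • Amount:
    $120,000
  • Host institution:
  • Host institution country:
    United States
  • Project type:
    Standard Grant
  • Fiscal year:
    2019
  • Funding country:
    United States
  • Duration:
    2019-09-01 to 2023-08-31
  • Status:
    Completed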

Project Abstract

In the big data paradigm, even controlled experiments can become large-scale, in the sense that the sample size is massive and the dimension of the input variables is high. Such "big data" problems challenge many statistical approaches and significantly increase the computation required for estimation and inference. In this project, the PI focuses on specific instances of large-scale experiments and develops a set of novel theories and methodologies for experimental design, sampling, and analysis. The research has two major themes. In the first, the PI focuses on experiments that involve a large number of covariates. For example, in a clinical trial, the covariates can be patients' rich medical histories. How should treatment settings be assigned to each patient? The PI answers this question through a general experimental design framework under which treatment effects are estimated accurately despite the influence of the covariates. In the second, the PI focuses on Gaussian process (GP) regression, one of the most popular statistical learning tools. Its computational cost is prohibitive for analyzing large-scale experiments such as climate model simulations. The PI develops a dimension reduction framework and an active learning method that significantly improve the efficiency and accuracy of the GP model.

Three major methodologies are developed across five research parts. In Parts 1-3, the PI introduces a new discrepancy-based design that achieves covariate balance in experiments with a large number of covariates. The discrepancy criterion also has appealing theoretical properties that lead to more accurate estimation of the parameters, including both treatment effects and covariate effects. Optimal design algorithms are developed for both offline and online experiments. In Part 4, the PI develops a novel dimension reduction method that finds the optimal convex combination of low-dimensional kernel functions for the GP model. The proposed method is shown to require significantly less computation and to approximate certain types of underlying functions more accurately. In Part 5, an active learning method based on the generalized Cook's distance is developed for GP regression; it is more efficient than the standard random sampling method. The research is novel in its ideas, rigorous in its theory, and useful in practice, and it will open new directions in the statistical design and analysis of experiments. The PI has a detailed education plan to develop new course modules, tutorials, and workshops based on the research products of this project. The research outcomes are readily applicable to a variety of scientific, engineering, medical, and other fields that demand large-scale data collection and analysis.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
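The abstract does not say which discrepancy criterion Parts 1-3 use. The sketch below is purely an illustration of discrepancy-driven online assignment in general. It uses the squared maximum mean discrepancy (MMD) under a Gaussian kernel as an assumed stand-in for the PI's criterion. Each arriving unit is assigned to whichever arm leaves the treated and control covariates closer. The function names `gaussian_kernel`, `mmd2`, and `greedy_online_assignment` and the `bandwidth` and `seed` parameters are hypothetical.

```python
import numpy as np


def gaussian_kernel(a: np.ndarray, b: np.ndarray, bandwidth: float) -> np.ndarray:
    """Gaussian (RBF) kernel matrix between the rows of a and the rows of b."""
    sq_dists = (
        np.sum(a**2, axis=1)[:, None]
        + np.sum(b**2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    return np.exp(-sq_dists / (2.0 * bandwidth**2))


def mmd2(x_t: np.ndarray, x_c: np.ndarray, bandwidth: float = 1.0) -> float:
    """Biased (V-statistic) estimate of the squared MMD between the
    treated covariates x_t and the control covariates x_c."""
    k_tt = gaussian_kernel(x_t, x_t, bandwidth).mean()
    k_cc = gaussian_kernel(x_c, x_c, bandwidth).mean()
    k_tc = gaussian_kernel(x_t, x_c, bandwidth).mean()
    return float(k_tt + k_cc - 2.0 * k_tc)


def greedy_online_assignment(x: np.ndarray, bandwidth: float = 1.0, seed: int = 0) -> np.ndarray:
    """Assign units arriving one at a time (rows of x) to treatment (1) or
    control (0). The first two units go to opposite arms in random order; each
    later unit joins the arm giving the smaller MMD between the groups seen so
    far (exact ties go to control)."""
    n = x.shape[0]
    if n < 2:
        raise ValueError("need at least two units")
    rng = np.random.default_rng(seed)
    z = np.empty(n, dtype=int)
    z[0] = rng.integers(2)
    z[1] = 1 - z[0]
    for i in range(2, n):
        x_seen = x[: i + 1]
        scores = []
        for arm in (0, 1):
            trial = np.append(z[:i], arm)  # tentative assignment including unit i
            scores.append(mmd2(x_seen[trial == 1], x_seen[trial == 0], bandwidth))
        z[i] = int(np.argmin(scores))
    return z
```

This greedy rule does not control arm sizes and is fully deterministic after the first two units. A practical online design would likely also cap the size imbalance and keep some randomization. An offline design could instead search over complete assignment vectors.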
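The abstract describes Part 4 only as finding "the optimal convex combination of low-dimensional kernel functions". One hedged way to picture this is to give each input dimension its own one-dimensional RBF kernel K_j. The GP covariance is then modeled as K = sum_j w_j K_j + noise * I, with the weights w on the probability simplex. The weights are chosen by maximizing the GP log marginal likelihood. The sketch below does this with exponentiated-gradient (mirror) ascent plus step halving. The fixed lengthscale and noise level, the optimizer, and all function names are assumptions for illustration, not the PI's method.

```python
import numpy as np


def one_dim_rbf(column: np.ndarray, lengthscale: float) -> np.ndarray:
    """RBF kernel matrix built from a single input column (shape (n,))."""
    diff = column[:, None] - column[None, :]
    return np.exp(-0.5 * (diff / lengthscale) ** 2)


def log_marginal_likelihood(w, kernels, y, noise):
    """Zero-mean GP log marginal likelihood with K = sum_j w[j] * kernels[j] + noise * I."""
    n = y.shape[0]
    k = np.tensordot(w, kernels, axes=1) + noise * np.eye(n)
    chol = np.linalg.cholesky(k)
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y))  # alpha = K^{-1} y
    return float(-0.5 * y @ alpha - np.sum(np.log(np.diag(chol))) - 0.5 * n * np.log(2.0 * np.pi))


def lml_gradient(w, kernels, y, noise):
    """Gradient w.r.t. the weights: dL/dw_j = 0.5 * trace((alpha alpha^T - K^{-1}) K_j)."""
    n = y.shape[0]
    k_inv = np.linalg.inv(np.tensordot(w, kernels, axes=1) + noise * np.eye(n))
    alpha = k_inv @ y
    inner = np.outer(alpha, alpha) - k_inv
    # Both factors are symmetric, so trace(inner @ K_j) equals an elementwise sum.
    return 0.5 * np.einsum("ab,jab->j", inner, kernels)


def fit_convex_weights(x, y, lengthscale=0.3, noise=1e-2, steps=100, step_size=1.0):
    """Choose simplex weights over per-dimension kernels by exponentiated-gradient
    ascent on the log marginal likelihood, halving the step after a non-improving move."""
    d = x.shape[1]
    kernels = np.stack([one_dim_rbf(x[:, j], lengthscale) for j in range(d)])
    w = np.full(d, 1.0 / d)  # start from uniform weights
    best = log_marginal_likelihood(w, kernels, y, noise)
    eta = step_size
    for _ in range(steps):
        g = lml_gradient(w, kernels, y, noise)
        proposal = w * np.exp(eta * (g - g.max()))  # subtracting max(g) avoids overflow
        proposal /= proposal.sum()  # renormalize onto the simplex
        value = log_marginal_likelihood(proposal, kernels, y, noise)
        if value > best:
            w, best = proposal, value
        else:
            eta *= 0.5
    return w, best
```

Each K_j depends on only one input, so weights concentrated on a few coordinates point to the inputs that drive the response. That is the dimension-reduction intuition. Because a step is accepted only when it improves the fit, the returned log marginal likelihood is never worse than that of the uniform starting weights. A fuller treatment would also estimate the lengthscales and the noise level.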

Project Outputs

Journal articles: 10
Monographs: 0
Research awards: 0
Conference papers: 0
Patents: 0
Locally Optimal Design for A/B Tests in the Presence of Covariates and Network Dependence
  • DOI:
    10.1080/00401706.2022.2046169
  • Published:
    2022
  • Journal:
  • Impact factor:
    2.5
  • Authors:
    Zhang, Qiong; Kang, Lulu
  • Corresponding author:
    Kang, Lulu
Bayesian D-Optimal Design of Experiments with Quantitative and Qualitative Responses
Gaussian Process Assisted Active Learning of Physical Laws
  • DOI:
    10.1080/00401706.2020.1817790
  • Published:
    2020-10-12
  • Journal:
  • Impact factor:
    2.5
  • Authors:
    Chen, Jiuhai; Kang, Lulu; Lin, Guang
  • Corresponding author:
    Lin, Guang
A Maximin Φp-Efficient Design for Multivariate Generalized Linear Models
  • DOI:
    10.5705/ss.202020.0278
  • Published:
    2023
  • Journal:
  • Impact factor:
    1.4
  • Authors:
    Li, Yiou; Kang, Lulu; Deng, Xinwei
  • Corresponding author:
    Deng, Xinwei
Covariate balancing based on kernel density estimates for controlled experiments

Other Publications by Lulu Kang

A Discrepancy-Based Design for A/B Testing Experiments
  • DOI:
  • Published:
    2019
  • Journal:
  • Impact factor:
    0
  • Authors:
    Yiou Li; Xiao Huang; Lulu Kang
  • Corresponding author:
    Lulu Kang
Variable phenotypes and outcomes associated with the MMACHC c.609G>A homologous mutation: long term follow-up in a large cohort of cases
  • DOI:
    10.1186/s13023-020-01485-7
  • Published:
    2020-08-03
  • Journal:
  • Impact factor:
    3.500
  • Authors:
    Ruxuan He; Ruo Mo; Ming Shen; Lulu Kang; Jinqing Song; Yi Liu; Zhehui Chen; Hongwu Zhang; Hongxin Yao; Yupeng Liu; Yao Zhang; Hui Dong; Ying Jin; Mengqiu Li; Jiong Qin; Hong Zheng; Yongxing Chen; Dongxiao Li; Haiyan Wei; Xiyuan Li; Huifeng Zhang; Min Huang; Chunyan Zhang; Yuwu Jiang; Desheng Liang; Yaping Tian; Yanling Yang
  • Corresponding author:
    Yanling Yang
Active domain adaptation with mining diverse knowledge: An updated class consensus dictionary approach
  • DOI:
    10.1016/j.ins.2024.120485
  • Published:
    2024-05-01
  • Journal:
  • Impact factor:
    6.800
  • Authors:
    Qing Tian; Liangyu Zhou; Yanan Zhu; Lulu Kang
  • Corresponding author:
    Lulu Kang
Hypermethioninemia due to methionine adenosyltransferase I/III deficiency and brain damage
  • DOI:
    10.1186/s12887-024-05196-x
  • Published:
    2024-11-07
  • Journal:
  • Impact factor:
    2.000
  • Authors:
    Xue Ma; Mei Lu; Zhehui Chen; Huiting Zhang; Jinqing Song; Hui Dong; Ying Jin; Mengqiu Li; Ruxuan He; Lulu Kang; Yi Liu; Yongxing Chen; Zhijun Zhu; Liying Sun; Yao Zhang; Yanling Yang
  • Corresponding author:
    Yanling Yang
Fair Multivariate Adaptive Regression Splines for Ensuring Equity and Transparency
  • DOI:
    10.48550/arxiv.2402.15561
  • Published:
    2024
  • Journal:
  • Impact factor:
    0
  • Authors:
    Parian Haghighat; Denisa Gándara; Lulu Kang; Hadis Anahideh
  • Corresponding author:
    Hadis Anahideh

Other Grants by Lulu Kang

Energetic Variational Inference: Foundations, Algorithms, and Applications
  • Award number:
    2153029
  • Fiscal year:
    2022
  • Amount:
    $120,000
  • Project type:
    Continuing Grant
Collaborative Research: Experimental Design and Analysis of Quantitative-Qualitative Responses in Manufacturing and Biomedical Systems
  • Award number:
    1435902
  • Fiscal year:
    2014
  • Amount:
    $120,000
  • Project type:
    Standard Grant

Similar NSFC (National Natural Science Foundation of China) Grants

Applications of AI in Market Design
  • Award number:
  • Approval year:
    2024
  • Amount:
  • Project type:
    Research Fund for International Young Scientists
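Combinatorial Biosynthesis of Novel Violaceins Based on a "Design-Build-Test" Cycle Strategy
  • Award number:
  • Approval year:
    2021
  • Amount:
    CNY 0
  • Project type:
    Provincial/municipal project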
Theoretical Study of Unitary Designs under Noise and Constraints
  • Award number:
    12147123
  • Approval year:
    2021
  • Amount:
    CNY 180,000
  • Project type:
    Special Fund Project

Similar Overseas Grants

ERI: From Data to Design: Enhancing Pedestrian Infrastructure for Well-Being through Mobile Sensing and Experience Sampling in the Wild
  • Award number:
    2347012
  • Fiscal year:
    2024
  • Amount:
    $120,000
  • Project type:
    Standard Grant
Experimental Design-based Weighted Sampling
  • Award number:
    2310637
  • Fiscal year:
    2023
  • Amount:
    $120,000
  • Project type:
    Standard Grant
Sampling The Environment: Model And Design-based Sampling And Data Analysis
  • Award number:
    NE/Y003632/1
  • Fiscal year:
    2023
  • Amount:
    $120,000
  • Project type:
    Training Grant
SHF: Small: A New Approach for Hardware Design of High-Precision Discrete Gaussian Sampling
  • Award number:
    2146881
  • Fiscal year:
    2022
  • Amount:
    $120,000
  • Project type:
    Continuing Grant
Optimizing sampling design, data integration, and methods for understanding population dynamics and predicting ecological changes
  • Award number:
    RGPIN-2020-07034
  • Fiscal year:
    2022
  • Amount:
    $120,000
  • Project type:
    Discovery Grants Program - Individual
Statistical methodology for rank based sampling design and finite mixture models
  • Award number:
    RGPIN-2020-06696
  • Fiscal year:
    2022
  • Amount:
    $120,000
  • Project type:
    Discovery Grants Program - Individual
Optimizing sampling design, data integration, and methods for understanding population dynamics and predicting ecological changes
  • Award number:
    RGPNS-2020-07034
  • Fiscal year:
    2022
  • Amount:
    $120,000
  • Project type:
    Discovery Grants Program - Northern Research Supplement
Optimizing sampling design, data integration, and methods for understanding population dynamics and predicting ecological changes
  • Award number:
    RGPNS-2020-07034
  • Fiscal year:
    2021
  • Amount:
    $120,000
  • Project type:
    Discovery Grants Program - Northern Research Supplement
Optimization of Sampling Design For Predictive Digital Soil Mapping: Reducing Uncertainty, Improving Predictions and Gaining Efficiencies in Sampling Programs
  • Award number:
    535671-2019
  • Fiscal year:
    2021
  • Amount:
    $120,000
  • Project type:
    Postgraduate Scholarships - Doctoral
Optimizing sampling design, data integration, and methods for understanding population dynamics and predicting ecological changes
  • Award number:
    RGPIN-2020-07034
  • Fiscal year:
    2021
  • Amount:
    $120,000
  • Project type:
    Discovery Grants Program - Individual