Statistical Design, Sampling, and Analysis for Large Scale Experiments

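Basic Information

Award number:
1916467
Principal Investigator:
Lulu Kang
Amount:
$120,000
Institution:
Illinois Institute of Technology
Institution country:
United States
Award type:
Standard Grant
Fiscal year:
2019
Funding country:
United States
Period:
2019-09-01 to 2023-08-31
Status:
Completed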

Source:
https://www.nsf.gov/awardsearch/showAward?AWD_ID=1916467&HistoricalAwards=false
Keywords:
Statistical Design Sampling Analysis Large

Project Abstract

In the big data paradigm, even controlled experiments can become large-scale, in the sense that the sample size is massive and the dimension of the input variables is high. Such "big data" problems challenge many statistical approaches and significantly increase the computation required for estimation and inference. In this project, the PI focuses on specific instances of large-scale experiments and develops a set of novel theories and methodologies for experimental design, sampling, and analysis. The research has two major thrusts. The first thrust focuses on experiments that involve a large number of covariates. For example, in a clinical trial, the covariates can be patients' rich medical histories. How should the treatment settings be assigned to each patient? The PI answers this question through a general experimental design framework under which the treatment effects are estimated accurately despite the influence of the covariates. The second thrust focuses on Gaussian process (GP) regression, one of the most popular statistical learning tools, whose computational cost becomes prohibitive when analyzing large-scale experiments such as climate model simulations. The PI develops a dimension reduction framework and an active learning method that significantly improve the efficiency and accuracy of the GP model.

Three major methodologies are considered. In Parts 1-3, the PI introduces a new discrepancy-based design that achieves covariate balance in experiments with a large number of covariates. The discrepancy criterion also has appealing theoretical properties that lead to more accurate estimation of the parameters, including both the treatment effects and the covariates' effects. Optimal design algorithms are developed for both offline and online experiments. In Part 4, the PI develops a novel dimension reduction method that finds the optimal convex combination of low-dimensional kernel functions for the GP model. The proposed method is shown to require significantly less computation and to give a more accurate approximation of certain types of underlying functions. In Part 5, an active learning method based on the generalized Cook's distance is developed for GP regression. It is more efficient than standard random sampling. The research is novel in its ideas, rigorous in its theory, and useful in practice, and it will open new directions in the statistical design and analysis of experiments. The PI has a detailed education plan to develop new course modules, tutorials, and workshops based on the research products of this project. The research outcomes are readily applicable to a variety of scientific, engineering, medical, and other fields where large-scale data collection and analysis are needed.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
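The abstract does not state which discrepancy criterion the covariate-balancing design uses, so the following is only a rough illustration of the general idea for an offline, two-arm experiment. It uses the squared maximum mean discrepancy (MMD) with a Gaussian kernel as a stand-in discrepancy between the treated and control covariate distributions. It then runs a simple swap-based local search that keeps the two arms equal in size. The function names, the bandwidth, and the search strategy are all assumptions and are not the PI's algorithm.

```python
import math
import random


def gaussian_kernel(a, b, bandwidth=1.0):
    """Gaussian (RBF) kernel between two covariate vectors."""
    sq = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-sq / (2.0 * bandwidth ** 2))


def mmd2(X, labels, bandwidth=1.0):
    """Squared MMD (biased V-statistic) between treated (1) and control (0) units.

    Stand-in discrepancy chosen for illustration; the project's criterion may differ.
    """
    treated = [x for x, g in zip(X, labels) if g == 1]
    control = [x for x, g in zip(X, labels) if g == 0]

    def mean_k(A, B):
        return sum(gaussian_kernel(a, b, bandwidth) for a in A for b in B) / (len(A) * len(B))

    return mean_k(treated, treated) + mean_k(control, control) - 2.0 * mean_k(treated, control)


def balanced_assignment(X, n_iter=200, seed=0, bandwidth=1.0):
    """Hypothetical local search: start from a random half/half split and
    keep a treated/control swap only when it lowers the discrepancy."""
    rng = random.Random(seed)
    n = len(X)
    labels = [1] * (n // 2) + [0] * (n - n // 2)
    rng.shuffle(labels)
    best = mmd2(X, labels, bandwidth)
    for _ in range(n_iter):
        i = rng.choice([k for k in range(n) if labels[k] == 1])
        j = rng.choice([k for k in range(n) if labels[k] == 0])
        labels[i], labels[j] = 0, 1  # propose swapping the two units' arms
        cand = mmd2(X, labels, bandwidth)
        if cand < best:
            best = cand
        else:
            labels[i], labels[j] = 1, 0  # revert: the swap did not improve balance
    return labels, best


# Usage with simulated two-dimensional covariates for 20 units.
rng = random.Random(1)
X = [[rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)] for _ in range(20)]
labels, score = balanced_assignment(X)
print(sum(labels), round(score, 4))
```

Swaps keep the arm sizes fixed, and only improving moves are accepted, so the returned discrepancy is never worse than that of the initial random split. An online design, where units arrive one at a time, would need a sequential rule instead and is not covered by this sketch.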

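As a minimal sketch of what a "convex combination of low-dimensional kernel functions" can look like, the code below builds a GP kernel as a weighted sum of one-dimensional squared-exponential kernels, one per input coordinate, with nonnegative weights summing to one. It then picks the weights by minimizing the GP negative log marginal likelihood over a grid. The fixed lengthscale, noise level, zero mean function, grid search, and every function name are illustrative assumptions. The project's actual dimension reduction method is not described in enough detail here to reproduce.

```python
import math


def rbf_1d(a, b, lengthscale):
    """One-dimensional squared-exponential kernel."""
    return math.exp(-((a - b) ** 2) / (2.0 * lengthscale ** 2))


def convex_kernel(x, z, weights, lengthscale=0.5):
    """Convex combination of one-dimensional kernels, one per input coordinate."""
    return sum(w * rbf_1d(xj, zj, lengthscale) for w, xj, zj in zip(weights, x, z))


def cholesky(A):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def neg_log_marginal_likelihood(X, y, weights, noise=1e-2):
    """Zero-mean GP negative log marginal likelihood (noise level is assumed)."""
    n = len(X)
    K = [[convex_kernel(X[i], X[j], weights) + (noise if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    L = cholesky(K)
    # Forward substitution L z = y, so that y^T K^{-1} y = ||z||^2.
    z = []
    for i in range(n):
        z.append((y[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i])
    quad = sum(v * v for v in z)
    half_logdet = sum(math.log(L[i][i]) for i in range(n))  # 0.5 * log|K|
    return 0.5 * quad + half_logdet + 0.5 * n * math.log(2.0 * math.pi)


def fit_convex_weights(X, y, grid_size=21):
    """Hypothetical grid search over weights (w, 1 - w) for a two-dimensional input."""
    best_w, best_nll = None, float("inf")
    for k in range(grid_size):
        w = k / (grid_size - 1)
        nll = neg_log_marginal_likelihood(X, y, [w, 1.0 - w])
        if nll < best_nll:
            best_w, best_nll = [w, 1.0 - w], nll
    return best_w, best_nll


# Usage: a 4 x 4 design where the response depends only on the first input.
pts = [i / 3 for i in range(4)]
X = [[a, b] for a in pts for b in pts]
y = [math.sin(3 * x[0]) for x in X]
weights, nll = fit_convex_weights(X, y)
print(weights, round(nll, 3))
```

In this toy case the fitted weight should concentrate on the first coordinate, which is the intuition behind using such combinations for dimension reduction: inputs that barely matter receive little or no weight. A practical implementation would optimize the weights and hyperparameters jointly rather than by grid search.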

Project Outputs

Journal articles (10)

Monographs (0)

Research awards (0)

Conference papers (0)

Patents (0)

Locally Optimal Design for A/B Tests in the Presence of Covariates and Network Dependence

DOI:
10.1080/00401706.2022.2046169
Published:
2022
Journal:
Technometrics
Impact factor:
2.5
Authors:
Zhang, Qiong; Kang, Lulu
Corresponding author:
Kang, Lulu

Bayesian D-Optimal Design of Experiments with Quantitative and Qualitative Responses

DOI:
10.51387/23-nejsds30
Published:
2023
Journal:
The New England Journal of Statistics in Data Science
Impact factor:
0
Authors:
Kang, Lulu; Deng, Xinwei; Jin, Ran
Corresponding author:
Jin, Ran

Gaussian Process Assisted Active Learning of Physical Laws

DOI:
10.1080/00401706.2020.1817790
Published:
2020-10-12
Journal:
Technometrics
Impact factor:
2.5
Authors:
Chen, Jiuhai; Kang, Lulu; Lin, Guang
Corresponding author:
Lin, Guang

A Maximin Φp-Efficient Design for Multivariate Generalized Linear Models

DOI:
10.5705/ss.202020.0278
Published:
2023
Journal:
Statistica Sinica
Impact factor:
1.4
Authors:
Li, Yiou; Kang, Lulu; Deng, Xinwei
Corresponding author:
Deng, Xinwei

Covariate balancing based on kernel density estimates for controlled experiments

DOI:
10.1080/24754269.2021.1878742
Published:
2021
Journal:
Statistical Theory and Related Fields
Impact factor:
0.5
Authors:
Li, Yiou; Kang, Lulu; Huang, Xiao
Corresponding author:
Huang, Xiao

Other publications by Lulu Kang

A Discrepancy-Based Design for A/B Testing Experiments

DOI:
Published:
2019
Journal:
Impact factor:
0
Authors:
Yiou Li; Xiao Huang; Lulu Kang
Corresponding author:
Lulu Kang

Variable phenotypes and outcomes associated with the MMACHC c.609G>A homologous mutation: long term follow-up in a large cohort of cases

DOI:
10.1186/s13023-020-01485-7
Published:
2020-08-03
Journal:
Orphanet Journal of Rare Diseases
Impact factor:
3.5
Authors:
Ruxuan He; Ruo Mo; Ming Shen; Lulu Kang; Jinqing Song; Yi Liu; Zhehui Chen; Hongwu Zhang; Hongxin Yao; Yupeng Liu; Yao Zhang; Hui Dong; Ying Jin; Mengqiu Li; Jiong Qin; Hong Zheng; Yongxing Chen; Dongxiao Li; Haiyan Wei; Xiyuan Li; Huifeng Zhang; Min Huang; Chunyan Zhang; Yuwu Jiang; Desheng Liang; Yaping Tian; Yanling Yang
Corresponding author:
Yanling Yang

Active domain adaptation with mining diverse knowledge: An updated class consensus dictionary approach

DOI:
10.1016/j.ins.2024.120485
Published:
2024-05-01
Journal:
Information Sciences
Impact factor:
6.8
Authors:
Qing Tian; Liangyu Zhou; Yanan Zhu; Lulu Kang
Corresponding author:
Lulu Kang

Hypermethioninemia due to methionine adenosyltransferase I/III deficiency and brain damage

DOI:
10.1186/s12887-024-05196-x
Published:
2024-11-07
Journal:
BMC Pediatrics
Impact factor:
2.0
Authors:
Xue Ma; Mei Lu; Zhehui Chen; Huiting Zhang; Jinqing Song; Hui Dong; Ying Jin; Mengqiu Li; Ruxuan He; Lulu Kang; Yi Liu; Yongxing Chen; Zhijun Zhu; Liying Sun; Yao Zhang; Yanling Yang
Corresponding author:
Yanling Yang