Towards Efficient Bias Correction in Data Snooping
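Basic Information
- Award number: 1914496
- Principal investigator:
- Amount: $250,000
- Host institution:
- Host institution country: United States
- Award type: Standard Grant
- Fiscal year: 2019
- Funding country: United States
- Period: 2019-09-01 to 2023-08-31
- Status: Completed
- Source:
- Keywords: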
Project Abstract
The choice of a statistical model is often a critical part of data analysis because a useful model helps researchers extract relevant information from noisy data to reach interpretable findings. While scientific or economic theories do help formulate models in some applications, most data analysts have to rely on empirical models. Using the same data to select a model and then to perform model-based statistical inference is commonly known as data snooping. Unfortunately, data snooping is intrinsically risky without a careful analysis of the potential bias resulting from such practices. The primary goal of this project is to study how to understand and correct bias from data snooping and develop sound statistical inference methods. The research will provide valuable tools for scientists, researchers, and policy makers who rely on data-driven models for uncertainty assessment and confirmatory data analysis.
This project focuses on regression-adjusted inference on treatment effects and inference on the best selected subgroup. The proposed work is motivated by the pressing need for more fundamental research related to the handling of "post-selection bias" in statistical analysis. The repeated data-splitting method for de-biased inference on a structural parameter (for example, the average treatment effect) enables efficient bias removal in addressing an intrinsic scientific question. The proposed inference on the best selected subgroup provides a bias-correction to a natural estimate of the subgroup effect size, and therefore reduces the risk of data-snooping and false discoveries in subgroup analysis. In the big data era, data-driven models and subgroup analyses are often used to take advantage of anticipated sparsity in the data structure or to explore data heterogeneity. The proposed research aims to provide insights, theory, and tools for more informed decision making in such endeavors.
The project will involve collaborations with researchers investigating the risk of concussion as well as scientists in the biotechnology industry who routinely rely on subgroup analysis. Graduate and undergraduate students will be engaged in the proposed research.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
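As a rough illustration of the post-selection bias the abstract describes, the Python sketch below simulates several subgroups whose true effects are all zero, picks the subgroup with the largest sample mean, and compares two estimates of its effect. The function name `simulate`, its parameters, and the simulation settings are assumptions made for this example only. The single data split used here is a standard textbook device swapped in for illustration; it is not the repeated data-splitting or bias-correction procedure developed in the project.

```python
import random
import statistics


def simulate(n_subgroups=5, n_per_group=50, n_reps=2000, seed=0):
    """Average naive and split-sample estimates of the 'best' subgroup effect.

    Every subgroup has a true effect of zero, so any nonzero average
    returned here is bias introduced by the selection step.
    """
    rng = random.Random(seed)
    half = n_per_group // 2
    naive, split = [], []
    for _ in range(n_reps):
        # Hypothetical data: standard normal outcomes in each subgroup.
        data = [
            [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
            for _ in range(n_subgroups)
        ]
        # Naive (data snooping): select and estimate on the same data.
        naive.append(max(statistics.fmean(g) for g in data))
        # Single split: select on the first half, estimate on the second.
        first_half_means = [statistics.fmean(g[:half]) for g in data]
        best = max(range(n_subgroups), key=first_half_means.__getitem__)
        split.append(statistics.fmean(data[best][half:]))
    return statistics.fmean(naive), statistics.fmean(split)


if __name__ == "__main__":
    naive_avg, split_avg = simulate()
    print(f"naive estimate of best subgroup effect: {naive_avg:.3f}")
    print(f"split-sample estimate:                  {split_avg:.3f}")
```

Under these assumptions the naive estimate is pushed above zero purely by selection, while the split-sample estimate stays close to zero but uses only half of the data for estimation. That loss of efficiency is the kind of cost the abstract's "efficient bias removal" refers to.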
Project Outcomes
Journal articles (6)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Hypothesis Testing for Block-structured Correlation for High Dimensional Variables
- DOI: 10.5705/ss.202019.0319
- Published: 2022
- Journal:
- Impact factor: 1.4
- Authors: Shurong Zheng; Xuming He; Jianhua Guo
- Corresponding authors: Shurong Zheng; Xuming He; Jianhua Guo
From regression rank scores to robust inference for censored quantile regression
- DOI: 10.1002/cjs.11740
- Published: 2022-11
- Journal:
- Impact factor: 0
- Authors: Yuan Sun; Xuming He
- Corresponding authors: Yuan Sun; Xuming He
Inference on Selected Subgroups in Clinical Trials
- DOI: 10.1080/01621459.2020.1740096
- Published: 2020-04-17
- Journal:
- Impact factor: 3.7
- Authors: Guo, Xinzhou; He, Xuming
- Corresponding author: He, Xuming
Model-based bootstrap for detection of regional quantile treatment effects
- DOI: 10.1080/10485252.2021.1934465
- Published: 2021
- Journal:
- Impact factor: 1.2
- Authors: Sun, Yuan; He, Xuming
- Corresponding author: He, Xuming
Comments on "Two Cultures": What have changed over 20 years?
- DOI: 10.1353/obs.2021.0026
- Published: 2021
- Journal:
- Impact factor: 0
- Authors: He, Xuming; Wang, Jingshen
- Corresponding author: Wang, Jingshen
Other publications by Xuming He
Semi-Supervised Domain-Adaptive Pulmonary Artery Segmentation via Uncertainty Guidance and Shape Strengthening
- DOI:
- Published: 2023
- Journal:
- Impact factor: 0
- Authors: Jiyuan Liu; Xiao Zhang; Dongdong Gu; O. Xi; Jiadong Zhang; Xuming He; Dinggang Shen; Zhong Xue
- Corresponding author: Zhong Xue
Optical ReLU-like activation function based on a semiconductor laser with optical injection.
- DOI: 10.1364/ol.511113
- Published: 2023
- Journal:
- Impact factor: 3.6
- Authors: Guanting Liu; Yiwei Shen; Ruiqian Li; Jingyi Yu; Xuming He; Chengyuan Wang
- Corresponding author: Chengyuan Wang
On marginal estimation in a semiparametric model for longitudinal data with time-independent covariates
- DOI:
- Published: 2002
- Journal:
- Impact factor: 0
- Authors: Xuming He; Mi
- Corresponding author: Mi
PENALIZED LIKELIHOOD FOR LOGISTIC-NORMAL MIXTURE MODELS WITH UNEQUAL VARIANCES
- DOI: 10.5705/ss.202015.0371
- Published: 2017
- Journal:
- Impact factor:
- Authors: Juan Shen; Yingchuan Wang; Xuming He
- Corresponding author: Xuming He
LAW OF THE ITERATED LOGARITHM AND INVARIANCE PRINCIPLE FOR M-ESTIMATORS
- DOI: 10.1090/s0002-9939-1995-1231036-7
- Published: 1995
- Journal:
- Impact factor: 0
- Authors: Xuming He; G. Wang
- Corresponding author: G. Wang
Other grants by Xuming He
Conference: Workshop on Translational Research on Data Heterogeneity
- Award number: 2406154
- Fiscal year: 2024
- Funding amount: $250,000
- Award type: Standard Grant
Covariate-adjusted Expected Shortfall under Data Heterogeneity
- Award number: 2310464
- Fiscal year: 2023
- Funding amount: $250,000
- Award type: Standard Grant
Covariate-adjusted Expected Shortfall under Data Heterogeneity
- Award number: 2345035
- Fiscal year: 2023
- Funding amount: $250,000
- Award type: Standard Grant
Statistics at a Crossroads: Challenges and Opportunities in the Data Science Era
- Award number: 1840278
- Fiscal year: 2018
- Funding amount: $250,000
- Award type: Standard Grant
New algorithms for consistent model selection beyond linear models
- Award number: 1607840
- Fiscal year: 2016
- Funding amount: $250,000
- Award type: Continuing Grant
New Directions in Quantile-based Modeling and Analysis
- Award number: 1307566
- Fiscal year: 2013
- Funding amount: $250,000
- Award type: Standard Grant
Efficient Modeling in Quantile Regression
- Award number: 1237234
- Fiscal year: 2011
- Funding amount: $250,000
- Award type: Continuing Grant
Efficient Modeling in Quantile Regression
- Award number: 1007396
- Fiscal year: 2010
- Funding amount: $250,000
- Award type: Continuing Grant
A Virtual Center to Promote Collaboration between US- and China-based Researchers in Statistical Science
- Award number: 0630950
- Fiscal year: 2006
- Funding amount: $250,000
- Award type: Standard Grant
Inferential Methods for Quantile Regression
- Award number: 0604229
- Fiscal year: 2006
- Funding amount: $250,000
- Award type: Continuing Grant
Similar overseas grants
Recyclable, smart and highly efficient wire-shaped solar cells waved portable/wearable electronics
- Award number: 24K15389
- Fiscal year: 2024
- Funding amount: $250,000
- Award type: Grant-in-Aid for Scientific Research (C)
Efficient and unbiased estimation in adaptive platform trials
- Award number: MR/X030261/1
- Fiscal year: 2024
- Funding amount: $250,000
- Award type: Research Grant
Electro-fermentation process design for efficient CO2 conversion into value-added products
- Award number: EP/Y002482/1
- Fiscal year: 2024
- Funding amount: $250,000
- Award type: Research Grant
RII Track-4:NSF: HEAL: Heterogeneity-aware Efficient and Adaptive Learning at Clusters and Edges
- Award number: 2327452
- Fiscal year: 2024
- Funding amount: $250,000
- Award type: Standard Grant
CAREER: Blessing of Nonconvexity in Machine Learning - Landscape Analysis and Efficient Algorithms
- Award number: 2337776
- Fiscal year: 2024
- Funding amount: $250,000
- Award type: Continuing Grant
CAREER: Resilient and Efficient Automatic Control in Energy Infrastructure: An Expert-Guided Policy Optimization Framework
- Award number: 2338559
- Fiscal year: 2024
- Funding amount: $250,000
- Award type: Standard Grant
CAREER: Towards highly efficient UV emitters with lattice engineered substrates
- Award number: 2338683
- Fiscal year: 2024
- Funding amount: $250,000
- Award type: Continuing Grant
Collaborative Research: Beyond the Single-Atom Paradigm: A Priori Design of Dual-Atom Alloy Active Sites for Efficient and Selective Chemical Conversions
- Award number: 2334970
- Fiscal year: 2024
- Funding amount: $250,000
- Award type: Standard Grant
ASCENT: Heterogeneously Integrated and AI-Empowered Millimeter-Wave Wide-Bandgap Transmitter Array towards Energy- and Spectrum-Efficient Next-G Communications
- Award number: 2328281
- Fiscal year: 2024
- Funding amount: $250,000
- Award type: Standard Grant