Valid Inference when Analytical Models are Approximations
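Basic information
- Award number: 1512084
- Principal investigator:
- Amount: $532,000
- Host institution:
- Host institution country: United States
- Project type: Standard Grant
- Fiscal year: 2015
- Funding country: United States
- Period: 2015-07-15 to 2019-06-30
- Project status: Completed
- Source:
- Keywords:
Project abstract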
Statistical inferential methods are used to answer questions throughout modern life. For example, what affects the crime rate in a town; which factors are important influences on the housing market; which genes are associated with a certain disease; what are the most important elements to control in order to mitigate climate change? However, the statistical structure of the data and the mathematical model developed for its analysis often do not agree. This project arises from a broadly based statistical concern about the mismatch between standard inferential analyses and the statistics of the world they are trying to describe. This research draws a distinction between the statistical models that conventionally describe the correlational relations in experimental and observational data and the inferential models that are used in their analysis. To this end, the project investigates a paradigm in which sampling models are meant to be faithful representations of the real-world structure of the data they are describing. At the same time, the analytical models to be applied to the data are viewed only as approximate descriptions of that reality. The statistical-sampling representations need not match the analytical models, though the two should harmonize in certain important respects. There is a significant disparity between accurate characterization and what is claimed by classical procedures that ignore this distinction. The distinction has been noted by many previous statistical researchers, and various partially adequate approaches have been suggested. Nevertheless, clarifying this distinction in the directions under study and then pursuing the consequences leads to a theory of inference somewhat different from that in common use for relational and observational data. Acknowledging and properly accommodating this duality then leads to new methodology for some important statistical problems.
One such new methodology is within the setting of randomized clinical trials in which one wishes to estimate the effect of certain treatment(s) relative to others or to placebo controls. Another is within the setting of semi-supervised learning that occurs in various big-data contexts. The core of the current research is designed for linear analytical models. These involve observations on a vector of explanatory covariates (X-variables) and a numerical dependent variable (Y). The analytical model constructs the best linear approximant of Y as a linear function of the X variables. Virtually no assumptions are made about the (X,Y) pairs in the sample, other than that they form a statistical sample drawn from some unknown joint distribution of (X,Y) pairs and possess desired low-order moments. The notion of "best" is defined in a statistically natural fashion related to minimizing squared prediction error. It follows that the ordinary least squares estimators of parameters still have desirable asymptotic properties. Inference about their (asymptotic) performance can be derived via the standard sandwich estimator. However, a newly derived iterated pairs-bootstrap is shown to give substantially more accurate inferential information for realistic sample sizes. If more information is available about the distribution of X (such as knowledge of its mean and variance) then the usual least-squares solutions can be improved. This observation leads via an indirect path to suggestions that improve the standard methodology for estimating average treatment effect in randomized clinical trials and for producing linear predictions of numerical outcomes in settings of semi-supervised learning. Various additional issues are exposed in the course of the above developments. We also plan to investigate generalizations of the above setting -- for example to models having categorical Y-variables (classification) and to other generalized-linear analytical models. 
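As a rough illustration of the setting described above, the following Python sketch fits a linear approximation to data whose true mean is not linear, then compares classical standard errors with the sandwich estimator and a plain pairs bootstrap. It is a minimal sketch under stated assumptions, not the project's method: the simulated data, the function names (`simulate`, `ols`, `classical_se`, `sandwich_se`, `pairs_bootstrap_se`), and the HC0 form of the sandwich are all choices made here for illustration, and the bootstrap shown is the ordinary single-level pairs bootstrap rather than the iterated version the abstract refers to.

```python
import numpy as np


def simulate(n, seed=0):
    """Hypothetical data: E[y | x] = x^2, so any linear fit is only an approximation."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)
    y = x**2 + rng.normal(size=n)
    X = np.column_stack([np.ones(n), x])  # intercept column plus one covariate
    return X, y


def ols(X, y):
    """Ordinary least squares coefficients (the sample best linear approximation)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta


def classical_se(X, y):
    """Textbook standard errors, which assume a correct linear model with constant variance."""
    n, p = X.shape
    resid = y - X @ ols(X, y)
    sigma2 = resid @ resid / (n - p)
    return np.sqrt(sigma2 * np.diag(np.linalg.inv(X.T @ X)))


def sandwich_se(X, y):
    """HC0 sandwich standard errors: (X'X)^-1 [sum r_i^2 x_i x_i'] (X'X)^-1."""
    resid = y - X @ ols(X, y)
    bread = np.linalg.inv(X.T @ X)
    meat = (X * resid[:, None] ** 2).T @ X
    return np.sqrt(np.diag(bread @ meat @ bread))


def pairs_bootstrap_se(X, y, n_boot=500, seed=0):
    """Single-level pairs bootstrap: resample whole (x_i, y_i) rows and refit."""
    rng = np.random.default_rng(seed)
    n = len(y)
    betas = np.empty((n_boot, X.shape[1]))
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # sample row indices with replacement
        betas[b] = ols(X[idx], y[idx])
    return betas.std(axis=0, ddof=1)


if __name__ == "__main__":
    X, y = simulate(2000, seed=1)
    print("coefficients :", ols(X, y))
    print("classical SE :", classical_se(X, y))
    print("sandwich SE  :", sandwich_se(X, y))
    print("bootstrap SE :", pairs_bootstrap_se(X, y, n_boot=300, seed=2))
```

In this simulated example the classical slope standard error should come out noticeably smaller than the sandwich and bootstrap versions, because the classical formula treats the nonlinearity as if it were constant-variance noise. Resampling (X, Y) pairs rather than residuals is what keeps the bootstrap consistent with the abstract's premise that the pairs are a sample from an unknown joint distribution.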
Our earlier research involved post-selection inference in the classical setting in which models for the data and its analysis coincide, and we now intend to pursue analogous issues in the current context in which they do not.
Project outcomes
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other publications by Linda Zhao
Networks in the making: Friendship segregation and ethnic homophily.
- DOI:
- Published: 2023
- Journal:
- Impact factor: 2.5
- Authors: Linda Zhao
- Corresponding author: Linda Zhao
Correction to: Working with Misspecified Regression Models
- DOI: 10.1007/s10940-020-09464-8
- Published: 2020-06-01
- Journal:
- Impact factor: 3.300
- Authors: Richard Berk; Lawrence Brown; Andreas Buja; Edward George; Linda Zhao
- Corresponding author: Linda Zhao
From Superdiversity to Consolidation: Implications of Structural Intersectionality for Interethnic Friendships
- DOI:
- Published: 2023
- Journal:
- Impact factor: 4.4
- Authors: Linda Zhao
- Corresponding author: Linda Zhao
Impact of Quarterly Interdisciplinary Medication Reviews on Resident Care in a Canadian Long Term Care Facility
- DOI: 10.1016/j.jamda.2012.12.065
- Published: 2013-03-01
- Journal:
- Impact factor:
- Authors: Denis J.P. O'Donnell; Judith Vepy-Lebrun; Sid Feldman; Paul R. Katz; Linda Zhao
- Corresponding author: Linda Zhao
Inequality in Place: Effects of Exposure to Neighborhood-Level Economic Inequality on Mortality.
- DOI:
- Published: 2021
- Journal:
- Impact factor: 3.5
- Authors: Linda Zhao; P. Hessel; J. Simon Thomas; Jason Beckfield
- Corresponding author: Jason Beckfield
Other grants by Linda Zhao
Bayesian Inference Estimation in Nonparametric Regression and its Frequentist Properties
- Award number: 9971848
- Fiscal year: 1999
- Award amount: $532,000
- Project type: Standard Grant
Similar overseas grants
Towards remission and full recovery from obsessive-compulsive disorder: Investigating the efficacy of Inference-Based Cognitive-Behavioral Therapy when standard treatment has failed
- Award number: 477668
- Fiscal year: 2023
- Award amount: $532,000
- Project type: Operating Grants
Essential and incidental measurement error: Bayesian estimation and inference when sample measurements are random-variable-valued
- Award number: RGPIN-2021-04357
- Fiscal year: 2022
- Award amount: $532,000
- Project type: Discovery Grants Program - Individual
Essential and incidental measurement error: Bayesian estimation and inference when sample measurements are random-variable-valued
- Award number: DGECR-2021-00428
- Fiscal year: 2021
- Award amount: $532,000
- Project type: Discovery Launch Supplement
Novel causal inference methods to inform clinical decision on when to discontinue symptomatic treatment for patients with dementia
- Award number: 10322425
- Fiscal year: 2021
- Award amount: $532,000
- Project type:
Improving statistical inference when interest focuses on the identification of extreme random effects in clustered data
- Award number: 10179473
- Fiscal year: 2021
- Award amount: $532,000
- Project type:
Improving statistical inference when interest focuses on the identification of extreme random effects in clustered data
- Award number: 10665751
- Fiscal year: 2021
- Award amount: $532,000
- Project type:
Essential and incidental measurement error: Bayesian estimation and inference when sample measurements are random-variable-valued
- Award number: RGPIN-2021-04357
- Fiscal year: 2021
- Award amount: $532,000
- Project type: Discovery Grants Program - Individual
dimensionality reduction when causal inference is the goal
- Award number: 2097182
- Fiscal year: 2018
- Award amount: $532,000
- Project type: Studentship
Statistical inference for molecules: How many, when and where? (A07*)
- Award number: 278012085
- Fiscal year: 2015
- Award amount: $532,000
- Project type: Collaborative Research Centres
Methodology of causal inference when an unmeasured confounder exists
- Award number: 23700344
- Fiscal year: 2011
- Award amount: $532,000
- Project type: Grant-in-Aid for Young Scientists (B)