Valid Inference when Analytical Models are Approximations

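Basic information

Award number:
1512084
Principal investigator:
Linda Zhao
Amount:
approx. $532,000
Awardee institution:
University of Pennsylvania
Awardee institution country:
United States
Award type:
Standard Grant
Fiscal year:
2015
Funding country:
United States
Start and end dates:
2015-07-15 to 2019-06-30
Project status:
Completed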

Source:
https://www.nsf.gov/awardsearch/showAward?AWD_ID=1512084&HistoricalAwards=false
Keywords:
Valid Inference when Analytical Models

Project abstract

Statistical inferential methods are used to answer questions throughout modern life. For example, what affects the crime rate in a town; which factors are important influences on the housing market; which genes are associated with a certain disease; what are the most important elements to control in order to mitigate climate change? However, the statistical structure of the data and the mathematical model developed for the analysis often do not agree. This project arises from a broadly based statistical concern about the mismatch between standard inferential analyses and the statistics of the world they are trying to describe. This research draws a distinction between the statistical models that conventionally describe the correlational relations in experimental and observational data and the inferential models that are used in their analysis. To this end, the project investigates a paradigm in which sampling models are meant to be faithful representations of the real-world structure of the data they are describing. At the same time, the analytical models to be applied to the data are viewed only as approximate descriptions of that reality. The statistical-sampling representations need not match the analytical models, though the two should harmonize in certain important respects. There is a significant disparity between accurate characterization and what is claimed by classical procedures that ignore this distinction. The distinction has been noted by many previous statistical researchers, and various partially adequate approaches have been suggested. Nevertheless, clarifying this distinction in the directions under study and then pursuing the consequences leads to a theory of inference somewhat different from that in common use for relational and observational data. Acknowledging and properly accommodating this duality then leads to new methodology for some important statistical problems.
One such new methodology is within the setting of randomized clinical trials in which one wishes to estimate the effect of certain treatment(s) relative to others or to placebo controls. Another is within the setting of semi-supervised learning that occurs in various big-data contexts. The core of the current research is designed for linear analytical models. These involve observations on a vector of explanatory covariates (X-variables) and a numerical dependent variable (Y). The analytical model constructs the best linear approximant of Y as a linear function of the X variables. Virtually no assumptions are made about the (X,Y) pairs in the sample, other than that they form a statistical sample drawn from some unknown joint distribution of (X,Y) pairs and possess desired low-order moments. The notion of "best" is defined in a statistically natural fashion related to minimizing squared prediction error. It follows that the ordinary least squares estimators of parameters still have desirable asymptotic properties. Inference about their (asymptotic) performance can be derived via the standard sandwich estimator. However, a newly derived iterated pairs-bootstrap is shown to give substantially more accurate inferential information for realistic sample sizes. If more information is available about the distribution of X (such as knowledge of its mean and variance) then the usual least-squares solutions can be improved. This observation leads via an indirect path to suggestions that improve the standard methodology for estimating average treatment effect in randomized clinical trials and for producing linear predictions of numerical outcomes in settings of semi-supervised learning. Various additional issues are exposed in the course of the above developments. We also plan to investigate generalizations of the above setting -- for example to models having categorical Y-variables (classification) and to other generalized-linear analytical models. 
Our earlier research involved post-selection inference in the classical setting in which models for the data and its analysis coincide, and we now intend to pursue analogous issues in the current context in which they do not.
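
As an illustration of the linear setting described in the abstract, the following minimal Python sketch fits ordinary least squares to data whose true regression function is nonlinear, then compares HC0 sandwich standard errors with a plain pairs bootstrap. It is not the project's code: the simulated data, the function names, and the single-level (non-iterated) bootstrap are all assumptions made for illustration, and the iterated pairs bootstrap mentioned in the abstract is not implemented here.

```python
import numpy as np

def ols(X, y):
    """Least-squares coefficients; X is assumed to include an intercept column."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

def sandwich_se(X, y):
    """HC0 sandwich standard errors, valid without assuming a correct linear model."""
    beta = ols(X, y)
    resid = y - X @ beta
    bread = np.linalg.inv(X.T @ X)
    # "Meat": sum over observations of resid_i^2 * x_i x_i^T
    meat = X.T @ (X * resid[:, None] ** 2)
    cov = bread @ meat @ bread
    return np.sqrt(np.diag(cov))

def pairs_bootstrap_se(X, y, n_boot=500, seed=0):
    """Standard errors from resampling (x_i, y_i) pairs with replacement."""
    rng = np.random.default_rng(seed)
    n = len(y)
    draws = np.empty((n_boot, X.shape[1]))
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)
        draws[b] = ols(X[idx], y[idx])
    return draws.std(axis=0, ddof=1)

def simulate(n=400, seed=1):
    """Hypothetical data: random X, quadratic truth, so a linear fit is only an approximation."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(0.0, 3.0, size=n)
    y = x ** 2 + rng.normal(scale=0.5, size=n)
    X = np.column_stack([np.ones(n), x])
    return X, y

if __name__ == "__main__":
    X, y = simulate()
    print("best linear approximation (estimated):", ols(X, y))
    print("sandwich SE:       ", sandwich_se(X, y))
    print("pairs-bootstrap SE:", pairs_bootstrap_se(X, y))
```

Both estimators treat the (X, Y) pairs as a random sample from an unknown joint distribution, so the two sets of standard errors should be broadly similar at this sample size.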

Project outputs

Journal articles (0)

Monographs (0)

Research awards (0)

Conference papers (0)

Patents (0)

Other publications by Linda Zhao

Networks in the making: Friendship segregation and ethnic homophily.

DOI:
Published:
2023
Journal:
Social Science Research
Impact factor:
2.5
Authors:
Linda Zhao
Corresponding author:
Linda Zhao

Correction to: Working with Misspecified Regression Models

DOI:
10.1007/s10940-020-09464-8
Published:
2020-06-01
Journal:
Journal of Quantitative Criminology
Impact factor:
3.300
Authors:
Richard Berk; Lawrence Brown; Andreas Buja; Edward George; Linda Zhao
Corresponding author:
Linda Zhao

From Superdiversity to Consolidation: Implications of Structural Intersectionality for Interethnic Friendships

DOI:
Published:
2023
Journal:
American Journal of Sociology
Impact factor:
4.4
Authors:
Linda Zhao
Corresponding author:
Linda Zhao

Impact of Quarterly Interdisciplinary Medication Reviews on Resident Care in a Canadian Long Term Care Facility

DOI:
10.1016/j.jamda.2012.12.065
Published:
2013-03-01
Journal:
Conference abstract
Impact factor:
Authors:
Denis J.P. O'Donnell; Judith Vepy-Lebrun; Sid Feldman; Paul R. Katz; Linda Zhao
Corresponding author:
Linda Zhao