A suite of new nonparametric methods for missing data and data from heterogeneous sources based on the theory of Fréchet classes
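Basic information
- Grant number: EP/W016117/1
- Principal investigator:
- Amount: $148,200
- Host institution:
- Host institution country: United Kingdom
- Project category: Research Grant
- Fiscal year: 2022
- Funding country: United Kingdom
- Start and end dates: 2022 to (no data)
- Project status: Ongoing
- Source:
- Keywords: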
Project abstract
From the traditional settings of clinical trials to the technologically-driven mass collection of data in many modern application areas, the statistician's raw material is often plagued with missing data. Whether this be down to nonresponse, or the increasing heterogeneity of data sources, incompleteness is typically unavoidable in practice. The vast majority of statistical procedures are designed for use with complete information, and without it may become inapplicable, uninterpretable or unreliable. Restricting attention to complete cases, i.e. data points without missing variables, however, will often drastically reduce the utility of a data set, both by throwing away useful information in the non-complete cases, and by introducing the possibility of bias due to the complete cases not providing a representative sample of the population. When a practitioner encounters missing data, the first questions they must ask themselves concern the mechanism by which the data came to be missing, and whether the missingness will cause serious problems in the analysis of their data set and the interpretation of their results. If the absence of information on certain variables can be modelled as independent of the value of the data, then the data is said to be Missing Completely at Random (MCAR), and subsequent analysis is significantly simpler than it would otherwise be. However, the consequences of making this assumption without proper basis can be severe.
We will begin with a rigorous study of the consequences of the MCAR assumption, presenting new characterisations of this property and providing novel connections to concepts studied in other fields, including copula theory and convex and computational geometry. Leveraging knowledge developed in these disciplines, we will design new tools for statisticians, bring new perspectives to the analysis of incomplete data, and open up new frontiers in the study of missingness. Specifically, we will link the property of MCAR to Fréchet classes and compatibility.
With the necessary framework in place, we will introduce hypothesis tests for the assumption of MCAR. In the first instance these will be applicable to contingency tables, but they will be extended to continuous data through binning. Certain alternatives are indistinguishable from the null, but we will show that these tests have power against all fixed alternative hypotheses that are distinguishable, and give situations in which they have optimal power.
Although a crucial first step, the assumption of MCAR is often too restrictive to be useful in practice. However, it may be that the missingness can be explained by certain fully-observed variables (CDM). Using additional insights from the problem of conditional independence testing we may extend our earlier work to test this more flexible assumption that is similar to, though stronger than, the usual MAR assumption.
In high-dimensional settings, the use of such flexible tests is likely to result in low power and we are limited to simple tests. To circumvent this issue, our next goal will be to define and analyse new tests in a relaxed version of the problem, which only attempt to find departures from the null that manifest in incompatibility of means and covariance matrices. We will show that all such departures can be detected, even when dimension grows polynomially in the sample size.
Once hypothesis tests have been carried out and reasonable assumptions developed, a practitioner will typically want to perform inference such as estimating an unknown quantity with confidence. In the framework we provide, the construction of confidence intervals for linear estimands is dual to the testing problems we consider. We combine our new technology with empirical process theory to provide minimal width confidence intervals, even in settings where consistent estimation is not possible.
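To make the idea of incompatible covariance matrices concrete, the following is a minimal illustrative sketch and not the project's own test. It assumes a hypothetical setting with three variables in which each pair is observed in a different data source, so only the three pairwise correlations are available. Under MCAR these population correlations must all be marginals of one joint distribution, which requires the assembled 3x3 correlation matrix to be positive semidefinite. The function name `correlations_compatible` and the tolerance `tol` are assumptions introduced for this example.

```python
import numpy as np


def correlations_compatible(r12, r13, r23, tol=1e-10):
    """Check whether three pairwise correlations can share one joint law.

    Hypothetical helper: assembles the 3x3 correlation matrix and tests
    positive semidefiniteness through its smallest eigenvalue.
    """
    corr = np.array([
        [1.0, r12, r13],
        [r12, 1.0, r23],
        [r13, r23, 1.0],
    ])
    # eigvalsh is intended for symmetric matrices and returns real eigenvalues.
    smallest = np.linalg.eigvalsh(corr).min()
    return bool(smallest >= -tol)


# Strong positive dependence across all pairs is compatible.
print(correlations_compatible(0.9, 0.9, 0.9))

# X1 and X3 are both strongly positively correlated with X2, yet strongly
# negatively correlated with each other: no joint distribution allows this.
print(correlations_compatible(0.9, -0.9, 0.9))
```

The check works on population correlations. With sample estimates, an apparent violation may only reflect sampling noise, which is why a formal hypothesis test of the kind described in the abstract is needed in practice.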
Project outputs
Journal articles (7)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Network change point localisation under local differential privacy
- DOI:
- Publication date: 2022
- Journal:
- Impact factor: 0
- Authors: Li, M.
- Corresponding author: Li, M.
On robustness and local differential privacy
- DOI: 10.1214/23-aos2267
- Publication date: 2022-01
- Journal:
- Impact factor: 0
- Authors: Mengchu Li; Thomas B. Berrett; Yi Yu
- Corresponding authors: Mengchu Li; Thomas B. Berrett; Yi Yu
Foundations of Modern Statistics - Festschrift in Honor of Vladimir Spokoiny, Berlin, Germany, November 6-8, 2019, Moscow, Russia, November 30, 2019
- DOI: 10.1007/978-3-031-30114-8_2
- Publication date: 2023
- Journal:
- Impact factor: 0
- Authors: Dubois A
- Corresponding author: Dubois A
Optimal nonparametric testing of Missing Completely At Random and its connections to compatibility
- DOI: 10.1214/23-aos2326
- Publication date: 2023
- Journal:
- Impact factor: 0
- Authors: Berrett T
- Corresponding author: Berrett T
Optimal nonparametric testing of Missing Completely At Random, and its connections to compatibility
- DOI: 10.48550/arxiv.2205.08627
- Publication date: 2022
- Journal:
- Impact factor: 0
- Authors: Berrett T
- Corresponding author: Berrett T
Other publications by Thomas Berrett
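Similar National Natural Science Foundation of China (NSFC) grants
Circuits of newly identified spinal SNAPR neurons mediating the suppression of intractable itch by SCS electrical stimulation
- Grant number: 82371478
- Year approved: 2023
- Amount: CNY 480,000
- Project category: General Program
Phenomenological study of tau lepton decays and new physics models
- Grant number: 11005033
- Year approved: 2010
- Amount: CNY 180,000
- Project category: Young Scientists Fund
Validation of, and efficient intervention at, a new target in the NHR region of HIV gp41
- Grant number: 81072676
- Year approved: 2010
- Amount: CNY 330,000
- Project category: General Program
Study of multi-lepton final states as new physics signals at hadron colliders
- Grant number: 10675110
- Year approved: 2006
- Amount: CNY 360,000
- Project category: General Program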
Similar overseas grants
New Development of Nonparametric and Semiparametric Estimation Methods in Economics, Finance and Insurance
- Grant number: 23K01340
- Fiscal year: 2023
- Amount: $148,200
- Project category: Grant-in-Aid for Scientific Research (C)
CAREER: New Paradigms of Estimation and Inference in Constrained Nonparametric Models
- Grant number: 2143468
- Fiscal year: 2022
- Amount: $148,200
- Project category: Continuing Grant
CAREER: New Challenges in High-Dimensional and Nonparametric Statistics
- Grant number: 2048028
- Fiscal year: 2021
- Amount: $148,200
- Project category: Continuing Grant
Sources of Cholinergic Modulation of Cortical Microcircuits
- Grant number: 9760973
- Fiscal year: 2019
- Amount: $148,200
- Project category:
New Methods and Theory for the Comparison of Nonparametric Trend Curves
- Grant number: 430668955
- Fiscal year: 2019
- Amount: $148,200
- Project category: Research Grants
Nonparametric depth-based methods for analyzing high-dimensional data. Applications to biomedical research
- Grant number: 9807861
- Fiscal year: 2019
- Amount: $148,200
- Project category:
Collaborative Research: New Bayesian Nonparametric Paradigms of Personalized Medicine for Lung Cancer
- Grant number: 1922567
- Fiscal year: 2018
- Amount: $148,200
- Project category: Continuing Grant
Collaborative Research: New Bayesian Nonparametric Paradigms of Personalized Medicine for Lung Cancer
- Grant number: 1854003
- Fiscal year: 2018
- Amount: $148,200
- Project category: Continuing Grant
New Nonparametric Modeling Methods for High-Dimensional Time Series
- Grant number: 1712558
- Fiscal year: 2017
- Amount: $148,200
- Project category: Continuing Grant
New Developments in Nonparametric Bayesian Inference; Univariate and Multivariate time series with infinite variance.
- Grant number: 203276-2013
- Fiscal year: 2017
- Amount: $148,200
- Project category: Discovery Grants Program - Individual