Robust Estimation and Model Ensemble Selection
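Basic Information
- Grant number: RGPIN-2019-04201
- Principal investigator:
- Amount: $18,200
- Host institution:
- Host institution country: Canada
- Project category: Discovery Grants Program - Individual
- Fiscal year: 2022
- Funding country: Canada
- Period: 2022-01-01 to 2023-12-31
- Project status: Completed
- Source:
- Keywords:
Project Abstract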
Consider a data table with n rows (the cases) and d columns (the variables). The classical robustness model assumes that the majority of the cases are free of contamination, so that only a minority of contaminated cases needs to be identified and filtered. Unfortunately, this paradigm is not realistic and collapses in high dimensions. If there is a small, independent probability p that a cell (an individual entry in the data table) is contaminated, then the probability that a case (a row in the data table) is contaminated is 1-(1-p)^d, which can quickly exceed 0.5. For example, if p=0.01 and d=100, the probability is 0.63397. The PhD thesis of my former student Fatemah Alqallaf brought attention to this problem, called propagation of outliers. Alqallaf, Van Aelst, Yohai and Zamar (2009) continued this research and showed that traditional high-breakdown-point estimates are no longer resistant against independent contamination. In response, the PhD thesis of my former student Andy Leung constructed multivariate location and scatter estimates that are resistant to both casewise and cellwise outliers. Extensions of this approach to other multivariate models, including cluster analysis, will be considered by my new graduate students Glenn McGuinness and Malvika Mitra.
Cheap collection and storage of data have produced variable-rich, case-poor datasets. It is common to find that a large number of the variables in these datasets are noise variables, which hurt rather than help inference tasks. Moreover, the remaining useful variables may be partially redundant, and subsets of these variables may predict better than the full set. In that case it may be better to ensemble several models based on diverse subsets of variables. This leads to a very broad research endeavor, ensemble selection, which is a generalization of model selection. The PhD thesis of my former student Jabed Tomal proposed an ad hoc procedure called phalanxes. My PhD student Anthony Christidis will consider a more structured selection of optimal ensembles by optimizing a new loss function that penalizes lack of sparseness and lack of diversity among the models selected for the ensemble. With the help of several graduate students, I wish to study extensions to linear and nonlinear regression and classification.
I am also interested in cluster analysis. A popular clustering method is K-means. However, this method is not resistant against outliers. My PhD student Juan D. Gonzalez (University of Buenos Aires, Argentina) is developing a robust alternative that uses the Tau-scale (Yohai and Zamar, 1989) instead of the average of the distances from the points to their cluster centers. Application of this procedure to the processing of digitized images is also part of his research project.
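The arithmetic behind the propagation-of-outliers argument in the abstract can be checked with a minimal Python sketch. It assumes, as the abstract does, that each cell is contaminated independently with probability p. The function names are illustrative and do not come from the grant or from any published package.

```python
import math


def case_contamination_prob(p: float, d: int) -> float:
    """Probability that a row of d cells contains at least one contaminated
    cell, assuming cells are contaminated independently with probability p."""
    return 1.0 - (1.0 - p) ** d


def min_dimension_for_majority(p: float) -> int:
    """Smallest d for which the case contamination probability exceeds 0.5.

    Solves 1 - (1 - p)^d > 0.5, i.e. d > log(0.5) / log(1 - p), for 0 < p < 1.
    """
    return math.floor(math.log(0.5) / math.log(1.0 - p)) + 1


if __name__ == "__main__":
    # Reproduces the example in the abstract: p = 0.01, d = 100.
    print(round(case_contamination_prob(0.01, 100), 5))  # 0.63397
    # With p = 0.01, most rows are expected to be contaminated from d = 69 on.
    print(min_dimension_for_majority(0.01))  # 69
```

Under this independence assumption, a 1% cell contamination rate already breaks the "majority of clean cases" premise at 69 variables, which is why casewise filtering alone is not viable in high dimensions.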
Project Outcomes
- Journal articles: 0
- Monographs: 0
- Research awards: 0
- Conference papers: 0
- Patents: 0
Other publications by Zamar, Ruben
Robust estimation of error scale in nonparametric regression models
- DOI: 10.1016/j.jspi.2008.01.005
- Published: 2008-10-01
- Journal:
- Impact factor: 0.9
- Authors: Ghement, Isabella Rodica; Ruiz, Marcelo; Zamar, Ruben
- Corresponding author: Zamar, Ruben
RSKC: An R Package for a Robust and Sparse K-Means Clustering Algorithm
- DOI: 10.18637/jss.v072.i05
- Published: 2016-08-01
- Journal:
- Impact factor: 5.8
- Authors: Kondo, Yumi; Salibian-Barrera, Matias; Zamar, Ruben
- Corresponding author: Zamar, Ruben
Other grants by Zamar, Ruben
Robust Estimation and Model Ensemble Selection
- Grant number: RGPIN-2019-04201
- Fiscal year: 2021
- Funding amount: $18,200
- Project category: Discovery Grants Program - Individual
Robust Estimation and Model Ensemble Selection
- Grant number: RGPIN-2019-04201
- Fiscal year: 2020
- Funding amount: $18,200
- Project category: Discovery Grants Program - Individual
Robust Estimation and Model Ensemble Selection
- Grant number: RGPIN-2019-04201
- Fiscal year: 2019
- Funding amount: $18,200
- Project category: Discovery Grants Program - Individual
Application of robust statistical models to measure data quality for improved use of sensors and diagnostics in an active mine setting
- Grant number: 532134-2018
- Fiscal year: 2018
- Funding amount: $18,200
- Project category: Engage Grants Program
Robust Estimation and Inference
- Grant number: RGPIN-2014-05227
- Fiscal year: 2018
- Funding amount: $18,200
- Project category: Discovery Grants Program - Individual
Robust Estimation and Inference
- Grant number: RGPIN-2014-05227
- Fiscal year: 2017
- Funding amount: $18,200
- Project category: Discovery Grants Program - Individual
Design and development of machine learning and data optimization processes for detection of biogenic patterns
- Grant number: 500801-2016
- Fiscal year: 2016
- Funding amount: $18,200
- Project category: Engage Plus Grants Program
Robust Estimation and Inference
- Grant number: RGPIN-2014-05227
- Fiscal year: 2016
- Funding amount: $18,200
- Project category: Discovery Grants Program - Individual
Robust Estimation and Inference
- Grant number: RGPIN-2014-05227
- Fiscal year: 2015
- Funding amount: $18,200
- Project category: Discovery Grants Program - Individual
Design and development of machine learning and data optimization processes for detection of biogenic patterns
- Grant number: 490833-2015
- Fiscal year: 2015
- Funding amount: $18,200
- Project category: Engage Grants Program
Similar overseas grants
Bayesian causal estimation via model misspecification
- Grant number: EP/Y029755/1
- Fiscal year: 2024
- Funding amount: $18,200
- Project category: Research Grant
CAREER: New data integration approaches for efficient and robust meta-estimation, model fusion and transfer learning
- Grant number: 2337943
- Fiscal year: 2024
- Funding amount: $18,200
- Project category: Continuing Grant
Identification and Estimation of the entry model of firms in the differentiated products oligopoly market.
- Grant number: 23K01393
- Fiscal year: 2023
- Funding amount: $18,200
- Project category: Grant-in-Aid for Scientific Research (C)
Construction of a receptivity estimation model for risky utterance strategies in non-task-oriented conversational systems
- Grant number: 23K16923
- Fiscal year: 2023
- Funding amount: $18,200
- Project category: Grant-in-Aid for Early-Career Scientists
Clustered Coefficient Regression Model-Based Estimators in Small Area Estimation
- Grant number: 2316353
- Fiscal year: 2023
- Funding amount: $18,200
- Project category: Standard Grant
A Toolkit for Endogenous Regime-Switching model Estimation
- Grant number: EP/Y023595/1
- Fiscal year: 2023
- Funding amount: $18,200
- Project category: Research Grant
Development of tree and log strength estimation model: Optimizing wood distribution based on strength information
- Grant number: 23K05470
- Fiscal year: 2023
- Funding amount: $18,200
- Project category: Grant-in-Aid for Scientific Research (C)
Maximum Precipitation estimation using a numerical weather model for Senjo-kousuitai events in the Tohoku region, Japan
- Grant number: 23K19131
- Fiscal year: 2023
- Funding amount: $18,200
- Project category: Grant-in-Aid for Research Activity Start-up
Predictive value of Parkinson's Disease gait model-derived features for disease severity and progression estimation: An individual patient data meta-analysis.
- Grant number: 495240
- Fiscal year: 2023
- Funding amount: $18,200
- Project category:
The Development of Age Estimation Model for Unidentified Individuals Based on Mitochondrial DNA Methylation Changes
- Grant number: 22KJ0206
- Fiscal year: 2023
- Funding amount: $18,200
- Project category: Grant-in-Aid for JSPS Fellows