High-Dimensional Random Forests Learning, Inference, and Beyond

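Basic information

  • Award number:
    2310981
  • Principal investigator:
  • Amount:
    $250,000
  • Host institution:
  • Host institution country:
    United States
  • Award type:
    Standard Grant
  • Fiscal year:
    2023
  • Funding country:
    United States
  • Project period:
    2023-08-15 to 2026-07-31
  • Project status:
    Ongoing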

Project abstract

Random Forests are among the most widely used computational methods for making predictions. The approach creates a group of decision-makers, like a team of experts, and then aggregates the experts' individual predictions to form the final prediction. The success of Random Forests is borne out by their superior performance on many different types of data. Despite this success, Random Forests are still largely regarded as a black-box method because theoretical understanding of them is limited. The complicated nature of the algorithm and the lack of theoretical understanding also make its results less reproducible and harder to interpret. The project will study the properties of Random Forests theoretically to understand when the algorithm works and, more importantly, when it fails. Such studies can give practitioners more confidence and better guidance in applying Random Forests. The project will also investigate how to improve the interpretability of Random Forests. Finally, with the understanding gained from these studies, the project will study how to improve the algorithm's performance to make it even more useful for big data analysis. These research activities will offer numerous training opportunities for the professional development of the next generation of statisticians and data scientists.
Recently, important progress has been made in the analysis of random forest algorithms, for instance, a proof of the polynomial consistency rate of the original version of Random Forests in the high-dimensional setting, without specific assumptions on the regression function or the feature distribution. Yet many fundamentally important questions remain unanswered. The overall objective of this project is to provide an in-depth understanding of complicated ensemble methods such as Random Forests and to provide improved, interpretable, and reproducible statistical estimation and inference results. The project will first study some important open questions about Random Forests and then move to statistical inference. In particular, recent studies have confirmed that Random Forests can adapt to sparse models. A natural question is how to uncover the underlying true sparsity structure. Furthermore, some preliminary results suggest that popular existing methods are biased in the presence of feature collinearity. The project will develop valid feature importance measures and further investigate the calculation of p-values for evaluating conditional feature importance in the presence of feature collinearity. The project will also move beyond Random Forests and study the broader problem of conditional independence testing. Using the insights gained from these theoretical studies, the project will further develop an improved ensemble learning method for better prediction, interpretability, and reproducibility in big data analysis.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Project outcomes

Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)

Other publications by Yingying Fan

Human Vγ9Vδ2-T cells efficiently kill influenza virus-infected lung alveolar epithelial cells
  • DOI:
  • Publication year:
  • Journal:
  • Impact factor:
    24.1
  • Authors:
    Hong Li; Wenwei Tu; Zheng Xiang; Ting Feng; Jinrong Li; Yingying Fan; Qiao Lu; Zhongwei Yin; Meixing Yu; Chongyang Shen
  • Corresponding author:
    Chongyang Shen
Effect of β,β-Dimethylacrylshikonin on Inhibition of Human Colorectal Cancer Cell Growth in Vitro and in Vivo
  • DOI:
  • Publication year:
  • Journal:
  • Impact factor:
    0
  • Authors:
    Yingying Fan; Shaoju Jin; Jun He; Zhenjun Shao; Jiao Yan; Ting Feng; Hong Li
  • Corresponding author:
    Hong Li
Asymptotic properties of high-dimensional random forests
  • DOI:
  • Publication year:
    2020
  • Journal:
  • Impact factor:
    4.5
  • Authors:
    Chien; Patrick Vossler; Yingying Fan; Jinchi Lv
  • Corresponding author:
    Jinchi Lv
Lipid composition and oxidative changes in diabetes and alcoholic diabetes rats
  • DOI:
  • Publication year:
    2022
  • Journal:
  • Impact factor:
    0
  • Authors:
    Lin Qin; Shaik Althaf Hussain; N. Maddu; Chinna Padamala Manjuvani; Venkata Subba Reddy Gangireddygari; Yingying Fan
  • Corresponding author:
    Yingying Fan
Estimation of weak factor models
  • DOI:
  • Publication year:
    2019
  • Journal:
  • Impact factor:
    0
  • Authors:
    Yingying Fan; Jinchi Lv; Mahrad Sharifvaghefi; Yoshimasa Uematsu
  • Corresponding author:
    Yoshimasa Uematsu

Other grants held by Yingying Fan

FRG: Collaborative Research: Flexible Network Inference
  • Award number:
    2052964
  • Fiscal year:
    2021
  • Award amount:
    $250,000
  • Award type:
    Standard Grant
CAREER: High-Dimensional Variable Selection in Nonlinear Models and Classification with Correlated Data
  • Award number:
    1150318
  • Fiscal year:
    2012
  • Award amount:
    $250,000
  • Award type:
    Continuing Grant
Regularization Methods in High Dimensions with Applications to Functional Data Analysis, Mixed Effects Models and Classification
  • Award number:
    0906784
  • Fiscal year:
    2009
  • Award amount:
    $250,000
  • Award type:
    Continuing Grant

Similar overseas grants

Improving Mortality Prediction in People with Cardiovascular Disease: A Random Survival Forests Approach Integrating Frailty Assessment
  • Award number:
    495596
  • Fiscal year:
    2023
  • Award amount:
    $250,000
  • Award type:
Developing Efficient Numerical Algorithms Using Fast Bayesian Random Forests
  • Award number:
    2748743
  • Fiscal year:
    2022
  • Award amount:
    $250,000
  • Award type:
    Studentship
Random forests, nonparametric and screening methods
  • Award number:
    RGPIN-2016-05702
  • Fiscal year:
    2022
  • Award amount:
    $250,000
  • Award type:
    Discovery Grants Program - Individual
Random forests, nonparametric and screening methods
  • Award number:
    RGPIN-2016-05702
  • Fiscal year:
    2021
  • Award amount:
    $250,000
  • Award type:
    Discovery Grants Program - Individual
Human Forests versus Random Forest Models in Prediction
  • Award number:
    2050727
  • Fiscal year:
    2020
  • Award amount:
    $250,000
  • Award type:
    Standard Grant
Deep Learning and Random Forests for High-Dimensional Regression
  • Award number:
    2054808
  • Fiscal year:
    2020
  • Award amount:
    $250,000
  • Award type:
    Continuing Grant
Random forests, nonparametric and screening methods
  • Award number:
    RGPIN-2016-05702
  • Fiscal year:
    2019
  • Award amount:
    $250,000
  • Award type:
    Discovery Grants Program - Individual
Human Forests versus Random Forest Models in Prediction
  • Award number:
    1919333
  • Fiscal year:
    2019
  • Award amount:
    $250,000
  • Award type:
    Standard Grant
Deep Learning and Random Forests for High-Dimensional Regression
  • Award number:
    1915932
  • Fiscal year:
    2019
  • Award amount:
    $250,000
  • Award type:
    Continuing Grant
Random forests, nonparametric and screening methods
  • Award number:
    RGPIN-2016-05702
  • Fiscal year:
    2018
  • Award amount:
    $250,000
  • Award type:
    Discovery Grants Program - Individual