High-Dimensional Random Forests Learning, Inference, and Beyond

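Basic Information

Award number:
2310981
Principal investigator:
Yingying Fan
Amount:
$250,000
Institution:
University of Southern California
Institution country:
United States
Award type:
Standard Grant
Fiscal year:
2023
Funding country:
United States
Project period:
2023-08-15 to 2026-07-31
Project status:
Ongoing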

Source:
https://www.nsf.gov/awardsearch/showAward?AWD_ID=2310981&HistoricalAwards=false
Keywords:
Dimensional Random Forests Learning Inference

Project Abstract

Random Forests are among the most widely used computational methods for making predictions. The approach works by creating a group of decision-makers, like a team of experts, and then aggregating the individual predictions of these experts to form the final prediction. The success of Random Forests is borne out by their strong performance across many different types of data. Despite this success, Random Forests are still largely regarded as a black-box method because theoretical understanding of them is limited. The complicated nature of the algorithm and the lack of theoretical understanding also make its results less reproducible and harder to interpret. The project will study the properties of Random Forests theoretically to understand when the algorithm works and, more importantly, when it fails. Such studies can give practitioners more confidence and better guidance in applying Random Forests. The project will also investigate how to improve the interpretability of Random Forests. Finally, with the understanding gained from these studies, the project will study how to improve the performance of the algorithm to make it even more useful for big data analysis. These research activities will offer numerous training initiatives for the professional development of the next generation of statisticians and data scientists.

Recently, important progress has been made in the analysis of random forest algorithms, for instance a proof of the polynomial consistency rate of the original version of Random Forests in the high-dimensional setting, without specific assumptions on the regression function or the feature distribution. Yet many fundamentally important questions remain unanswered. The overall objective of this project is to provide an in-depth understanding of complicated ensemble methods such as Random Forests, and to provide improved, interpretable, and reproducible statistical estimation and inference results. The project will first study some important open questions about Random Forests and then move to statistical inference. In particular, recent studies have confirmed that Random Forests can adapt to sparse models. A natural question is how to uncover the underlying true sparsity structure. Furthermore, some preliminary results suggest that popular existing methods are biased when features are collinear. The project will develop valid feature importance measures and further investigate the calculation of p-values for evaluating conditional feature importance in the presence of feature collinearity. The project will also move beyond Random Forests and study the larger problem of conditional independence testing. Using the insights gained from these theoretical studies, the project will further develop an improved ensemble learning method for better prediction, interpretability, and reproducibility in big data analysis.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
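
The aggregation idea described in the abstract (many simple predictors, each trained on a resampled copy of the data, with their predictions averaged) can be illustrated with a minimal sketch. It is not the project's method, nor the original Random Forests algorithm: it assumes one-split trees (stumps) and a single randomly chosen feature per tree, and the names `fit_stump`, `fit_forest`, and `predict` are hypothetical, chosen here for illustration only.

```python
import random
from statistics import mean


def fit_stump(X, y, feature):
    """Fit a one-split regression tree (a stump) on a single feature.

    Returns (feature, threshold, left_mean, right_mean). Illustrative only.
    """
    best = None
    values = sorted(set(row[feature] for row in X))
    # Candidate thresholds are midpoints between consecutive distinct values.
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2
        left = [yi for row, yi in zip(X, y) if row[feature] <= t]
        right = [yi for row, yi in zip(X, y) if row[feature] > t]
        ml, mr = mean(left), mean(right)
        sse = sum((v - ml) ** 2 for v in left) + sum((v - mr) ** 2 for v in right)
        if best is None or sse < best[0]:
            best = (sse, t, ml, mr)
    if best is None:
        # The feature is constant in this sample: predict the overall mean.
        m = mean(y)
        return (feature, 0.0, m, m)
    _, t, ml, mr = best
    return (feature, t, ml, mr)


def fit_forest(X, y, n_trees=50, seed=0):
    """Train n_trees stumps, each on a bootstrap sample and a random feature."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # sample rows with replacement
        feature = rng.randrange(p)                  # random feature for this tree
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feature))
    return forest


def predict(forest, row):
    """Aggregate the individual tree predictions by averaging them."""
    return mean(ml if row[f] <= t else mr for f, t, ml, mr in forest)
```

Calling `fit_forest(X, y)` on a list of feature rows `X` and responses `y`, then `predict(forest, row)`, returns the ensemble average for a new row. Full Random Forests grow deep trees and reconsider a random subset of features at every split; the stump version only keeps the bootstrap-and-average structure visible.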

Project Outcomes

Journal articles (0)

Monographs (0)

Research awards (0)

Conference papers (0)

Patents (0)

Other publications by Yingying Fan

Human Vγ9Vδ2-T cells efficiently kill influenza virus-infected lung alveolar epithelial cells

DOI:
Publication year:
Journal:
Cellular and Molecular Immunology
Impact factor:
24.1
Authors:
Hong Li; Wenwei Tu; Zheng Xiang; Ting Feng; Jinrong Li; Yingying Fan; Qiao Lu; Zhongwei Yin; Meixing Yu; Chongyang Shen
Corresponding author:
Chongyang Shen

Effect of β,β-Dimethylacrylshikonin on Inhibition of Human Colorectal Cancer Cell Growth in Vitro and in Vivo

DOI:
Publication year:
Journal:
Int. J. Mol. Sci
Impact factor:
0
Authors:
Yingying Fan; Shaoju Jin; Jun He; Zhenjun Shao; Jiao Yan; Ting Feng; Hong Li
Corresponding author:
Hong Li

Asymptotic properties of high-dimensional random forests

DOI:
Publication year:
2020
Journal:
Annals of Statistics
Impact factor:
4.5
Authors:
Chien; Patrick Vossler; Yingying Fan; Jinchi Lv
Corresponding author:
Jinchi Lv

Lipid composition and oxidative changes in diabetes and alcoholic diabetes rats

DOI:
Publication year:
2022
Journal:
Journal of King Saud University - Science
Impact factor:
0
Authors:
Lin Qin; Shaik Althaf Hussain; N. Maddu; Chinna Padamala Manjuvani; Venkata Subba Reddy Gangireddygari; Yingying Fan
Corresponding author:
Yingying Fan

Estimation of weak factor models

DOI:
Publication year:
2019
Journal:
Impact factor:
0
Authors:
Yingying Fan; Jinchi Lv; Mahrad Sharifvaghefi; Yoshimasa Uematsu
Corresponding author:
Yoshimasa Uematsu