
Deep Learning and Random Forests for High-Dimensional Regression

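Basic Information

Award number:
1915932
Principal investigator:
Jason Klusowski
Amount:
$180,000
Institution:
Rutgers University New Brunswick
Institution country:
United States
Award type:
Continuing Grant
Fiscal year:
2019
Funding country:
United States
Period:
2019-08-15 to 2020-11-30
Project status:
Completed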

Source:
https://www.nsf.gov/awardsearch/showAward?AWD_ID=1915932&HistoricalAwards=false
Keywords:
Deep Learning, Random Forests, Dimensional

Project Abstract

This project aims to investigate two of the most widely used and state-of-the-art methods for high-dimensional regression: deep neural networks and random forests. Despite their widespread implementation, pinning down their theoretical properties has eluded researchers until recently. The proposed research aims to add to the growing body of literature on their analysis, by both developing tools of theoretical value and providing guarantees and guidance for practitioners and applied scientists who use these popular methods frequently in their work.

The success of multi-layer networks has largely been buoyed by their ability to generalize well despite being able to fit most datasets, given enough parameters. This phenomenon is particularly striking when the input dimension is far greater than the available sample size, as is the case with many modern applications in molecular biology, medical imaging, and astrophysics, to name a few. A major component of the proposed work will be to obtain complexity bounds for classes of deep neural networks with controls on the size of their weights, which can then be used to bound generalization error and statistical risk. These complexity bounds reveal the role of complexity penalization, which is based on certain norms of the weights of the network. Motivated by these observations, another stream of the proposed research seeks to provide statistical guarantees of certain complexity penalized estimators and their adaptive properties.

Current theoretical results for random forests are either for stylized versions of those that are used in practice or are asymptotic in nature, and it is therefore difficult to determine the quality of convergence as a function of the parameters of the random forest. Furthermore, the setting for the analysis of more practical implementations of random forests is limited to structured, fixed-dimensional regression function classes. Given these restrictions, the first component of the proposal aims to investigate how random forests behave in the high-dimensional regime when the number of predictors grows with the sample size. Another research objective is to isolate and study families of flexible high-dimensional regression functions for which finite sample convergence rates can be established. The final endeavor of this project is to connect popular measures of variable importance to the bias of random forests. Since variable importance measures are used for assessing the role each predictor variable plays in influencing the output, this connection will partially explain why random forests are adaptive to sparsity. The relationship will also help to theoretically motivate variable importance measures as useful tools for model interpretability.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
