Leveraging Structural Information in Regression Tree Ensembles

Basic information

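  • Award number:
    1712870
  • Principal investigator:
  • Amount:
    $100,000
  • Host institution:
  • Host institution country:
    United States
  • Project type:
    Continuing Grant
  • Fiscal year:
    2017
  • Funding country:
    United States
  • Start and end dates:
    2017-09-01 to 2020-02-29
  • Project status:
    Completed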

Project abstract

A common task in statistics is prediction; for example, a practitioner may be interested in predicting the presence of a disease given genetic information about an individual. Due to recent advances in data collection, one frequently has access to datasets that contain a massive number of predictors but correspondingly few subjects. This setting is generally referred to as the "big P, small n" scenario. Drawing meaningful conclusions under such circumstances is generally impossible unless the underlying data satisfy certain structural assumptions. The simplest such structural assumption is that only a small number of the predictors are relevant; in this setting, finding the useful predictors corresponds to finding a so-called "needle in a haystack." The goal of this project is to construct procedures that adapt to this and other structural assumptions. The project will focus on methods based on decision trees, which are flowchart-like structures in which predictions are based on whether the predictors satisfy various rules. Usually an ensemble of decision trees is constructed, and the predictions of the individual trees are averaged. While decision tree ensembles are frequently used with high-dimensional data, it is unclear to what extent they adapt to the structural properties of the data. This project will show that, in practice, off-the-shelf decision tree ensembling methods do not adapt to common structural assumptions, and will develop new methods that do. In addition to developing methods with strong theoretical support, this project will support the development of an R package to give practitioners easy access to our methodology.
The PI will develop Bayesian methods for incorporating structural information into tree-based ensemble methods, and will establish theoretically the benefit of making use of this additional information. This forms a nonparametric counterpart to the parametric approaches used in linear models, such as the lasso, graphical lasso, or group lasso; Bayesian approaches in the parametric setting include the use of variable selection priors, such as spike-and-slab priors and global-local shrinkage priors. Structural information will be incorporated by modifying the commonly used priors on decision tree ensembles so that the prior is concentrated on models that satisfy the desired structure. The PI will first investigate the theoretical properties of a sparsity-inducing prior designed to eliminate unnecessary predictors. Sparsity here is obtained by applying a sparsity-inducing Dirichlet prior to the a priori probability that a given branch is associated with a given predictor. This prior will be extended to allow for grouped variable selection, in a manner similar to the group lasso, by considering the class of Dirichlet tree priors, and further to accommodate graphical structures in the predictors through sparsity-inducing logistic normal priors. Additionally, the PI will develop computationally efficient Markov chain Monte Carlo algorithms to fit the resulting models. Compared to existing methods, these structural priors will be shown to lead to substantial gains in predictive accuracy and to more accurate scientific discovery.
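
As a rough illustration of the sparsity-inducing Dirichlet prior mentioned in the abstract, the Python sketch below draws the vector of probabilities that a branch splits on each predictor and counts how many predictors are needed to cover 90% of the prior mass. The abstract does not give a parameterization, so the symmetric Dirichlet(alpha/P, ..., alpha/P) form, the function names, and the values of P and alpha are all assumptions made for illustration; this is not the project's R package or its fitting algorithm.

```python
import numpy as np


def draw_split_probs(num_predictors, alpha, rng):
    """Draw splitting probabilities s ~ Dirichlet(alpha/P, ..., alpha/P).

    s[j] is the prior probability that a branch splits on predictor j.
    The alpha/P concentration is an assumed, illustrative parameterization.
    """
    concentration = np.full(num_predictors, alpha / num_predictors)
    return rng.dirichlet(concentration)


def predictors_covering(probs, mass=0.9):
    """Count the fewest predictors whose probabilities sum to at least `mass`."""
    ordered = np.sort(probs)[::-1]          # largest probabilities first
    cumulative = np.cumsum(ordered)
    count = int(np.searchsorted(cumulative, mass)) + 1
    return min(count, len(probs))           # guard against rounding at the tail


def mean_active_predictors(num_predictors, alpha, num_draws=200, seed=0):
    """Average the coverage count over repeated draws from the prior."""
    rng = np.random.default_rng(seed)
    counts = [
        predictors_covering(draw_split_probs(num_predictors, alpha, rng))
        for _ in range(num_draws)
    ]
    return float(np.mean(counts))


if __name__ == "__main__":
    P = 50  # hypothetical number of predictors
    print("sparse prior (alpha = 1):", mean_active_predictors(P, alpha=1.0))
    print("diffuse prior (alpha = P):", mean_active_predictors(P, alpha=float(P)))
```

With a small total concentration, most draws place nearly all of the splitting probability on a handful of predictors, which is the sense in which such a prior favors sparse models; a larger concentration spreads the mass across many predictors.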

Project outcomes

Journal articles (7)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Bayesian regression tree ensembles that adapt to smoothness and sparsity
Interaction Detection with Bayesian Decision Tree Ensembles
Bayesian Approaches for Missing Not at Random Outcome Data: The Role of Identifying Restrictions
  • DOI:
    10.1214/17-sts630
  • Publication date:
    2018-05-01
  • Journal:
  • Impact factor:
    5.7
  • Authors:
    Linero, Antonio R.;Daniels, Michael J.
  • Corresponding author:
    Daniels, Michael J.
A Bayesian approach to sequential monitoring of nonlinear profiles using wavelets: Wavelet-Based Bayesian Profile Monitoring
Incorporating Grouping Information into Bayesian Decision Tree Ensembles

Other publications by Antonio Linero

Advances in Periodic Difference Equations with Open Problems
  • DOI:
    10.1007/978-3-662-44140-4_6
  • Publication date:
    2014
  • Journal:
  • Impact factor:
    0
  • Authors:
    Z. Alsharawi;Jose Cánovas;Antonio Linero
  • Corresponding author:
    Antonio Linero

Other grants held by Antonio Linero

CAREER: Foundations for Bayesian Nonparametric Causal Inference
  • Award number:
    2144933
  • Fiscal year:
    2022
  • Funding amount:
    $100,000
  • Project type:
    Continuing Grant
Leveraging Structural Information in Regression Tree Ensembles
  • Award number:
    2015636
  • Fiscal year:
    2019
  • Funding amount:
    $100,000
  • Project type:
    Continuing Grant

Similar NSFC grants

Understanding structural evolution of galaxies with machine learning
  • Award number:
  • Approval year:
    2022
  • Funding amount:
    CNY 100,000
  • Project type:
    Provincial or municipal project

Similar overseas grants

CAREER: Efficient coding of visual, structural, and semantic scene information
  • Award number:
    2240815
  • Fiscal year:
    2023
  • Funding amount:
    $100,000
  • Project type:
    Continuing Grant
Discovery of Structural Weaknesses of Information Barriers and Exploration of Improvement Measures through Mathematical Modeling
  • Award number:
    23K01215
  • Fiscal year:
    2023
  • Funding amount:
    $100,000
  • Project type:
    Grant-in-Aid for Scientific Research (C)
Development of allosteric chaperone compounds based on structural information of the target enzymes
  • Award number:
    23K06403
  • Fiscal year:
    2023
  • Funding amount:
    $100,000
  • Project type:
    Grant-in-Aid for Scientific Research (C)
3D-Proteomics: FAIRification of proteomics data for comprehensive integration with structural biology information
  • Award number:
    BB/V018779/1
  • Fiscal year:
    2022
  • Funding amount:
    $100,000
  • Project type:
    Research Grant
CRII: III: Learning networks from strategic decisions: enabling network intervention and revealing social privacy risks without structural information
  • Award number:
    2153468
  • Fiscal year:
    2022
  • Funding amount:
    $100,000
  • Project type:
    Standard Grant
Integration of functional and structural knowledge across scales to decipher information processing in the mammalian brain
  • Award number:
    EP/W024292/1
  • Fiscal year:
    2022
  • Funding amount:
    $100,000
  • Project type:
    Research Grant
3D-Proteomics: FAIRification of proteomics data for comprehensive integration with structural biology information
  • Award number:
    BB/V018817/1
  • Fiscal year:
    2022
  • Funding amount:
    $100,000
  • Project type:
    Research Grant
Increased performance of nanophotonic devices by utilizing structural fluctuation information with deep learning
  • Award number:
    21K18912
  • Fiscal year:
    2021
  • Funding amount:
    $100,000
  • Project type:
    Grant-in-Aid for Challenging Research (Exploratory)
Building machine learning models and neural networks trained on structural information of drug targets to predict antimicrobial resistance
  • Award number:
    2597363
  • Fiscal year:
    2021
  • Funding amount:
    $100,000
  • Project type:
    Studentship
Biomechanical Framework to Integrate Structural MRI Information in White Matter
  • Award number:
    10043013
  • Fiscal year:
    2020
  • Funding amount:
    $100,000
  • Project type: