Leveraging Structural Information in Regression Tree Ensembles
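Basic information
- Award number: 2015636
- Principal investigator:
- Amount: $25.9 thousand
- Host institution:
- Host institution country: United States
- Award type: Continuing Grant
- Fiscal year: 2019
- Funding country: United States
- Period: 2019-09-01 to 2020-08-31
- Project status: Completed
- Source:
- Keywords:
Project abstract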
A common task in statistics is prediction; for example, a practitioner may be interested in predicting the presence of a disease given genetic information about an individual. Due to recent advances in data collection, one frequently has access to datasets that contain a massive number of predictors but correspondingly few subjects. This setting is generally referred to as the "big P, small n" scenario. Drawing meaningful conclusions under such circumstances is generally impossible unless the underlying data satisfy certain structural assumptions. The simplest such structural assumption is that only a small number of the predictors are relevant; in this setting, finding the useful predictors corresponds to finding a so-called "needle in a haystack." The goal of this project is to construct procedures that adapt to this and other structural assumptions. The project will focus on methods based on decision trees, which are flowchart-like structures in which predictions are based on whether the predictors satisfy various rules. Usually an ensemble of decision trees is constructed, and the predictions of the individual trees are averaged. While decision tree ensembles are frequently used with high-dimensional data, it is unclear to what extent they adapt to the structural properties of the data. This project will show that, in practice, off-the-shelf decision tree ensembling methods do not adapt to common structural assumptions, and will develop new methods that do. In addition to developing methods with strong theoretical support, this project will support the development of an R package to give practitioners easy access to our methodology.
The PI will develop Bayesian methods for incorporating structural information into tree-based ensemble methods, and establish theoretically the benefit of making use of this additional information. This forms a nonparametric counterpart to the parametric approaches used in linear models, such as the lasso, graphical lasso, or group lasso; Bayesian approaches in the parametric setting include the use of variable selection priors, such as spike-and-slab priors and global-local shrinkage priors. Structural information will be incorporated by modifying the commonly used priors on decision tree ensembles so that the prior is concentrated on models that satisfy the desired structure. The PI will first investigate the theoretical properties of a sparsity-inducing prior designed to eliminate unnecessary predictors. Sparsity here is obtained by applying a sparsity-inducing Dirichlet prior to the a priori probability that a given branch is associated with a given predictor. This prior will be extended to allow for grouped variable selection, in a manner similar to the group lasso, by considering the class of Dirichlet tree priors, and further to accommodate graphical structures in the predictors through sparsity-inducing logistic normal priors. Additionally, the PI will develop computationally efficient Markov chain Monte Carlo algorithms to fit the resulting models. Compared to existing methods, these structural priors will be shown to lead to substantial gains in predictive accuracy, and to more accurate scientific discovery.
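As a rough illustration of the sparsity-inducing Dirichlet prior mentioned in the abstract, the Python sketch below draws the vector of probabilities that a branch splits on each of P predictors from a symmetric Dirichlet distribution, then counts how many predictors receive non-negligible mass. The abstract does not give the prior's parameters, so the Dirichlet(alpha/P, ..., alpha/P) form, the 0.01 threshold, and all function names are assumptions made for illustration, not the project's implementation.

```python
import random


def sample_split_probs(num_predictors, alpha, rng):
    """Draw s ~ Dirichlet(alpha/P, ..., alpha/P) by normalizing Gamma draws.

    The alpha/P parameterization is an assumption; the abstract only says
    that a sparsity-inducing Dirichlet prior is used.
    """
    shape = alpha / num_predictors
    while True:
        draws = [rng.gammavariate(shape, 1.0) for _ in range(num_predictors)]
        total = sum(draws)
        if total > 0.0:  # retry in the rare case every draw underflows to 0
            return [g / total for g in draws]


def effective_predictors(probs, threshold=0.01):
    """Count predictors whose split probability exceeds `threshold`."""
    return sum(p > threshold for p in probs)


def average_effective(num_predictors, alpha, reps=200, seed=0):
    """Average number of 'active' predictors over repeated prior draws."""
    rng = random.Random(seed)
    counts = [
        effective_predictors(sample_split_probs(num_predictors, alpha, rng))
        for _ in range(reps)
    ]
    return sum(counts) / reps


if __name__ == "__main__":
    P = 100
    # Small total concentration: prior mass piles onto a few predictors.
    print("alpha = 1:", average_effective(P, alpha=1.0))
    # alpha = P gives Dirichlet(1, ..., 1), i.e. uniform on the simplex.
    print("alpha = P:", average_effective(P, alpha=float(P)))
```

Under these assumptions, a small total concentration (alpha = 1) should leave only a handful of the 100 predictors above the threshold on average, while alpha = P spreads the prior mass over many more of them.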
Project outcomes
Journal articles (3)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Computationally efficient Bayesian sequential function monitoring
- DOI: 10.1080/00224065.2020.1801366
- Published: 2020
- Journal:
- Impact factor: 2.5
- Authors: Shamp, Wright; Varbanov, Roumen; Chicken, Eric; Linero, Antonio; Yang, Yun
- Corresponding author: Yang, Yun
Semiparametric mixed‐scale models using shared Bayesian forests
- DOI: 10.1111/biom.13107
- Published: 2019
- Journal:
- Impact factor: 1.9
- Authors: Linero, Antonio R.; Sinha, Debajyoti; Lipsitz, Stuart R.
- Corresponding author: Lipsitz, Stuart R.
Other publications by Antonio Linero
Advances in Periodic Difference Equations with Open Problems
- DOI: 10.1007/978-3-662-44140-4_6
- Published: 2014
- Journal:
- Impact factor: 0
- Authors: Z. Alsharawi; Jose Cánovas; Antonio Linero
- Corresponding author: Antonio Linero
Other grants held by Antonio Linero
CAREER: Foundations for Bayesian Nonparametric Causal Inference
- Award number: 2144933
- Fiscal year: 2022
- Amount: $25.9 thousand
- Award type: Continuing Grant
Leveraging Structural Information in Regression Tree Ensembles
- Award number: 1712870
- Fiscal year: 2017
- Amount: $25.9 thousand
- Award type: Continuing Grant
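Similar National Natural Science Foundation of China (NSFC) grants
Understanding structural evolution of galaxies with machine learning
- Award number:
- Approval year: 2022
- Amount: CNY 100,000
- Award type: Provincial/municipal-level project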
Similar overseas grants
CAREER: Efficient coding of visual, structural, and semantic scene information
- Award number: 2240815
- Fiscal year: 2023
- Amount: $25.9 thousand
- Award type: Continuing Grant
Discovery of Structural Weaknesses of Information Barriers and Exploration of Improvement Measures through Mathematical Modeling
- Award number: 23K01215
- Fiscal year: 2023
- Amount: $25.9 thousand
- Award type: Grant-in-Aid for Scientific Research (C)
Development of allosteric chaperone compounds based on structural information of the target enzymes
- Award number: 23K06403
- Fiscal year: 2023
- Amount: $25.9 thousand
- Award type: Grant-in-Aid for Scientific Research (C)
3D-Proteomics: FAIRification of proteomics data for comprehensive integration with structural biology information
- Award number: BB/V018779/1
- Fiscal year: 2022
- Amount: $25.9 thousand
- Award type: Research Grant
CRII: III: Learning networks from strategic decisions: enabling network intervention and revealing social privacy risks without structural information
- Award number: 2153468
- Fiscal year: 2022
- Amount: $25.9 thousand
- Award type: Standard Grant
Integration of functional and structural knowledge across scales to decipher information processing in the mammalian brain
- Award number: EP/W024292/1
- Fiscal year: 2022
- Amount: $25.9 thousand
- Award type: Research Grant
3D-Proteomics: FAIRification of proteomics data for comprehensive integration with structural biology information
- Award number: BB/V018817/1
- Fiscal year: 2022
- Amount: $25.9 thousand
- Award type: Research Grant
Increased performance of nanophotonic devices by utilizing structural fluctuation information with deep learning
- Award number: 21K18912
- Fiscal year: 2021
- Amount: $25.9 thousand
- Award type: Grant-in-Aid for Challenging Research (Exploratory)
Building machine learning models and neural networks trained on structural information of drug targets to predict antimicrobial resistance
- Award number: 2597363
- Fiscal year: 2021
- Amount: $25.9 thousand
- Award type: Studentship
Biomechanical Framework to Integrate Structural MRI Information in White Matter
- Award number: 10043013
- Fiscal year: 2020
- Amount: $25.9 thousand
- Award type: