Ensemble Methods for Classification/Prediction With High-Dimensional Explanatory Variables

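Basic information

  • Grant number:
    RGPIN-2014-04962
  • Principal investigator:
  • Amount:
    $13,100
  • Host institution:
  • Host institution country:
    Canada
  • Project category:
    Discovery Grants Program - Individual
  • Fiscal year:
    2015
  • Funding country:
    Canada
  • Project period:
    2015-01-01 to 2016-12-31
  • Project status:
    Completed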

Project abstract

Advances in science and engineering have vastly increased the number of variables available to predict or classify a response outcome of interest. At the same time, the information in the data may be sparse. Novel methods based on ensembles of models are proposed for higher prediction accuracy. Methodology will be developed for two problems with these characteristics: prediction of complex computer codes and prediction / classification in the analysis of drug discovery data.

Deterministic computer models can have complex relationships with high-dimensional input (explanatory) variables. For instance, the Community Land Model of the carbon cycle and vegetation dynamics has hundreds of inputs for the ecosystem, climate, hydrology, etc. Experiments with about 100 variables are aimed at sensitivity analysis, i.e., finding the inputs that have the most impact on an output such as a measure of total vegetation. It is feasible to make thousands of computer model runs, yet work to date shows the input-output relationships are hard to model with useful accuracy. Most likely, there are complex interaction effects between the inputs, and identifying them is a challenge because of the high dimensionality.

In drug discovery, the input variables are "chemical descriptors" from computational chemistry that characterize drug-like molecules. Many sets are available, and each can have thousands of variables. The response variable or output is from a physical assay of activity against a biological target implicated in a disease. A statistical model relating biological activity to the chemical inputs can be used to predict the activity of molecules that have not been assayed yet, increasing the efficiency of the search for candidate drugs. Unfortunately, active molecules are rare, so there is a paucity of information in the response data to fit a model.

Gaussian Processes (GPs) are widely used to model the deterministic input-output relationship of a computer code. They have also been used in the analysis of drug discovery data. The proposed approach to high-dimensional input and limited data information is based on ensembles of GPs, either by building separate models and averaging them, or by ensembles of correlation functions (which are key to the GP approach). Ensembles have well-known general advantages in prediction accuracy and are established as among the best for the drug discovery problem, for example. They typically generate multiple prediction models by perturbing the data (bootstrapping) or dividing the data observations and then fitting a model to each data set created. The models are then averaged when making predictions. With high-dimensional input, however, sparse information in the response data means that most of the input variables are unused in a model when it is fit to data.

In contrast, the proposed approach is to build an ensemble of models over distinct subsets of input variables. A subset of inputs with interaction effects should be in the same model; variables that do not interact can be in separate models. It is easier to fill the input space in a data set densely a few variables at a time, increasing prediction accuracy. Furthermore, by attributing variables to different models, more inputs have a chance to contribute to prediction accuracy.

The challenges and goals of the research program are how to identify subsets of high-dimensional input variables that should be together in the same model, how to combine models for high overall prediction accuracy, and how to design efficient algorithms to overcome the computational demands of GP models. The overarching goal is to understand how a statistical model like a GP should be tuned to the complexities of relationships involving high-dimensional input.
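
The idea of building one GP per subset of input variables and averaging the models can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and not the project's actual method: the function names (`fit_gp`, `fit_subset_ensemble`, and so on) are hypothetical, the correlation function is a squared exponential with fixed rather than estimated hyperparameters, the variable subsets are taken as already known (identifying them is one of the open challenges named in the abstract), and the models are combined by a plain unweighted average.

```python
import numpy as np

def sq_exp_corr(A, B, length_scale):
    """Squared-exponential correlation between the rows of A and the rows of B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2.0 * length_scale ** 2))

def fit_gp(X, y, length_scale=0.5, nugget=1e-2):
    """Fit a constant-mean GP. Hyperparameters are fixed here (an assumption);
    in practice they would be estimated, e.g. by maximum likelihood."""
    mean = y.mean()
    R = sq_exp_corr(X, X, length_scale) + nugget * np.eye(len(X))
    weights = np.linalg.solve(R, y - mean)
    return X, mean, weights, length_scale

def predict_gp(model, X_new):
    """Predictive mean of a fitted GP at the rows of X_new."""
    X, mean, weights, length_scale = model
    return mean + sq_exp_corr(X_new, X, length_scale) @ weights

def fit_subset_ensemble(X, y, subsets):
    """Fit one GP per subset of input columns, each to the full response."""
    return [(cols, fit_gp(X[:, cols], y)) for cols in subsets]

def predict_subset_ensemble(ensemble, X_new):
    """Average the member predictions (plain unweighted average)."""
    preds = [predict_gp(model, X_new[:, cols]) for cols, model in ensemble]
    return np.mean(preds, axis=0)

# Synthetic data: inputs 0 and 1 interact, input 2 acts alone, inputs 3-5 are inert.
rng = np.random.default_rng(0)
X = rng.uniform(size=(80, 6))
y = np.sin(4 * X[:, 0] * X[:, 1]) + X[:, 2] ** 2

# Interacting inputs share a model; the subsets are assumed known for this sketch.
subsets = [[0, 1], [2], [3, 4, 5]]
ensemble = fit_subset_ensemble(X, y, subsets)

X_test = rng.uniform(size=(20, 6))
y_hat = predict_subset_ensemble(ensemble, X_test)
```

Each member model only has to fill a one- to three-dimensional input space, which is the density argument made in the abstract. An unweighted average is the simplest combination rule; how to combine models for high overall accuracy is listed in the abstract as a research goal, so this choice should not be read as the proposed answer.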

Project outcomes

Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)

Other publications by Welch, William

Corporate Volunteerism, the Experience of Self-Integrity, and Organizational Commitment: Evidence from the Field
  • DOI:
    10.1007/s11211-014-0204-8
  • Publication date:
    2014-03-01
  • Journal:
  • Impact factor:
    2.3
  • Authors:
    Brockner, Joel; Senior, Deanna; Welch, William
  • Corresponding author:
    Welch, William
Surgical Management of Idiopathic Thoracic Spinal Cord Herniation
  • DOI:
    10.1016/j.wneu.2019.05.219
  • Publication date:
    2019-09-01
  • Journal:
  • Impact factor:
    2
  • Authors:
    Neale, Natalie; Ramayya, Ashwin; Welch, William
  • Corresponding author:
    Welch, William

Other grants by Welch, William

Adaptive Design for Fast Machine/Statistical Learning
  • Grant number:
    RGPIN-2019-05019
  • Fiscal year:
    2022
  • Funding amount:
    $13,100
  • Project category:
    Discovery Grants Program - Individual
Adaptive Design for Fast Machine/Statistical Learning
  • Grant number:
    RGPIN-2019-05019
  • Fiscal year:
    2021
  • Funding amount:
    $13,100
  • Project category:
    Discovery Grants Program - Individual
Adaptive Design for Fast Machine/Statistical Learning
  • Grant number:
    RGPIN-2019-05019
  • Fiscal year:
    2020
  • Funding amount:
    $13,100
  • Project category:
    Discovery Grants Program - Individual
Adaptive Design for Fast Machine/Statistical Learning
  • Grant number:
    RGPIN-2019-05019
  • Fiscal year:
    2019
  • Funding amount:
    $13,100
  • Project category:
    Discovery Grants Program - Individual
Ensemble Methods for Classification/Prediction With High-Dimensional Explanatory Variables
  • Grant number:
    RGPIN-2014-04962
  • Fiscal year:
    2018
  • Funding amount:
    $13,100
  • Project category:
    Discovery Grants Program - Individual
Ensemble Methods for Classification/Prediction With High-Dimensional Explanatory Variables
  • Grant number:
    RGPIN-2014-04962
  • Fiscal year:
    2017
  • Funding amount:
    $13,100
  • Project category:
    Discovery Grants Program - Individual
Ensemble Methods for Classification/Prediction With High-Dimensional Explanatory Variables
  • Grant number:
    RGPIN-2014-04962
  • Fiscal year:
    2016
  • Funding amount:
    $13,100
  • Project category:
    Discovery Grants Program - Individual
Ensemble Methods for Classification/Prediction With High-Dimensional Explanatory Variables
  • Grant number:
    RGPIN-2014-04962
  • Fiscal year:
    2014
  • Funding amount:
    $13,100
  • Project category:
    Discovery Grants Program - Individual
Classification: methodology for variable selection and efficient tuning and comparison of models
  • Grant number:
    36462-2008
  • Fiscal year:
    2012
  • Funding amount:
    $13,100
  • Project category:
    Discovery Grants Program - Individual
Classification: methodology for variable selection and efficient tuning and comparison of models
  • Grant number:
    36462-2008
  • Fiscal year:
    2011
  • Funding amount:
    $13,100
  • Project category:
    Discovery Grants Program - Individual

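Similar grants from the National Natural Science Foundation of China (NSFC)

Computational Methods for Analyzing Toponome Data
  • Grant number:
    60601030
  • Approval year:
    2006
  • Funding amount:
    CNY 170,000
  • Project category:
    Young Scientists Fund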

Similar overseas grants

IMR: MM-1C: Enabling Continual Passive Estimation of Performance of Internet Transfers: Online Measurement and Classification Methods
  • Grant number:
    2319511
  • Fiscal year:
    2023
  • Funding amount:
    $13,100
  • Project category:
    Standard Grant
Classification of learners based on explicit and implicit shyness and an examination of appropriate learning environments and support methods
  • Grant number:
    23K02874
  • Fiscal year:
    2023
  • Funding amount:
    $13,100
  • Project category:
    Grant-in-Aid for Scientific Research (C)
Classification and generation methods for socially acceptable robot voice
  • Grant number:
    573710-2022
  • Fiscal year:
    2022
  • Funding amount:
    $13,100
  • Project category:
    University Undergraduate Student Research Awards
Collaborative Research: Development of Classification Theory and Methods for Objective Asymmetry, Sample Size Limitation, Labeling Ambiguity, and Feature Importance
  • Grant number:
    2113500
  • Fiscal year:
    2021
  • Funding amount:
    $13,100
  • Project category:
    Standard Grant
Classification of late-onset psychosis and verification of effective treatment methods
  • Grant number:
    21K15730
  • Fiscal year:
    2021
  • Funding amount:
    $13,100
  • Project category:
    Grant-in-Aid for Early-Career Scientists
Collaborative Research: Development of Classification Theory and Methods for Objective Asymmetry, Sample Size Limitation, Labeling Ambiguity, and Feature Importance
  • Grant number:
    2113754
  • Fiscal year:
    2021
  • Funding amount:
    $13,100
  • Project category:
    Standard Grant
Building evidence for adopting a new disease classification system in Canadian primary care settings: a mixed methods feasibility study
  • Grant number:
    451333
  • Fiscal year:
    2021
  • Funding amount:
    $13,100
  • Project category:
    Operating Grants
Machine Learning Methods to Re-annotate Histone Modifications with Locus-specific Functional Classification
  • Grant number:
    MR/T022620/1
  • Fiscal year:
    2020
  • Funding amount:
    $13,100
  • Project category:
    Fellowship
Classification methods taking account clarity of characteristics in biological sounds.
  • Grant number:
    20K12045
  • Fiscal year:
    2020
  • Funding amount:
    $13,100
  • Project category:
    Grant-in-Aid for Scientific Research (C)