Bayesian Variable Selection in Generalized Linear Models with Missing Variables

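Basic Information

Grant number: 8194802
Principal investigator: XIAOWEI YANG
Amount: $192,700
Institution: UNIVERSITY OF CALIFORNIA AT DAVIS
Institution country: United States
Project category:
Fiscal year: 2011
Funding country: United States
Project period: 2011-08-11 to 2014-04-30
Project status: Completed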

Source: https://reporter.nih.gov/project-details/8194802
Keywords:
Address Algorithms Archives Autistic Disorder Behavioral Benefits and Risks Biomedical Computing Biomedical Research Biomedical Technology Caring Childhood Clinical Clinical Trials Complex Computer software Data Data Analyses Data Set Development Dropout Drug Addiction Effectiveness Environmental Risk Factor Face Gene Expression Generic Drugs Genetic Guidelines Individual Libraries Linear Models Linear Regressions Markov Chains Measures Medical Research Medicine Methods Modeling Outcome Patients Performance Pharmacotherapy Phenotype Preventive Procedures Process Proteomics Resort Safety Scientist Simulate Solutions Structure Testing Time analytical tool base comparative effectiveness cytokine design effectiveness research flexibility health care delivery patient oriented rapid growth response smoking cessation software development therapeutic effectiveness

Project Abstract

DESCRIPTION (provided by applicant): The applicant seeks to address the problem of missing values. Missing values pose a major challenge for biomedical research; they may arise for subject-related reasons (e.g., nonresponse and dropout) or for technical reasons (e.g., censoring above or below a quantization level). Generalized linear models (GLMs) and generalized linear mixed models (GLMMs) are widely applied in biomedical data analysis, where a fundamental task is to identify a subset of independent variables (e.g., genetic, proteomic, behavioral, or environmental factors) that interpret or predict a dependent variable (e.g., therapeutic effectiveness and safety). Faced with an incomplete data set, practitioners may needlessly resort to case deletion, in which individuals are excluded from the analysis if they are missing any of the targeted variables. This method not only sacrifices useful information but can also produce biased estimates, because it is valid only under strong assumptions about the missingness mechanism (typically that the data are missing completely at random). A more satisfactory solution is multiple imputation, in which several imputations are created for the same set of missing values. Across multiply imputed data sets, however, traditional variable selection methods (based on significance tests or likelihood criteria) often select different predictors in different data sets, which raises the problem of how to combine the models for final inference. In this R01 proposal, we aim to develop alternative variable selection strategies for GLMs with missing values within a Bayesian framework. The first approach, "impute, then select" (ITS), performs multiple imputation first and then applies Bayesian variable selection to the multiply imputed data sets.
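The ITS strategy can be illustrated with a minimal sketch. The code below is a hypothetical, standard-library-only Python example, not the applicant's R library, and it covers only the normal linear model. Its assumptions: missing covariate entries are coded as `None`; each missing value is drawn by stochastic regression imputation from a normal regression of that covariate on the outcome (parameters are plugged in rather than drawn, a simplification of proper multiple imputation); Bayesian variable selection enumerates all 2^p models under a uniform model prior and Zellner's g-prior with g = n, using the closed-form Bayes factor of each model against the intercept-only model; and the m analyses are combined by averaging posterior inclusion probabilities. All function names are illustrative.

```python
import itertools
import math
import random


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def r_squared(rows, y, cols):
    """R^2 of the least-squares fit of y on an intercept plus the covariates in cols."""
    if not cols:
        return 0.0
    ybar = sum(y) / len(y)
    z = [[1.0] + [r[j] for j in cols] for r in rows]
    k = len(z[0])
    ztz = [[sum(zi[a] * zi[b] for zi in z) for b in range(k)] for a in range(k)]
    zty = [sum(zi[a] * yi for zi, yi in zip(z, y)) for a in range(k)]
    coef = solve(ztz, zty)
    sse = sum((yi - sum(c * v for c, v in zip(coef, zi))) ** 2 for zi, yi in zip(z, y))
    sst = sum((yi - ybar) ** 2 for yi in y)
    return 1.0 - sse / sst


def inclusion_probs(rows, y, g=None):
    """Posterior inclusion probabilities by enumerating all 2^p linear models.

    Assumes a uniform prior over models and Zellner's g-prior (default g = n), for which
    BF(M : null) = (1 + g)^((n - 1 - k) / 2) * (1 + g * (1 - R^2))^(-(n - 1) / 2).
    """
    n, p = len(y), len(rows[0])
    g = float(n) if g is None else g
    models, log_bf = [], []
    for k in range(p + 1):
        for cols in itertools.combinations(range(p), k):
            r2 = r_squared(rows, y, cols)
            models.append(set(cols))
            log_bf.append(0.5 * (n - 1 - k) * math.log1p(g)
                          - 0.5 * (n - 1) * math.log1p(g * (1.0 - r2)))
    top = max(log_bf)
    w = [math.exp(v - top) for v in log_bf]  # unnormalized posterior model probabilities
    total = sum(w)
    return [sum(wi for wi, m in zip(w, models) if j in m) / total for j in range(p)]


def impute_once(rows, y, rng):
    """One stochastic-regression imputation of every missing entry (None).

    Each incomplete covariate is regressed on y using its observed cases; missing values
    are drawn as prediction + N(0, residual variance). The regression parameters are
    plugged in rather than drawn, so this simplifies proper multiple imputation.
    """
    out = [r[:] for r in rows]
    for j in range(len(rows[0])):
        obs = [(yi, r[j]) for r, yi in zip(rows, y) if r[j] is not None]
        if len(obs) == len(rows):
            continue
        my = sum(a for a, _ in obs) / len(obs)
        mx = sum(b for _, b in obs) / len(obs)
        slope = (sum((a - my) * (b - mx) for a, b in obs)
                 / sum((a - my) ** 2 for a, _ in obs))
        sd = math.sqrt(sum((b - mx - slope * (a - my)) ** 2 for a, b in obs) / (len(obs) - 2))
        for r, yi in zip(out, y):
            if r[j] is None:
                r[j] = mx + slope * (yi - my) + rng.gauss(0.0, sd)
    return out


def impute_then_select(rows, y, m=5, seed=0):
    """ITS sketch: impute m times, select on each completed data set, average the PIPs."""
    rng = random.Random(seed)
    pips = [inclusion_probs(impute_once(rows, y, rng), y) for _ in range(m)]
    return [sum(col) / m for col in zip(*pips)]


if __name__ == "__main__":
    # Simulated toy data: y depends on x0 only; about 20% of x0 is missing completely at random.
    rng = random.Random(1)
    rows, y = [], []
    for _ in range(100):
        x = [rng.gauss(0.0, 1.0) for _ in range(3)]
        y.append(1.0 + 2.0 * x[0] + rng.gauss(0.0, 1.0))
        if rng.random() < 0.2:
            x[0] = None
        rows.append(x)
    print([round(p, 3) for p in impute_then_select(rows, y)])
```

Because every imputed data set requires 2^p least-squares fits, exhaustive enumeration only suits a handful of candidate predictors; with many variables, a stochastic search over models would replace the enumeration inside `inclusion_probs`.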
The second strategy, "simultaneously impute and select" (SIAS), conducts Bayesian variable selection and missing data imputation simultaneously within a single Markov chain Monte Carlo (MCMC) process. ITS and SIAS offer two generic frameworks within which various Bayesian variable selection algorithms and missing data imputation algorithms can be implemented. The strategies will be extended to handle complex data sets, such as those with multilevel design structures and/or a large number of variables. They will be developed, evaluated, and implemented in an R library for normal, binomial/multinomial, and Poisson regression models with mixed categorical and continuous explanatory variables. Simulated data sets and real data sets from studies of childhood autism and drug dependence will be used to assess the effectiveness and flexibility of the proposed strategies.

PUBLIC HEALTH RELEVANCE: Missing data are the norm when large data sets are assembled. The issue comes to the forefront when large data sets are used to develop personalized, individualized care. Imputation-based Bayesian variable selection strategies provide a powerful analytical tool for avoiding this loss of data and for improving predictions of risk and benefit. The availability of our new method and software package will greatly enhance the capacity and quality of medical research and healthcare delivery.
