Bayesian Variable Selection in Generalized Linear Models with Missing Variables
Basic Information
- Grant number: 8194802
- Amount: $192,700
- Host institution country: United States
- Fiscal year: 2011
- Funding country: United States
- Project period: 2011-08-11 to 2014-04-30
- Project status: Completed
- Keywords: Address, Algorithms, Archives, Autistic Disorder, Behavioral, Benefits and Risks, Biomedical Computing, Biomedical Research, Biomedical Technology, Caring, Childhood, Clinical, Clinical Trials, Complex, Computer software, Data, Data Analyses, Data Set, Development, Dropout, Drug Addiction, Effectiveness, Environmental Risk Factor, Face, Gene Expression, Generic Drugs, Genetic, Guidelines, Individual, Libraries, Linear Models, Linear Regressions, Markov Chains, Measures, Medical Research, Medicine, Methods, Modeling, Outcome, Patients, Performance, Pharmacotherapy, Phenotype, Preventive, Procedures, Process, Proteomics, Resort, Safety, Scientist, Simulate, Solutions, Structure, Testing, Time, analytical tool, base, comparative effectiveness, cytokine, design, effectiveness research, flexibility, health care delivery, patient oriented, rapid growth, response, smoking cessation, software development, therapeutic effectiveness
Project Abstract
DESCRIPTION (provided by applicant): The applicant seeks to address the problem of missing values, a major challenge for biomedical research. Missing values may arise for subjective reasons (e.g., nonresponse and dropout) or technical reasons (e.g., censoring above or below a quantization level). Generalized linear models (GLMs) and generalized linear mixed models (GLMMs) are widely applied in biomedical data analysis, where a fundamental task is to identify a subset of independent variables (e.g., genetic, proteomic, behavioral, or environmental factors) that interpret or predict a dependent variable (e.g., therapeutic effectiveness and safety). Given an incomplete data set, practitioners may needlessly resort to case deletion, in which individuals are excluded from the analysis if they are missing any of the targeted variables. This approach not only sacrifices useful information but can also produce biased estimates, because it relies on strong assumptions about the missingness mechanism. A more satisfactory solution is multiple imputation, in which several plausible values are generated for each missing value, yielding several completed data sets. Across multiply imputed data sets, however, traditional variable selection methods (based on significance tests or likelihood criteria) often select different predictors in different data sets, which raises the problem of how to combine the resulting models for final inference. In this R01 proposal, we aim to develop alternative variable selection strategies for GLMs with missing values within a Bayesian framework. The first, called "impute, then select" (ITS), performs multiple imputation first and then applies Bayesian variable selection to the multiply imputed data sets.
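The ITS idea can be illustrated with a minimal, self-contained sketch in plain Python. It rests on several assumptions that do not come from the proposal: a normal linear model with three continuous predictors, missing values confined to one predictor, stochastic regression imputation (a simplification of proper multiple imputation, which would also redraw the imputation-model parameters), and posterior model probabilities approximated by exp(-BIC/2) under a uniform prior over all 2^p subsets. Inclusion probabilities are computed in each completed data set and then averaged, which is one simple, assumed way to combine results across imputations. All function names (`impute_once`, `inclusion_probs`, `impute_then_select`, `simulate`) are hypothetical.

```python
import itertools
import math
import random


def ols_rss(X, y):
    """Least-squares coefficients and residual sum of squares via the normal equations."""
    k = len(X[0])
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    b = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    for c in range(k):  # Gaussian elimination with partial pivoting
        p = max(range(c, k), key=lambda row: abs(A[row][c]))
        A[c], A[p], b[c], b[p] = A[p], A[c], b[p], b[c]
        for row in range(c + 1, k):
            f = A[row][c] / A[c][c]
            for j in range(c, k):
                A[row][j] -= f * A[c][j]
            b[row] -= f * b[c]
    beta = [0.0] * k
    for c in reversed(range(k)):  # back substitution
        beta[c] = (b[c] - sum(A[c][j] * beta[j] for j in range(c + 1, k))) / A[c][c]
    rss = sum((yi - sum(bj * v for bj, v in zip(beta, r))) ** 2 for r, yi in zip(X, y))
    return beta, rss


def inclusion_probs(rows, y):
    """Approximate posterior inclusion probabilities: enumerate all 2^p subsets and
    weight each by exp(-BIC/2), a BIC approximation under a uniform model prior."""
    n, p = len(rows), len(rows[0])
    logw = {}
    for g in itertools.product([0, 1], repeat=p):
        X = [[1.0] + [r[j] for j in range(p) if g[j]] for r in rows]
        _, rss = ols_rss(X, y)
        logw[g] = -0.5 * (n * math.log(rss / n) + len(X[0]) * math.log(n))
    top = max(logw.values())
    w = {g: math.exp(v - top) for g, v in logw.items()}
    z = sum(w.values())
    return [sum(wg for g, wg in w.items() if g[j]) / z for j in range(p)]


def impute_once(rows, y, col, rng):
    """Stochastic regression imputation of column `col` from the other predictors and y.
    Simplification: the imputation-model coefficients are not redrawn per imputation."""
    def design(i):
        return [1.0] + [v for j, v in enumerate(rows[i]) if j != col] + [y[i]]
    obs = [i for i, r in enumerate(rows) if r[col] is not None]
    beta, rss = ols_rss([design(i) for i in obs], [rows[i][col] for i in obs])
    sd = math.sqrt(rss / (len(obs) - len(beta)))
    completed = []
    for i, r in enumerate(rows):
        r = list(r)
        if r[col] is None:
            r[col] = sum(b * v for b, v in zip(beta, design(i))) + rng.gauss(0.0, sd)
        completed.append(r)
    return completed


def impute_then_select(rows, y, col, m=5, seed=0):
    """ITS sketch: build m completed data sets, then average inclusion probabilities."""
    rng = random.Random(seed)
    per_imputation = [inclusion_probs(impute_once(rows, y, col, rng), y) for _ in range(m)]
    return [sum(pips[j] for pips in per_imputation) / m for j in range(len(rows[0]))]


def simulate(n=200, seed=1):
    """Toy data: y depends on x1 and x2 only; x2 is missing at random given x1."""
    rng = random.Random(seed)
    rows, y = [], []
    for _ in range(n):
        x1, x2, x3 = rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)
        y.append(1.0 + 2.0 * x1 + 1.5 * x2 + rng.gauss(0, 1))
        p_miss = 0.5 if x1 > 0 else 0.1  # missingness depends on the observed x1 (MAR)
        rows.append([x1, None if rng.random() < p_miss else x2, x3])
    return rows, y


if __name__ == "__main__":
    rows, y = simulate()
    print([round(v, 3) for v in impute_then_select(rows, y, col=1)])
```

Running the script prints one averaged inclusion probability per predictor (x1, x2, x3); in this toy setup x1 and x2 should come out near 1 and the irrelevant x3 well below them. Because each imputation contributes a probability rather than a hard in/out decision, averaging avoids having to reconcile different selected predictor sets across imputations. Enumerating all subsets is only feasible for small p; larger problems would need a stochastic model search.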
The second strategy, "simultaneously impute and select" (SIAS), conducts Bayesian variable selection and missing-data imputation simultaneously within a single Markov chain Monte Carlo (MCMC) process. ITS and SIAS offer two generic frameworks within which various Bayesian variable selection and missing-data imputation algorithms can be implemented. The strategies will be extended to handle complex data sets, such as those with multilevel design structures and/or a large number of variables. They will be developed, evaluated, and implemented in an R library for normal, binomial/multinomial, and Poisson regression models with mixed categorical and continuous explanatory variables. Simulated data sets and real data sets from studies of childhood autism and drug dependence will be used to evaluate the effectiveness and flexibility of the proposed strategies.
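A correspondingly simplified sketch of the SIAS idea runs one chain that alternates between (1) drawing a model given the current completed data and (2) redrawing each missing value from its Gaussian full conditional given that model. It reuses toy assumptions that are not from the proposal (normal linear model, one incomplete predictor, exp(-BIC/2) model weights) and additionally plugs in least-squares estimates for the regression coefficients and variances instead of sampling them, so it is an approximation in the spirit of data augmentation, not a fully Bayesian MCMC sampler. The covariate model for the incomplete predictor (a linear regression on the other predictors, fitted to the observed rows) is also an assumption, and all function names are hypothetical.

```python
import itertools
import math
import random


def ols_rss(X, y):
    """Least-squares coefficients and residual sum of squares via the normal equations."""
    k = len(X[0])
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    b = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    for c in range(k):  # Gaussian elimination with partial pivoting
        p = max(range(c, k), key=lambda row: abs(A[row][c]))
        A[c], A[p], b[c], b[p] = A[p], A[c], b[p], b[c]
        for row in range(c + 1, k):
            f = A[row][c] / A[c][c]
            for j in range(c, k):
                A[row][j] -= f * A[c][j]
            b[row] -= f * b[c]
    beta = [0.0] * k
    for c in reversed(range(k)):  # back substitution
        beta[c] = (b[c] - sum(A[c][j] * beta[j] for j in range(c + 1, k))) / A[c][c]
    rss = sum((yi - sum(bj * v for bj, v in zip(beta, r))) ** 2 for r, yi in zip(X, y))
    return beta, rss


def sample_model(rows, y, rng):
    """Draw a subset g with probability proportional to exp(-BIC/2) (uniform model prior)."""
    n, p = len(rows), len(rows[0])
    models = list(itertools.product([0, 1], repeat=p))
    logw = []
    for g in models:
        X = [[1.0] + [r[j] for j in range(p) if g[j]] for r in rows]
        _, rss = ols_rss(X, y)
        logw.append(-0.5 * (n * math.log(rss / n) + len(X[0]) * math.log(n)))
    top = max(logw)
    return rng.choices(models, weights=[math.exp(v - top) for v in logw])[0]


def simultaneously_impute_and_select(rows, y, col, iters=200, burn=50, seed=0):
    """Hypothetical SIAS sketch: one chain alternating model draws and missing-value draws."""
    rng = random.Random(seed)
    n, p = len(rows), len(rows[0])
    others = [j for j in range(p) if j != col]
    obs = [i for i in range(n) if rows[i][col] is not None]
    miss = [i for i in range(n) if rows[i][col] is None]
    # Assumed covariate model x_col ~ N(alpha . [1, x_others], tau2), fitted to observed rows.
    alpha, rss_c = ols_rss([[1.0] + [rows[i][j] for j in others] for i in obs],
                           [rows[i][col] for i in obs])
    tau2 = rss_c / (len(obs) - len(alpha))
    mean_obs = sum(rows[i][col] for i in obs) / len(obs)
    data = [[mean_obs if v is None else v for v in r] for r in rows]  # crude starting values
    counts = [0] * p
    for t in range(iters):
        g = sample_model(data, y, rng)  # step 1: draw a model given the completed data
        keep = [j for j in range(p) if g[j]]
        beta, rss = ols_rss([[1.0] + [r[j] for j in keep] for r in data], y)  # plug-in fit
        sigma2 = rss / (n - len(beta))
        b_col = beta[1 + keep.index(col)] if col in keep else 0.0
        for i in miss:  # step 2: product of covariate-model and outcome-model Gaussians
            r = data[i]
            mu_c = alpha[0] + sum(a * r[j] for a, j in zip(alpha[1:], others))
            rest = beta[0] + sum(bj * r[j] for bj, j in zip(beta[1:], keep) if j != col)
            prec = 1.0 / tau2 + b_col ** 2 / sigma2
            mean = (mu_c / tau2 + b_col * (y[i] - rest) / sigma2) / prec
            r[col] = rng.gauss(mean, math.sqrt(1.0 / prec))
        if t >= burn:
            for j in range(p):
                counts[j] += g[j]
    return [c / (iters - burn) for c in counts]


def simulate(n=200, seed=1):
    """Toy data: y depends on x1 and x2 only; x2 is missing at random given x1."""
    rng = random.Random(seed)
    rows, y = [], []
    for _ in range(n):
        x1, x2, x3 = rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)
        y.append(1.0 + 2.0 * x1 + 1.5 * x2 + rng.gauss(0, 1))
        p_miss = 0.5 if x1 > 0 else 0.1  # missingness depends on the observed x1 (MAR)
        rows.append([x1, None if rng.random() < p_miss else x2, x3])
    return rows, y


if __name__ == "__main__":
    rows, y = simulate()
    print(simultaneously_impute_and_select(rows, y, col=1))
```

In this sketch the imputation step conditions on the currently selected model: when the incomplete predictor is dropped (`b_col` is 0), the outcome no longer informs its imputed values. That feedback between selection and imputation inside one chain is the feature that separates this style of approach from running imputation and selection as two separate stages.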
PUBLIC HEALTH RELEVANCE: Missing data are the norm when developing large data sets, and the issue comes to the forefront when such data sets are used to develop personalized and individualized care. Imputation-based Bayesian variable selection strategies provide a powerful analytical tool for avoiding the loss of information caused by discarding incomplete records and for providing better predictions of risks and benefits. The availability of our new method and software package will greatly enhance the capacity and quality of medical research and health care delivery.
Project Outputs
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other grants by XIAOWEI YANG
Bayesian Variable Selection in Generalized Linear Models with Missing Variables
- Grant number: 8471550
- Fiscal year: 2011
- Amount: $192,700
Bayesian Variable Selection in Generalized Linear Models with Missing Variables
- Grant number: 8317303
- Fiscal year: 2011
- Amount: $192,700
Bayesian Variable Selection in Generalized Linear Models with Missing Variables
- Grant number: 8543193
- Fiscal year: 2011
- Amount: $192,700
iPhone-based Real-time Data Solution for Drug Abuse and Other Medical Research
- Grant number: 7672825
- Fiscal year: 2009
- Amount: $192,700
Transition Model for Incomplete Longitudinal Binary Data
- Grant number: 6676189
- Fiscal year: 2003
- Amount: $192,700
DEVELOPMENT OF AN AUTOMATED NEURAL SPIKE DISCRIMINATOR
- Grant number: 3504570
- Fiscal year: 1991
- Amount: $192,700
Similar Overseas Grants
DMS-EPSRC: Asymptotic Analysis of Online Training Algorithms in Machine Learning: Recurrent, Graphical, and Deep Neural Networks
- Grant number: EP/Y029089/1
- Fiscal year: 2024
- Amount: $192,700
- Project category: Research Grant
CAREER: Blessing of Nonconvexity in Machine Learning - Landscape Analysis and Efficient Algorithms
- Grant number: 2337776
- Fiscal year: 2024
- Amount: $192,700
- Project category: Continuing Grant
CAREER: From Dynamic Algorithms to Fast Optimization and Back
- Grant number: 2338816
- Fiscal year: 2024
- Amount: $192,700
- Project category: Continuing Grant
CAREER: Structured Minimax Optimization: Theory, Algorithms, and Applications in Robust Learning
- Grant number: 2338846
- Fiscal year: 2024
- Amount: $192,700
- Project category: Continuing Grant
CRII: SaTC: Reliable Hardware Architectures Against Side-Channel Attacks for Post-Quantum Cryptographic Algorithms
- Grant number: 2348261
- Fiscal year: 2024
- Amount: $192,700
- Project category: Standard Grant
CRII: AF: The Impact of Knowledge on the Performance of Distributed Algorithms
- Grant number: 2348346
- Fiscal year: 2024
- Amount: $192,700
- Project category: Standard Grant
CRII: CSR: From Bloom Filters to Noise Reduction Streaming Algorithms
- Grant number: 2348457
- Fiscal year: 2024
- Amount: $192,700
- Project category: Standard Grant
EAGER: Search-Accelerated Markov Chain Monte Carlo Algorithms for Bayesian Neural Networks and Trillion-Dimensional Problems
- Grant number: 2404989
- Fiscal year: 2024
- Amount: $192,700
- Project category: Standard Grant
CAREER: Efficient Algorithms for Modern Computer Architecture
- Grant number: 2339310
- Fiscal year: 2024
- Amount: $192,700
- Project category: Continuing Grant
CAREER: Improving Real-world Performance of AI Biosignal Algorithms
- Grant number: 2339669
- Fiscal year: 2024
- Amount: $192,700
- Project category: Continuing Grant