
Methods for Epidemiology Studies


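Basic Information

Grant number:
8565443
Principal investigator:
Nilanjan Chatterjee
Amount:
$3.2327 million
Host institution:
DIVISION OF CANCER EPIDEMIOLOGY AND GENETICS
Host institution country:
United States
Project category:
Fiscal year:
Funding country:
United States
Start and end dates:
to
Project status:
Not concluded

Source:
https://reporter.nih.gov/project-details/8565443
Keywords: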
Accounting Agreement Architecture Biometry Breast Breast Cancer Detection Breast Cancer Education Bronchi Case-Control Studies Categories Cervical Cohort Studies Colon Complex Computer software Computers Confounding Factors (Epidemiology) Consumption Data Data Set Development Diagnostic tests Dimensions Disease Dose Environment Environmental Exposure Epidemiologic Methods Epidemiologic Studies Evaluation Future Gene Frequency General Population Genes Genetic Genomics Human Papillomavirus Incidence Individual Investigation Linear Models Linear Regressions Logistic Regressions Lung Malignant Neoplasms Measures Methodology Methods Mission Modeling Nature Odds Ratio Outcome Outcome Measure Pap smear Pattern Performance Pleura Population Predisposition Prevention Rectum Reporting Research Residual state Risk Sample Size Sampling Scanning Screening for cancer Screening procedure Specimen Statistical Methods Structure Surveys Techniques Test Result Testing Time Trachea Validation Variant Woman base cohort cost design diagnostic accuracy disease diagnosis disorder risk epidemiology study follow-up gene environment interaction genetic association genome wide association study genome-wide high risk improved member novel novel diagnostics prognostic repository response simulation

Project Abstract

Investigations have been conducted on using data from current genome-wide association studies to assess the genetic architecture of cancer and the likely yield of future genome-wide association studies. One project explored the distributions of allele frequencies and effect sizes, and their interrelationships, for common susceptibility SNPs using discoveries from existing genome-wide association studies. It used novel methods to correct for bias, because variants with larger effect sizes are currently over-represented owing to the greater statistical power to discover them. The analysis identified several intriguing patterns that may have implications for the design and analysis of future genetic association studies. A second project explored the potential utility of future discoveries from larger genome-wide association studies for building risk-prediction models that could be used to target high-risk groups for cancer screening. It was found that although many discoveries are expected from future genome-wide association studies, risk-prediction models based only on discovered SNPs are unlikely to identify a small portion of the population that would give rise to the large majority of future cases. Several projects involved development of statistical methods for exploring gene-gene and gene-environment interactions using data from genome-wide association studies. A new method was developed for modeling the interaction of an environmental exposure with multiple SNPs within a genomic region using a Bayesian latent variable modeling approach. Another method exploited an assumption of gene-environment independence in the underlying population to improve the power of the test for gene-environment interaction on the absolute risk of a disease from case-control studies. Another report used simulation studies to investigate the power of various alternative methods for conducting genome-wide interaction scans.
General statistical methods
Several studies have been conducted to evaluate efficient design and analysis strategies for epidemiologic studies that use complex sampling designs. One study focuses on the efficient use of specimen repositories for evaluating new diagnostic tests and for comparing new tests with existing tests. Typically, all pre-existing diagnostic tests will already have been conducted on all specimens. It was proposed that retesting only a judicious subsample of the specimens with the new diagnostic test could minimize study costs and specimen consumption, while estimates of agreement or diagnostic accuracy potentially retain adequate statistical efficiency. Another project explores efficient analysis methods for case-cohort designs, which select a random sample of a cohort to be used as controls together with the cases arising from follow-up of the cohort. Analyses of case-cohort studies with time-varying exposures that use Cox partial likelihood methods can be computationally intensive. A new computationally simple method has been developed using a piecewise-exponential approach in which Poisson regression model parameters are estimated from a pseudo-likelihood and the corresponding variances are derived by applying Taylor linearization methods that are used in survey research. Several studies have involved development of regression models in settings that involve a potentially large number of predictor variables. A Bayesian variable selection method has been developed for settings where the number of independent variables or predictors in a particular dataset is much larger than the available sample size. Whereas most existing methods allow some degree of correlation among predictors but do not consider these correlations in variable selection, the proposed method accounts for correlations among the predictors in variable selection.
The method can be applied to continuous, binary, ordinal, and count outcome data. Another method is proposed to combine several predictors (markers) that are measured repeatedly over time into a composite marker score without assuming a model, requiring only a mild condition on the predictor distribution. Assuming that the first and second moments of the predictors can be decomposed into a time component and a marker component via a Kronecker product structure that accommodates the longitudinal nature of the predictors, the method uses first-moment sufficient dimension reduction techniques to replace the original markers with linear transformations that contain sufficient information for the regression of the predictors on the outcome. These linear combinations can then be combined into a score that has better predictive performance than a score built under a general model that ignores the longitudinal structure of the data. Our methods can be applied to either continuous or categorical outcome measures. Several studies have developed methodologies related to models for predicting absolute risk of diseases and their applications. One study has developed two criteria to assess the usefulness of models that predict risk of disease incidence for screening and prevention, or the usefulness of prognostic models for management following disease diagnosis. The first criterion, the proportion of cases followed, PCF(q), is the proportion of individuals who will develop disease who are included in the proportion q of individuals in the population at highest risk. The second criterion is the proportion needed to follow-up, PNF(p), namely the proportion of the general population at highest risk that one needs to follow in order that a proportion p of those destined to become cases will be followed. New methods of inference are developed to compare the PCFs and PNFs of two risk models that are built on the same validation data.
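The two criteria defined above can be illustrated with a minimal empirical sketch. It assumes a validation sample with a predicted risk and an observed case indicator for each person, and it computes simple plug-in estimates; the function names `pcf` and `pnf`, the tie handling, and the toy data are assumptions made for illustration, not the inference methods developed in the project.

```python
import numpy as np

def pcf(risk, is_case, q):
    """Plug-in PCF(q): share of all cases that fall within the
    fraction q of the sample ranked highest by predicted risk."""
    risk = np.asarray(risk, dtype=float)
    is_case = np.asarray(is_case, dtype=bool)
    order = np.argsort(-risk, kind="stable")           # highest risk first
    n_top = int(np.ceil(q * len(risk) - 1e-9))         # size of the top-q group
    return is_case[order][:n_top].sum() / is_case.sum()

def pnf(risk, is_case, p):
    """Plug-in PNF(p): smallest fraction of the sample, taken from the
    top of the risk ranking, that contains at least a share p of the cases."""
    risk = np.asarray(risk, dtype=float)
    is_case = np.asarray(is_case, dtype=bool)
    order = np.argsort(-risk, kind="stable")
    cum_share = np.cumsum(is_case[order]) / is_case.sum()
    n_needed = int(np.argmax(cum_share >= p - 1e-12)) + 1  # first rank reaching p
    return n_needed / len(risk)

if __name__ == "__main__":
    # Hypothetical toy data: 10 people, 4 of whom become cases.
    risk = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
    case = [1, 1, 0, 1, 0, 0, 0, 1, 0, 0]
    print("PCF(0.2) =", pcf(risk, case, 0.2))
    print("PNF(0.5) =", pnf(risk, case, 0.5))
```

The two quantities are inverses of each other on the same ranking: PCF fixes the fraction of the population followed and reports the share of cases captured, while PNF fixes the share of cases and reports the fraction of the population that must be followed.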
A second project developed a linear-expit regression model (LEXPIT) to incorporate linear and nonlinear risk effects to estimate absolute risk from studies of a binary outcome. The LEXPIT is a generalization of both the binomial linear and logistic regression models. The coefficients of the LEXPIT linear terms estimate adjusted risk differences, while the exponentiated nonlinear terms estimate residual odds ratios. The LEXPIT could be particularly useful for epidemiological studies of risk association, where adjustment for multiple confounding variables is common. The method was applied to estimate the absolute five-year risk of cervical precancer or cancer associated with different Pap and human papillomavirus test results in 167,171 women undergoing screening at Kaiser Permanente Northern California. The LEXPIT model found an increased risk due to an abnormal Pap test in HPV-negative women that was not detected with logistic regression. An R package, blm, was developed to provide free and easy-to-use software for fitting the LEXPIT model.
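The interpretation of the LEXPIT coefficients can be seen in a small numerical sketch. It assumes the model takes the additive form risk = x'beta + expit(z'gamma), which matches the description above but is not spelled out in the abstract; the function names and all coefficient values are hypothetical, and this is not the interface of the blm package.

```python
import numpy as np

def expit(t):
    """Inverse logit."""
    return 1.0 / (1.0 + np.exp(-t))

def lexpit_risk(x, z, beta, gamma):
    """Absolute risk under the assumed form: linear part plus expit part.
    x: covariates with additive (risk-difference) effects
    z: covariates with logistic effects, including an intercept term"""
    return float(np.dot(x, beta) + expit(np.dot(z, gamma)))

def odds(prob):
    return prob / (1.0 - prob)

if __name__ == "__main__":
    beta = np.array([0.02])          # hypothetical: abnormal test adds 2 points of risk
    gamma = np.array([-4.0, 0.5])    # hypothetical: intercept and one confounder

    # Risk difference for the linear covariate is beta, whatever z is.
    for z in ([1.0, 0.0], [1.0, 1.0]):
        diff = lexpit_risk([1.0], z, beta, gamma) - lexpit_risk([0.0], z, beta, gamma)
        print("z =", z, "risk difference =", round(diff, 6))

    # After removing the linear part, the odds ratio for z is exp(gamma).
    resid1 = lexpit_risk([1.0], [1.0, 1.0], beta, gamma) - beta[0]
    resid0 = lexpit_risk([1.0], [1.0, 0.0], beta, gamma) - beta[0]
    print("residual odds ratio =", odds(resid1) / odds(resid0))
    print("exp(gamma[1])       =", np.exp(gamma[1]))
```

Setting beta to zero recovers a logistic model, and dropping the covariates from z (keeping only the intercept) leaves a binomial linear model, which is the sense in which the LEXPIT generalizes both.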
