A Methodology for Reliable Risk Assessment with Error-prone Electronic Medical Records Using Optimal Design of Experiments Concepts

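Basic information

Award number:
1436574
Principal investigator:
Daniel Apley
Amount:
$400,000
Host institution:
Northwestern University at Chicago
Host institution country:
United States
Award type:
Standard Grant
Fiscal year:
2014
Funding country:
United States
Project period:
2014-09-01 to 2018-08-31
Project status:
Completed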

Source:
https://www.nsf.gov/awardsearch/showAward?AWD_ID=1436574&HistoricalAwards=false
Keywords:
Methodology Reliable Risk Assessment Error

Project abstract

Enormous healthcare resources are devoted to compiling electronic medical record (EMR) databases that are increasingly integrated and rich in patient population. These databases offer potential for identifying disease risk factors via statistical analyses that predict a patient's disease risk as a function of various factors (e.g., clinical and demographic) for that patient. Unfortunately, the disease event data may have high miscoding error rates, because clerical personnel with limited training are employed to enter the event codes. For example, in one EMR database of patients with cardiac workup, a review of a random sample of cases recorded as sudden cardiac arrest events found an error rate of 75 percent. To take such errors into account and avoid developing unreliable risk assessment models, it is imperative that a doctor perform chart reviews to validate a sample of cases and determine whether the recorded events were true events. However, the number of chart reviews is limited by the high cost of doctors' time. The objective of this research is to develop a methodology for judiciously and efficiently selecting validation cases for maximum information content, which will allow reliable disease risk assessment even with highly error-prone EMR data. The anticipated benefits to the health and well-being of society are substantial, as this research will allow the enormous untapped potential of large EMR databases to be more fully utilized for discovering new disease risk factors. It is also anticipated that this research can be extended to other big-data application domains for extracting reliable information from large quantities of data that are of questionable quality.

Large electronic medical record (EMR) databases offer potential for developing clinical hypotheses and identifying disease risk associations by fitting statistical models that predict the likelihood that a patient develops a particular condition as a function of various predictor variables (e.g., clinical, phenotypical, and demographic data) for that patient. Although the predictor variable data are often recorded reliably, the event data may have high error rates due to ICD-9 disease miscoding. To avoid developing unreliable risk assessment models, previous research used random validation sampling to estimate error probabilities for correcting biases in logistic regression models fit to the entire data, an approach that is both inefficient and unreliable when error rates are high. In contrast, this research will develop a validation sampling and reliable risk assessment (VSRRA) methodology for judiciously designing a validation sample. The intellectual underpinning is the observed analogy between VSRRA and traditional design of experiments (DOE), whereby validating the response for one error-prone case in VSRRA corresponds to conducting one experimental run in DOE. In light of this analogy, this research will develop (i) suitable VSRRA design criteria based on the Fisher information matrix for the model parameters and Bayesian counterparts such as posterior and preposterior parameter covariance matrices, applicable to a broad class of generalized linear models commonly used in medical risk studies; (ii) heuristic and more exact hybrid algorithms for selecting the validation sample to optimize the design criteria; (iii) multistage, sequential versions of the VSRRA sampling strategies that refine the designs based on information learned along the way, as new cases are validated; and (iv) methods that determine whether and how the full set of unvalidated data can be reliably included, along with the validated data, in the final model fitting. A fundamental tenet of data analysis is that carefully designed experimental studies produce far more reliable statistical conclusions than observational studies. Likewise, it is anticipated that the DOE-based VSRRA methodology will allow far more reliable disease risk assessment and hypothesis generation.
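To make the DOE analogy concrete, the sketch below shows one textbook way a Fisher-information design criterion could drive the choice of validation cases. It is an illustration only, not the VSRRA methodology itself: it uses the standard D-optimality criterion (log-determinant of the information matrix) for an ordinary logistic regression, with a simple greedy search. It assumes a pilot estimate `beta` of the model coefficients is available, and it ignores the miscoding error model, the Bayesian criteria, and the sequential refinement described in the abstract. The names `greedy_d_optimal`, `log_det_information`, and the `ridge` parameter are hypothetical.

```python
import numpy as np


def case_weights(X, beta):
    """Per-case logistic-regression information weights p_i * (1 - p_i)."""
    p = 1.0 / (1.0 + np.exp(-(X @ beta)))
    return p * (1.0 - p)


def log_det_information(X, beta, idx, ridge=1e-3):
    """Log-determinant of ridge*I + sum over selected cases of w_i * x_i x_i^T."""
    X = np.asarray(X, dtype=float)
    Xs = X[idx]
    w = case_weights(Xs, beta)
    M = ridge * np.eye(X.shape[1]) + (Xs * w[:, None]).T @ Xs
    _, logdet = np.linalg.slogdet(M)
    return logdet


def greedy_d_optimal(X, beta, n_select, ridge=1e-3):
    """Greedily choose cases whose validation most increases the log-determinant.

    Assumption (illustrative): validating case i contributes w_i * x_i x_i^T
    to the information matrix, as in an error-free logistic regression.
    """
    X = np.asarray(X, dtype=float)
    n, d = X.shape
    w = case_weights(X, beta)
    M_inv = np.eye(d) / ridge  # inverse of the initial matrix ridge * I
    available = np.ones(n, dtype=bool)
    chosen = []
    for _ in range(n_select):
        # Adding case i raises the log-determinant by log(1 + gain_i),
        # where gain_i = w_i * x_i^T M^{-1} x_i (matrix determinant lemma).
        gains = w * np.einsum("ij,jk,ik->i", X, M_inv, X)
        gains[~available] = -np.inf
        i = int(np.argmax(gains))
        chosen.append(i)
        available[i] = False
        # Sherman-Morrison rank-one update of the inverse.
        v = M_inv @ X[i]
        M_inv -= (w[i] * np.outer(v, v)) / (1.0 + gains[i])
    return chosen


# Tiny made-up example: four candidate cases, two predictors, pilot beta = 0.
X_demo = np.array([[1.0, 0.0], [1.0, 0.1], [0.0, 1.0], [2.0, 0.0]])
beta_demo = np.zeros(2)
selected = greedy_d_optimal(X_demo, beta_demo, n_select=2)
print(selected)  # [3, 2]
```

In this toy example the greedy search picks the two cases whose predictor vectors point in different directions, rather than two nearly collinear ones, which is the usual behavior of a D-optimal design. A budget of chart reviews would correspond to `n_select`.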
