A Methodology for Reliable Risk Assessment with Error-prone Electronic Medical Records Using Optimal Design of Experiments Concepts
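Basic information
- Award number: 1436574
- Principal investigator:
- Amount: $400,000
- Host institution:
- Host institution country: United States
- Project category: Standard Grant
- Fiscal year: 2014
- Funding country: United States
- Duration: 2014-09-01 to 2018-08-31
- Project status: Completed
- Source:
- Keywords:
Project abstract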
Enormous healthcare resources are devoted to compiling electronic medical record (EMR) databases that are increasingly integrated and rich in patient population. These databases offer potential for identifying disease risk factors via statistical analyses that predict a patient's disease risk as a function of various factors (e.g., clinical and demographic) for that patient. Unfortunately, the disease event data may have high miscoding error rates, because clerical personnel with limited training are employed to enter the codes. For example, in one EMR database of patients with cardiac workup, a review of a random sample of cases recorded as sudden cardiac arrest events found an error rate of 75 percent. In order to take such errors into account and avoid developing unreliable risk assessment models, it is imperative that a doctor perform chart reviews to validate a sample of cases and determine whether the events were true events. However, the number of chart reviews is limited due to the high cost of doctors' time. The objective of this research is to develop a methodology for judiciously and efficiently selecting validation cases for maximum information content, which will allow reliable disease risk assessment even with highly error-prone EMR data. The anticipated benefits to the health and well-being of society are substantial, as this research will allow the enormous untapped potential of large EMR databases to be more fully utilized for discovering new disease risk factors.
It is also anticipated that this research can be extended to other big-data application domains for extracting reliable information from large quantities of data that are of questionable quality.
Large electronic medical record (EMR) databases offer potential for developing clinical hypotheses and identifying disease risk associations by fitting statistical models that predict the likelihood that a patient develops a particular condition as a function of various predictor variables (e.g., clinical, phenotypical, and demographic data) for that patient. Although the predictor variable data are often recorded reliably, the event data may have high error rates due to ICD-9 disease miscoding. To avoid developing unreliable risk assessment models, previous research used random validation sampling to estimate error probabilities for correcting biases in logistic regression models fit to the entire data, which is both inefficient and unreliable with high error rates. In contrast, this research will develop a validation sampling and reliable risk assessment (VSRRA) methodology for judiciously designing a validation sample. The intellectual underpinning is the observed analogy between VSRRA and traditional design of experiments (DOE), whereby validating the response for one error-prone case in VSRRA corresponds to conducting one experimental run in DOE.
In light of this analogy, this research will develop (i) suitable VSRRA design criteria based on the Fisher information matrix for the model parameters and Bayesian counterparts such as posterior and preposterior parameter covariance matrices, applicable to a broad class of generalized linear models commonly used in medical risk studies; (ii) heuristic and more exact hybrid algorithms for selecting the validation sample to optimize the design criteria; (iii) multistage, sequential versions of the VSRRA sampling strategies that refine the designs based on information that is learned along the way, as new cases are validated; and (iv) methods that determine whether and how the full set of unvalidated data can be reliably included, along with the validated data, in the final model fitting. A fundamental tenet of data analysis is that carefully designed experimental studies produce far more reliable statistical conclusions than observational studies. Likewise, it is anticipated that the DOE-based VSRRA methodology will allow far more reliable disease risk assessment and hypotheses generation.
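The DOE analogy described above can be illustrated with a minimal sketch. This is not the project's VSRRA methodology, which the abstract does not specify in detail. It is a generic greedy D-optimal heuristic that assumes a logistic regression risk model, a preliminary parameter guess, and a Fisher information matrix of the form X^T W X with W = diag(p(1 - p)), and it ignores the miscoding mechanism entirely. All names (`logistic_fisher_info`, `greedy_d_optimal`, `beta_guess`, `ridge`) are hypothetical and chosen for illustration.

```python
import numpy as np


def logistic_fisher_info(X, beta):
    """Fisher information X^T W X of a logistic model, with W = diag(p * (1 - p))."""
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    w = p * (1.0 - p)
    return (X * w[:, None]).T @ X


def greedy_d_optimal(X, beta_guess, n_select, ridge=1e-6):
    """Greedily choose cases to validate so that det(Fisher information) grows fastest.

    Assumption: beta_guess is a rough preliminary estimate (e.g., from the
    unvalidated data); ridge keeps the information matrix invertible early on.
    """
    n, d = X.shape
    p = 1.0 / (1.0 + np.exp(-X @ beta_guess))
    w = p * (1.0 - p)                      # per-case information weight
    info = ridge * np.eye(d)
    available = np.ones(n, dtype=bool)
    chosen = []
    for _ in range(n_select):
        inv = np.linalg.inv(info)
        # Matrix determinant lemma:
        # det(M + w x x^T) = det(M) * (1 + w * x^T M^{-1} x)
        gains = w * np.einsum("ij,jk,ik->i", X, inv, X)
        gains = np.where(available, gains, -np.inf)
        best = int(np.argmax(gains))
        chosen.append(best)
        available[best] = False
        info = info + w[best] * np.outer(X[best], X[best])
    return chosen, info


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic predictors: intercept plus two covariates (illustrative only).
    X_demo = np.column_stack([np.ones(200), rng.normal(size=(200, 2))])
    picked, _ = greedy_d_optimal(X_demo, np.array([-1.0, 0.5, -0.5]), 10)
    print("cases selected for chart review:", picked)
```

Each selected row plays the role of one experimental run: the case whose validation adds the most to the determinant of the information matrix is reviewed next. A sequential version in the spirit of item (iii) would refit `beta_guess` after each batch of validated cases and repeat the selection.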
Project outcomes
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other grants by Daniel Apley
Collaborative Research: Model-Based Multidisciplinary Dynamic Decisions in Design
- Award number: 1537641
- Fiscal year: 2015
- Funding amount: $400,000
- Project category: Standard Grant
Collaborative Research: Leveraging Noncontact Dimensional Metrology to Understand Complex Part-to-Part Variation
- Award number: 1265709
- Fiscal year: 2013
- Funding amount: $400,000
- Project category: Standard Grant
Enhancing Identifiability of Computer Simulation Models via Design for Calibration
- Award number: 1233403
- Fiscal year: 2012
- Funding amount: $400,000
- Project category: Standard Grant
Collaborative Research: Blind Discovery of Variation Sources for Visualization by Multidisciplinary Teams
- Award number: 0826081
- Fiscal year: 2008
- Funding amount: $400,000
- Project category: Standard Grant
A Bayesian Treatment of Uncertainty in Simulation-Based Methods for Enhancing Process and Product Robustness
- Award number: 0758557
- Fiscal year: 2008
- Funding amount: $400,000
- Project category: Standard Grant
CAREER: A Methodology to Systematically Characterize and Diagnose Manufacturing Variation with In-Process Measurement Data
- Award number: 0354824
- Fiscal year: 2003
- Funding amount: $400,000
- Project category: Continuing Grant
CAREER: A Methodology to Systematically Characterize and Diagnose Manufacturing Variation with In-Process Measurement Data
- Award number: 0093580
- Fiscal year: 2001
- Funding amount: $400,000
- Project category: Continuing Grant
Similar overseas grants
CRII: RI: Deep neural network pruning for fast and reliable visual detection in self-driving vehicles
- Award number: 2412285
- Fiscal year: 2024
- Funding amount: $400,000
- Project category: Standard Grant
CRII: SaTC: Reliable Hardware Architectures Against Side-Channel Attacks for Post-Quantum Cryptographic Algorithms
- Award number: 2348261
- Fiscal year: 2024
- Funding amount: $400,000
- Project category: Standard Grant
Enabling Reliable Testing Of SMLM Datasets
- Award number: BB/X01858X/1
- Fiscal year: 2024
- Funding amount: $400,000
- Project category: Research Grant
RITA: Reliable and Efficient Task Management in Edge Computing for AIoT Systems
- Award number: EP/Y015886/1
- Fiscal year: 2024
- Funding amount: $400,000
- Project category: Fellowship
A Novel Contour-based Machine Learning Tool for Reliable Brain Tumour Resection (ContourBrain)
- Award number: EP/Y021614/1
- Fiscal year: 2024
- Funding amount: $400,000
- Project category: Research Grant
CAREER: Graded and Reliable Aerosol Deposition for Electronics (GRADE): Understanding Multi-Material Aerosol Jet Printing with In-Line Mixing
- Award number: 2336356
- Fiscal year: 2024
- Funding amount: $400,000
- Project category: Standard Grant
STTR Phase I: A Reliable and Efficient New Method for Satellite Attitude Control
- Award number: 2310323
- Fiscal year: 2024
- Funding amount: $400,000
- Project category: Standard Grant
Towards an Explainable, Efficient, and Reliable Federated Learning Framework: A Solution for Data Heterogeneity
- Award number: 24K20848
- Fiscal year: 2024
- Funding amount: $400,000
- Project category: Grant-in-Aid for Early-Career Scientists
SPARQ(s) - Scalable, Precise, And Reliable positioning of color centers for Quantum computing and simulation
- Award number: 10078083
- Fiscal year: 2024
- Funding amount: $400,000
- Project category: Collaborative R&D