
Privacy-preserving methods and tools for handling missing data in distributed health data networks


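Basic Information

Grant number:
9364071
Principal investigator:
Qi Long
Amount:
$598,500
Host institution:
UNIVERSITY OF PENNSYLVANIA
Host institution country:
United States
Project category:
Fiscal year:
2017
Funding country:
United States
Project period:
2017-09-08 to 2021-06-30
Project status:
Completed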

Project Summary

Distributed health data networks (DHDNs) that leverage electronic health records (EHRs) (e.g., eMerge, pSCANNER, PEDSnet) have drawn substantial interest in recent years, as they a) eliminate the need to create, maintain, and secure access to central data repositories, b) minimize the need to disclose protected health information outside the data-owning entity, and c) mitigate many security, proprietary, legal, and privacy concerns. Missing data are ubiquitous and present analytical challenges in DHDNs. However, very limited research has been conducted to address missing data in such settings. When applied to a distributed environment, the current state-of-the-art approaches for handling missing data require pooling raw data into a central repository before analysis and hence require individual-level data sharing. Such sharing may not be feasible for a number of reasons, including institutional policies that prohibit it, high regulatory hurdles, public privacy concerns, and the costs and overhead of moving massive amounts of data. A large body of research has demonstrated that, given some background information about an individual such as data from EHRs, an adversary can learn sensitive information about that individual from “de-identified” data, and that improper disclosure of individual-level data may have serious implications. The proposed research will address the challenges associated with handling missing data in distributed analysis and fill a crucial methodology gap.
We propose the following specific aims: 1) develop privacy-preserving distributed methods for handling missing data in horizontally partitioned data; 2) develop privacy-preserving distributed methods for handling missing data in vertically partitioned data; 3) develop a user-friendly toolkit to allow researchers to handle missing data for distributed analysis in health data networks; and 4) evaluate and validate the methods and toolkit using the UCSD obesity patient data prepared for pSCANNER and data from PEDSnet, in addition to simulated data. The proposed approaches will enable the use of data across multiple sites and will not require pooling patient-level data into a central repository. They can be scaled up to handle massive amounts of data in DHDNs, because the decomposed computation can be parallelized across all participating parties. The results of our study will significantly advance the state of the art in missing data methodology for DHDNs. The privacy-preserving software toolkit will enable researchers to use more complete data in their research by leveraging information from multiple sites without compromising patient privacy, and will help lower regulatory and other hurdles for collaboration across multiple institutions and build public trust. As such, it will encourage more institutions and healthcare systems to become part of a clinical data research network and more patients to participate in clinical studies, which will improve the validity, robustness, and generalizability of research findings and offer substantial benefits in areas including, but not limited to, precision medicine and informatics practice.
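
To make the idea of decomposed computation on horizontally partitioned data concrete, the following is a minimal sketch and not the project's actual method. It assumes a single predictor x that is always observed and an outcome y that is sometimes missing, and it uses simple regression imputation as a stand-in technique. Each site shares only aggregate sums over its complete cases; a coordinator pools them to fit one global line, and each site then fills in its own missing values locally. The function names (local_summaries, fit_global, impute_local) and the toy data are hypothetical.

```python
from typing import List, Optional, Tuple

Row = Tuple[float, Optional[float]]          # (x, y); y is None when missing
Summary = Tuple[int, float, float, float, float]


def local_summaries(rows: List[Row]) -> Summary:
    """Runs at a site: aggregate complete cases only; no row leaves the site."""
    complete = [(x, y) for x, y in rows if y is not None]
    n = len(complete)
    sx = sum(x for x, _ in complete)
    sy = sum(y for _, y in complete)
    sxx = sum(x * x for x, _ in complete)
    sxy = sum(x * y for x, y in complete)
    return n, sx, sy, sxx, sxy


def fit_global(summaries: List[Summary]) -> Tuple[float, float]:
    """Runs at the coordinator: pool the sums and solve y = a + b * x."""
    n = sum(s[0] for s in summaries)
    sx = sum(s[1] for s in summaries)
    sy = sum(s[2] for s in summaries)
    sxx = sum(s[3] for s in summaries)
    sxy = sum(s[4] for s in summaries)
    b = (n * sxy - sx * sy) / (n * sxx - sx * sx)   # least-squares slope
    a = (sy - b * sx) / n                           # least-squares intercept
    return a, b


def impute_local(rows: List[Row], a: float, b: float) -> List[Tuple[float, float]]:
    """Runs at a site: replace each missing y with the fitted value a + b * x."""
    return [(x, y if y is not None else a + b * x) for x, y in rows]


# Hypothetical toy data for two sites; observed points lie on y = 1 + 2x.
site_a: List[Row] = [(1.0, 3.0), (2.0, 5.0), (3.0, None)]
site_b: List[Row] = [(4.0, 9.0), (5.0, None), (0.0, 1.0)]

a, b = fit_global([local_summaries(site_a), local_summaries(site_b)])
imputed_a = impute_local(site_a, a, b)
imputed_b = impute_local(site_b, a, b)
```

Because the pooled sums equal the sums over the combined data, the fitted line is the same one a centralized analysis of the complete cases would give. Aggregates alone are not a privacy guarantee, and single regression imputation understates uncertainty, so a real privacy-preserving method would need additional protections and a more careful imputation model.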
