Privacy Challenges of Genomic Data-Sharing Beacons and Solutions

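Basic Information

Grant number:
10443776
Principal investigator:
Erman Ayday
Amount:
Approx. $301,900
Host institution:
CASE WESTERN RESERVE UNIVERSITY
Host institution country:
United States
Project category:
Fiscal year:
2020
Funding country:
United States
Project period:
2020-08-01 to 2024-07-31
Project status:
Completed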

Project Abstract

Abstract. The availability of very large genomic datasets promises a revolution in medicine. However, it has been shown that ensuring the anonymity of participants in such datasets is not straightforward. Sharing data in a privacy-preserving way remains a major bottleneck for medical progress. Recently, a community-driven protocol has been widely adopted for sharing genomic data. The so-called “genomic data-sharing beacon protocol” aims to provide a secure, easy-to-implement, and standardized interface for data sharing by allowing only yes/no queries on the presence of specific alleles in the dataset. Previously deemed robust against privacy threats, the beacon protocol was recently shown to be vulnerable to membership inference attacks despite its stringent policy. Currently, there is no way to systematically assess beacons' privacy risks for either the genome donors or the beacon operators. This casts doubt on the usability of beacons from both parties' points of view. Setting up a beacon is risky for beacon operators because of the repercussions of possible breaches. Furthermore, for donors who lack the technical background to comprehend the risk, it is often safer to opt out. Thus, gaining a comprehensive understanding of the system's pitfalls and briefing genome donors and beacon operators on potential threats are important steps toward moving forward. In this proposal, we aim at (i) detecting and analyzing vulnerabilities of genomic data-sharing beacons, (ii) providing risk quantification tools for both donors and data owners to inform both parties of possible risks, and (iii) developing countermeasures against these vulnerabilities. We provide extensive preliminary work on possible vulnerabilities of the beacon system and potential countermeasures. For the first time, we will investigate the information leakage due to beacon updates, which will guide beacon administrators on when and how to update the content of the beacon.
As the second goal, we will design risk quantification algorithms to assess the risk and inform both genome donors and beacon operators of the possible risks of sharing data. This will be the first attempt at helping beacon operators and participants make informed decisions. We project that, if this project is realized, the beacon system will be transparent in terms of privacy risks, which will restore trust in the system and increase its usability. This in turn will tear down the barriers that stand in the way of sharing genomic data and enable all downstream research that will benefit from larger data sizes. Our final goal is to focus on countermeasures to protect sensitive information. We observe that current approaches fail to protect the privacy of individuals and provide high data utility at the same time. We will implement novel differential privacy and game theory-based techniques to ensure privacy-preserving data sharing with high data utility.
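
To make the yes/no query interface described in the abstract concrete, the following is a minimal illustrative sketch, not the project's implementation and not the actual beacon API. The class `ToyBeacon`, the function `randomized_response`, the tuple layout `(chromosome, position, allele)`, and the parameter `flip_prob` are all hypothetical names introduced here. The noise step is generic randomized response, shown only as one simple example of a differential-privacy-style perturbation; the abstract does not specify which techniques the project uses.

```python
import random


class ToyBeacon:
    """Hypothetical in-memory beacon that answers only yes/no presence queries."""

    def __init__(self, genomes):
        # genomes: iterable of sets of (chromosome, position, allele) tuples,
        # one set per participant (an assumed toy representation).
        self._alleles = set()
        for genome in genomes:
            self._alleles.update(genome)

    def query(self, chromosome, position, allele):
        """Return True if at least one participant carries the allele."""
        return (chromosome, position, allele) in self._alleles


def randomized_response(true_answer, flip_prob, rng):
    """Flip a yes/no answer with probability flip_prob.

    Generic randomized response, used here purely for illustration.
    """
    if rng.random() < flip_prob:
        return not true_answer
    return true_answer


if __name__ == "__main__":
    beacon = ToyBeacon([{("1", 100, "A")}, {("2", 200, "T")}])
    rng = random.Random(0)
    exact = beacon.query("1", 100, "A")
    noisy = randomized_response(exact, flip_prob=0.1, rng=rng)
    print("exact answer:", exact, "| noisy answer:", noisy)
```

In this sketch a higher `flip_prob` hides more about any individual participant but makes the answers less useful, which is the privacy versus utility trade-off the abstract refers to.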
