
A Risk Management Framework for Identifiability in Genomics Research

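Basic information

Grant number:
9754854
Principal investigator:
Bradley A. Malin
Funding amount:
$240,600
Host institution:
VANDERBILT UNIVERSITY MEDICAL CENTER
Host institution country:
United States
Project type:
Fiscal year:
2012
Funding country:
United States
Project period:
2012-09-21 to 2021-07-31
Project status:
Completed

Source:
https://reporter.nih.gov/project-details/9754854
Keywords: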
Academic Medical Centers Address Adopted Agreement Back Behavior Case Study Collection Computerized Medical Record Contracts Control Groups Cost Measures Data Data Aggregation Data Collection Data Security Data Set Databases Detection Foundations Funding Genome Genomic Segment Genomic approach Genomics Grant Health Individual Information Systems Intelligence Internet Investigation Knowledge Lead Length Life Link Measures Medical center Modeling Motivation Names Paper Participant Peer Review Phase Play Policies Policy Maker Precision Medicine Initiative Privacy Probability Process Progress Reports Publications Published Database Publishing Punishment Records Reporting Research Research Personnel Research Project Grants Resources Risk Risk Assessment Risk Management Scientist Services Societies System Theoretical model Time Translations United States National Institutes of Health Variant Work base cohort cost data management data sharing database of Genotypes and Phenotypes design genomic data human subject phenome phenotypic data precision medicine repository risk mitigation social socioeconomics statistics theories tool virtual

Project abstract

The past decade has witnessed numerous demonstrations that genomic data can be traced back to the corresponding named individuals. These attacks exploit various collections, including the NIH Database of Genotypes and Phenotypes (dbGaP), the 1000 Genomes Project, and the Beacon Project of the Global Alliance for Genomics and Health, and are often reported in the popular media. At the same time, research conducted in the first phase of this grant (2012-2016) showed that such re-identification attacks often represent worst-case, non-generalizable scenarios. Specifically, it was shown that these attacks often focus on the possibility of an attack, and not on its probability given the wide range of factors often at play in practice. By focusing on the possible, such investigations can lead policy makers to believe that de-identification is a useless activity. However, our research showed that de-identification is only one part of a larger strategy of deterrents that can be used to manage risk. By intelligently combining de-identification with other technical risk mitigation approaches (e.g., controlled access) and societal constructs (e.g., data use agreements and penalties), genomic data sharing solutions can be developed with appropriate levels of risk and utility for scientists and society. While our research laid the foundation for managing identification risk in genomic data sharing, significant questions remain regarding its translation into practical guidance. In particular, risk management models must be specialized to the type of data that is shared, the types of penalties (or punishments) available, and the costs of adopting and administering deterrence mechanisms.
Thus, in the second phase of this research project, we propose to augment risk-based re-identification management frameworks to model and assess the deterrence approaches invoked by existing repositories, such as dbGaP (which holds a collection of smaller historical datasets from completed studies), as well as emerging initiatives, such as the Precision Medicine Initiative. This project will pursue three specific aims, designed to work in harmony, but at the same time sufficiently independent that if one fails, the research will still yield fruitful risk management guidance for genomic databases: 1) Develop game theoretic models to assess re-identification attacks at different levels of detail in genomic data sharing (e.g., aggregate summaries of the proportion of variants in case vs. control groups in association studies); 2) Characterize and measure the costs associated with common re-identification deterrence approaches for genomic data (e.g., physical investigatory reviews and virtual audits of IT system use); and 3) Optimize the parameterization of a deterrence policy (e.g., the amount of damages for violation of a data use agreement or the amount of time to withhold data from an attacker/investigator) given the expected value of genomic data. We will evaluate these approaches with a large repository of de-identified genomic and electronic medical records in use at a large academic medical center, datasets hosted at two federal repositories, and a web system that presents summary statistics from a cohort of 9000 participants.
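The third aim (choosing the size of a penalty so that an attack is no longer worthwhile) can be illustrated with a toy calculation. The sketch below is not the project's model. It assumes a single rational, risk-neutral attacker, and every name in it (`Attacker`, `expected_attack_payoff`, `min_deterring_penalty`, and all parameters) is hypothetical. It also assumes the penalty is incurred whenever the attempt is detected, whether or not the re-identification succeeds.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Attacker:
    """Hypothetical attacker parameters (illustrative only)."""
    gain: float          # value the attacker places on a successful re-identification
    attack_cost: float   # cost of mounting the attack
    success_prob: float  # probability the re-identification succeeds


def expected_attack_payoff(attacker: Attacker, detection_prob: float, penalty: float) -> float:
    """Expected payoff of attacking: expected gain minus cost minus expected penalty.

    Assumption: the penalty applies whenever the attempt is detected.
    """
    expected_gain = attacker.success_prob * attacker.gain
    expected_penalty = detection_prob * penalty
    return expected_gain - attacker.attack_cost - expected_penalty


def min_deterring_penalty(attacker: Attacker, detection_prob: float) -> float:
    """Smallest penalty at which attacking has non-positive expected payoff."""
    if not 0.0 < detection_prob <= 1.0:
        raise ValueError("detection_prob must be in (0, 1]")
    surplus = attacker.success_prob * attacker.gain - attacker.attack_cost
    # If the attack is already unprofitable, no penalty is needed.
    return max(0.0, surplus / detection_prob)


if __name__ == "__main__":
    attacker = Attacker(gain=1000.0, attack_cost=100.0, success_prob=0.5)
    penalty = min_deterring_penalty(attacker, detection_prob=0.25)
    print(penalty, expected_attack_payoff(attacker, 0.25, penalty))
```

In this toy setting, a lower detection probability (for example, less frequent audits) requires a proportionally larger penalty to deter the same attacker, which is one way to read the trade-off between audit costs and damages that the abstract describes.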

Project outputs

Journal articles (4)

Monographs (0)

Research awards (0)

Conference papers (0)

Patents (0)

Robust Transparency Against Model Inversion Attacks.

DOI:
10.1109/tdsc.2020.3019508
Published:
2021-09
Journal:
IEEE Transactions on Dependable and Secure Computing
Impact factor:
7.3
Authors:
Alufaisan Y; Kantarcioglu M; Zhou Y
Corresponding author:
Zhou Y

Integrating linear optimization with structural modeling to increase HIV neutralization breadth.

DOI:
10.1371/journal.pcbi.1005999
Published:
2018
Journal:
PLoS Computational Biology
Impact factor:
4.3
Authors:
Sevy, Alexander M; Panda, Swetasudha; Crowe Jr, James E; Meiler, Jens; Vorobeychik, Yevgeniy
Corresponding author:
Vorobeychik, Yevgeniy

Other publications by Bradley A. Malin

Dataset Representativeness and Downstream Task Fairness

DOI:
Published:
2024
Journal:
Impact factor:
0
Authors:
Victor A. Borza; Andrew Estornell; Chien; Bradley A. Malin; Yevgeniy Vorobeychik
Corresponding author:
Yevgeniy Vorobeychik

APPLICATIONS OF HOMOMORPHIC ENCRYPTION

DOI:
Published:
2017
Journal:
Impact factor:
0
Authors:
David Archer; Lily Chen; Jung Hee Cheon; Ran Gilad; Roger A. Hallman; Zhicong Huang; Xiaoqian Jiang; R. Kumaresan; Bradley A. Malin; Heidi Sofia; Yongsoo Song; Shuang Wang
Corresponding author:
Shuang Wang

Protecting Genomic Sequence Anonymity with Generalization Lattices

DOI:
10.1055/s-0038-1634025
Published:
2005
Journal:
Methods of Information in Medicine
Impact factor:
1.7
Authors:
Bradley A. Malin
Corresponding author:
Bradley A. Malin

Optimizing word embeddings for small datasets: a case study on patient portal messages from breast cancer patients

DOI:
10.1038/s41598-024-66319-z
Published:
2024-07-12
Journal:
Scientific Reports
Impact factor:
3.900
Authors:
Qingyuan Song; Congning Ni; Jeremy L. Warner; Qingxia Chen; Lijun Song; S. Trent Rosenbloom; Bradley A. Malin; Zhijun Yin
Corresponding author:
Zhijun Yin

Computational strategic recruitment for representation and coverage studied in the All of Us Research Program

DOI:
10.1038/s41746-025-01804-x
Published:
2025-07-03
Journal:
npj Digital Medicine
Impact factor:
15.100
Authors:
Victor A. Borza; Qingxia Chen; Ellen W. Clayton; Murat Kantarcioglu; Lina Sulieman; Yevgeniy Vorobeychik; Bradley A. Malin
Corresponding author:
Bradley A. Malin