
A Risk Management Framework for Identifiability in Genomics Research


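Basic Information

Grant number:
8695427
Principal investigator:
Bradley A. Malin
Amount:
$343,000 (approx.)
Host institution:
VANDERBILT UNIVERSITY
Host institution country:
United States
Project category:
Fiscal year:
2012
Funding country:
United States
Project period:
2012-09-21 to 2016-06-30
Project status:
Completed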

Source:
https://reporter.nih.gov/project-details/8695427
Keywords:
Access to Information; Address; Adopted; Adoption; Agreement; Clinical; Clinical Data; Commit; Computers; Costs and Benefits; DNA Sequence Databases; Data; Data Protection; Data Sources; Databases; Effectiveness; Electronic Health Record; Engineering; Equilibrium; Ethics; Evaluation; Evolution; Foundations; Genome; Genomics; Genotype; Goals; Guidelines; Health Insurance Portability and Accountability Act; Healthcare; High-Throughput Nucleotide Sequencing; Human Genome Project; Hybrids; Individual; Label; Legal; Literature; Marketing; Measurement; Measures; Medicine; Methods; Modeling; Modification; Motivation; Nature; Outcome; Perception; Plant Roots; Policies; Policy Maker; Population; Price; Privacy; Protocols documentation; Provider; Regulation; Relative (related person); Research; Research Project Grants; Resources; Risk; Risk Assessment; Risk Management; Safety; Simulate; Solutions; Staging; Structure; Time; Work; base; common rule; computerized data processing; computerized tools; cost; court; data sharing; design; firewall; health care delivery; improved; population based; public health relevance; social; sound; statistics; tool; usability

Project Abstract

DESCRIPTION (provided by applicant): When the Human Genome Project was completed almost ten years ago, it cost millions of dollars to sequence an individual's genome. Yet the evolution of high-throughput sequencing and computational tools has been swift, and it will soon be possible to genotype anyone for a nominal price. The ability to generate genomic data coincides with the adoption of electronic health records, setting the stage for large-scale personalized medicine research, the results of which can improve the efficiency, effectiveness, and safety of healthcare delivery. To ease barriers to population-based research, genomic and clinical data are often made available via a de-identified designation by various policies and regulations. However, there is a growing perception that de-identification is a fallacy and that biomedical data can be re-identified with relative ease. This argument, which is partially based on our own studies, forms the core of calls for legislative and regulatory modifications in the literature and court cases. Most notably, a recent Advance Notice of Proposed Rulemaking (ANPRM) inquires whether biospecimens, as well as derived genomic data, should be redefined as inherently identifiable. Such labeling would require changes to the Common Rule and HIPAA Privacy Rule and could influence the availability of genomic data for research. It is clear that only a small amount of genomic data is necessary to uniquely distinguish an individual, even in the context of aggregated statistics. However, at the same time, it must be recognized that "distinguishable" is not equivalent to "identifiable", and though re-identification is possible, it does not imply it is probable. Identifiability concerns should not be trivialized, but there is currently no sound basis for reasoning about such risks, limiting the ability to make informed policy decisions.
There are many factors associated with identifiability, including the information shared with genomic data (e.g., clinical, demographic), with whom it is shared, what other sources of data exist, and the relevant legal landscape. A limiting factor of prior studies in genomic identifiability is their consideration of these factors in isolation, which provides an incomplete picture. To fill this void, the overarching objective of our research is to engineer a foundation, rooted in ethical, legal, and computational formalisms, that provides a basis for reasoning about, and managing, genomic data identifiability risks. This foundation will be realized through three specific aims: (1) build a protocol for modeling the extent to which sharing genomic data can substantiate re-identification concerns, (2) design and evaluate practical measures of genomic identifiability for risk assessment protocols, and (3) develop a strategy that supplies options to mitigate genomic data identification risks. We envision several notable outcomes from this project. First, this work will yield guidelines and risk assessment strategies that can be employed by genomic data managers and policy makers to inform their decisions regarding identifiability. Second, we will perform an evaluation of our framework with a real, large, de-identified database of clinical and genomic data to provide tangible and pragmatic results.


Project Outputs

Journal articles (0)

Monographs (0)

Research awards (0)

Conference papers (0)

Patents (0)


Other publications by Bradley A. Malin

Dataset Representativeness and Downstream Task Fairness

DOI:
Publication date:
2024
Journal:
Impact factor:
0
Authors:
Victor A. Borza; Andrew Estornell; Chien; Bradley A. Malin; Yevgeniy Vorobeychik
Corresponding author:
Yevgeniy Vorobeychik

Applications of Homomorphic Encryption

DOI:
Publication date:
2017
Journal:
Impact factor:
0
Authors:
David Archer; Lily Chen; Jung Hee Cheon; Ran Gilad; Roger A. Hallman; Zhicong Huang; Xiaoqian Jiang; R. Kumaresan; Bradley A. Malin; Heidi Sofia; Yongsoo Song; Shuang Wang
Corresponding author:
Shuang Wang

Protecting Genomic Sequence Anonymity with Generalization Lattices

DOI:
10.1055/s-0038-1634025
Publication date:
2005
Journal:
Methods of Information in Medicine
Impact factor:
1.7
Authors:
Bradley A. Malin
Corresponding author:
Bradley A. Malin

Optimizing word embeddings for small datasets: a case study on patient portal messages from breast cancer patients

DOI:
10.1038/s41598-024-66319-z
Publication date:
2024-07-12
Journal:
Scientific Reports
Impact factor:
3.900
Authors:
Qingyuan Song; Congning Ni; Jeremy L. Warner; Qingxia Chen; Lijun Song; S. Trent Rosenbloom; Bradley A. Malin; Zhijun Yin
Corresponding author:
Zhijun Yin

Computational strategic recruitment for representation and coverage studied in the All of Us Research Program

DOI:
10.1038/s41746-025-01804-x
Publication date:
2025-07-03
Journal:
npj Digital Medicine
Impact factor:
15.100
Authors:
Victor A. Borza; Qingxia Chen; Ellen W. Clayton; Murat Kantarcioglu; Lina Sulieman; Yevgeniy Vorobeychik; Bradley A. Malin
Corresponding author:
Bradley A. Malin