Enhancing open data sharing for functional genomics experiments: Measures to quantify genomic information leakage and file formats for privacy preservation

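Basic information

  • Grant number:
    10703382
  • Principal investigator:
  • Amount:
    $526,500
  • Host institution:
  • Host institution country:
    United States
  • Project type:
  • Fiscal year:
    2020
  • Funding country:
    United States
  • Period:
    2020-09-02 to 2025-06-30
  • Project status:
    Not yet concluded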

Project abstract

Project Summary/Abstract: With the surge of large genomics data, there is an immense increase in the breadth and depth of different omics datasets and an increasing importance in the topic of privacy of individuals in genomic data science. Detailed genetic and environmental characterization of diseases and conditions relies on the large-scale mining of functional genomics data; hence, there is great desire to share data as broadly as possible. However, there is a scarcity of privacy studies focused on such data. A key first step in reducing private information leakage is to measure the amount of information leakage in functional genomics data, particularly in different data file types. To this end, we propose to derive information-theoretic measures for private information leakage in different data types from functional genomics data. We will also develop various file formats to reduce this leakage during sharing. We will approach the privacy analysis under three aims. First, we will develop statistical metrics that can be used to quantify the sensitive information leakage from raw reads. We will systematically analyze how linking attacks can be instantiated using various genotyping methods such as single nucleotide variant and structural variant calling from raw reads, signal profiles, Hi-C interaction matrices, and gene expression matrices. Second, we will study different algorithms to implement privacy-preserving transformations to the functional genomics data in various forms. Particularly, we will create privacy-preserving file formats for raw sequence alignment maps, signal track files, three-dimensional interaction matrices, and gene expression quantification matrices that contain information from multiple individuals. This will allow us to study the sources of sensitive information leakages other than raw reads, for example signal profiles, splicing and isoform transcription, and abnormal three-dimensional genomic interactions.
Third, we will investigate the reads that can be mapped to the microbiome in the raw human functional genomics datasets. We will use inferred microbial information to characterize private information about individuals, and then combine the microbial information with the information from human mapped reads to increase the re-identification accuracy in the linking attacks described in the second aim. We will use the tools to quantify the sensitive information and privacy-preserving file formats in the available datasets from large sequencing projects, such as the ENCODE, The Cancer Genome Atlas, 1,000 Genomes, gEUVADIS, and Genotype-Tissue Expression projects.
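
As a rough illustration of what an information-theoretic leakage measure can look like, the sketch below sums the self-information (in bits) of each genotype an attacker could infer from a shared file. It assumes Hardy-Weinberg genotype proportions and statistically independent variants. The function names, variant IDs, and allele frequencies are hypothetical, and this is not necessarily the measure the project derives.

```python
import math


def genotype_frequencies(alt_allele_freq):
    """Hardy-Weinberg genotype frequencies for a biallelic variant.

    Keys are the number of alternate alleles carried (0, 1, or 2).
    """
    p = alt_allele_freq
    q = 1.0 - p
    return {0: q * q, 1: 2.0 * p * q, 2: p * p}


def leaked_bits(observed_genotypes, alt_allele_freqs):
    """Total self-information, in bits, of the observed genotypes.

    Assumes variants are independent, so per-variant terms simply add.
    Rarer genotypes contribute more bits because they narrow down
    the set of matching individuals more sharply.
    """
    total = 0.0
    for variant_id, genotype in observed_genotypes.items():
        freq = genotype_frequencies(alt_allele_freqs[variant_id])[genotype]
        if freq <= 0.0:
            raise ValueError(f"genotype {genotype} at {variant_id} has zero frequency")
        total += -math.log2(freq)
    return total


# Hypothetical example: two variants, each with alternate allele frequency 0.5.
freqs = {"variant_a": 0.5, "variant_b": 0.5}
observed = {"variant_a": 1, "variant_b": 0}  # heterozygous, homozygous reference
print(leaked_bits(observed, freqs))
```

Under these assumptions, about 33 bits are enough in principle to single out one person among roughly 8 billion, since log2(8e9) is approximately 32.9. Real variants are correlated through linkage disequilibrium, so a simple sum like this one tends to overstate the leakage.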

Project outcomes

Journal articles (2)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Assessing and mitigating privacy risks of sparse, noisy genotypes by local alignment to haplotype databases.
  • DOI:
    10.1101/gr.278322.123
  • Publication date:
    2023-12-27
  • Journal:
  • Impact factor:
    7
  • Authors:
    Emani, Prashant S.; Geradi, Maya N.; Gursoy, Gamze; Grasty, Monica R.; Miranker, Andrew; Gerstein, Mark B.
  • Corresponding author:
    Gerstein, Mark B.
Storing and analyzing a genome on a blockchain.
  • DOI:
    10.1186/s13059-022-02699-7
  • Publication date:
    2022-06-29
  • Journal:
  • Impact factor:
    12.3
  • Authors:
  • Corresponding author:

Other grants by Mark Bender Gerstein

1/2 Discovery and validation of neuronal enhancers associated with the development of psychiatric disorders
  • Grant number:
    10801125
  • Fiscal year:
    2023
  • Funding amount:
    $526,500
  • Project type:
EDAC: ENCODE Data Analysis Center
  • Grant number:
    10547896
  • Fiscal year:
    2022
  • Funding amount:
    $526,500
  • Project type:
Integrative analysis of genomics and imaging data from the BRAIN Initiative and other public data sources
  • Grant number:
    10190025
  • Fiscal year:
    2021
  • Funding amount:
    $526,500
  • Project type:
Laboratory, Data Analysis, and Coordinating Center (LDACC) for the Developmental Human Genotype-Tissue Expression Project
  • Grant number:
    10306961
  • Fiscal year:
    2021
  • Funding amount:
    $526,500
  • Project type:
EDAC: ENCODE Data Analysis Center
  • Grant number:
    10240955
  • Fiscal year:
    2021
  • Funding amount:
    $526,500
  • Project type:
Laboratory, Data Analysis, and Coordinating Center (LDACC) for the Developmental Human Genotype-Tissue Expression Project
  • Grant number:
    10709553
  • Fiscal year:
    2021
  • Funding amount:
    $526,500
  • Project type:
A Big Data Approach to Identify Epigenetic, Transcriptomic, and Network Dynamics as Immune Dysfunction Drivers Associated with HIV Infection and Substance Use Disorder
  • Grant number:
    10408130
  • Fiscal year:
    2020
  • Funding amount:
    $526,500
  • Project type:
The Y-SCORCH Data Generation Center at Yale for Single-Cell Opioid Responses in the Context of HIV
  • Grant number:
    10685384
  • Fiscal year:
    2020
  • Funding amount:
    $526,500
  • Project type:
The Y-SCORCH Data Generation Center at Yale for Single-Cell Opioid Responses in the Context of HIV
  • Grant number:
    10461029
  • Fiscal year:
    2020
  • Funding amount:
    $526,500
  • Project type:
Supplement: Human Brain Collection for Study of the Neuropathogenesis of SARS-CoV-2, HIV-1, and Opioid Use Disorder
  • Grant number:
    10468477
  • Fiscal year:
    2020
  • Funding amount:
    $526,500
  • Project type:

Similar overseas grants

Rational design of rapidly translatable, highly antigenic and novel recombinant immunogens to address deficiencies of current snakebite treatments
  • Grant number:
    MR/S03398X/2
  • Fiscal year:
    2024
  • Funding amount:
    $526,500
  • Project type:
    Fellowship
Re-thinking drug nanocrystals as highly loaded vectors to address key unmet therapeutic challenges
  • Grant number:
    EP/Y001486/1
  • Fiscal year:
    2024
  • Funding amount:
    $526,500
  • Project type:
    Research Grant
CAREER: FEAST (Food Ecosystems And circularity for Sustainable Transformation) framework to address Hidden Hunger
  • Grant number:
    2338423
  • Fiscal year:
    2024
  • Funding amount:
    $526,500
  • Project type:
    Continuing Grant
Metrology to address ion suppression in multimodal mass spectrometry imaging with application in oncology
  • Grant number:
    MR/X03657X/1
  • Fiscal year:
    2024
  • Funding amount:
    $526,500
  • Project type:
    Fellowship
CRII: SHF: A Novel Address Translation Architecture for Virtualized Clouds
  • Grant number:
    2348066
  • Fiscal year:
    2024
  • Funding amount:
    $526,500
  • Project type:
    Standard Grant
BIORETS: Convergence Research Experiences for Teachers in Synthetic and Systems Biology to Address Challenges in Food, Health, Energy, and Environment
  • Grant number:
    2341402
  • Fiscal year:
    2024
  • Funding amount:
    $526,500
  • Project type:
    Standard Grant
The Abundance Project: Enhancing Cultural & Green Inclusion in Social Prescribing in Southwest London to Address Ethnic Inequalities in Mental Health
  • Grant number:
    AH/Z505481/1
  • Fiscal year:
    2024
  • Funding amount:
    $526,500
  • Project type:
    Research Grant
ERAMET - Ecosystem for rapid adoption of modelling and simulation METhods to address regulatory needs in the development of orphan and paediatric medicines
  • Grant number:
    10107647
  • Fiscal year:
    2024
  • Funding amount:
    $526,500
  • Project type:
    EU-Funded
Ecosystem for rapid adoption of modelling and simulation METhods to address regulatory needs in the development of orphan and paediatric medicines
  • Grant number:
    10106221
  • Fiscal year:
    2024
  • Funding amount:
    $526,500
  • Project type:
    EU-Funded
Recite: Building Research by Communities to Address Inequities through Expression
  • Grant number:
    AH/Z505341/1
  • Fiscal year:
    2024
  • Funding amount:
    $526,500
  • Project type:
    Research Grant