Transforming dbGaP genetic and genomic data to FAIR-ready by artificial intelligence and machine learning algorithms

Project abstract

dbGaP is a repository for data from NIH-funded projects and holds a large volume of genetic and genomic data. However, these data are not ready for AI and machine learning applications. This application proposes methods to address this issue. We have two aims.

1) Develop and standardize procedures to transform genetic and genomic data into image-like objects and tokenized custom vocabularies so that the data can be used by advanced AI algorithms such as CNNs, autoencoders, and transformers. To transform genetic data into an image, we recode allele dosage values as pixel intensities and arrange a collection of genetic markers, such as SNPs and CNVs, into an artificial image object that can be analyzed by CNN algorithms. Genetic markers can also be used to define haplotypes, which can be tokenized into custom vocabularies for use in NLP models.

2) Use Alzheimer's disease (AD) and schizophrenia as case studies to demonstrate the utility of the transformed data for discovering and identifying risk variants and genes for both conditions. For an AD dataset, we plan to impute genetically controlled gene expression using brain-specific eQTLs and individual genotypes, and to transform the expression data into image objects for analysis by a CNN model with a self-attention mechanism. For schizophrenia, we plan to use a k-mer tokenizer to break haplotypes into collections of small haplotype blocks and treat them as tokens for analysis by NLP models. We use both the CNN and NLP models as screening tools to select promising candidates based on their attention weights, and then directly test these candidates for association with AD or schizophrenia using logistic regression. Because this selection step greatly reduces the number of tests, the multiple-testing burden is lower and the statistical power to detect risk variants and genes for AD and schizophrenia increases.
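
The dosage-to-pixel recoding described under the first aim can be illustrated with a minimal sketch. It assumes one individual's allele dosages lie in [0, 2], maps them linearly to 8-bit intensities, and fills a square grid in marker order, padding unused pixels with 0. The function name `dosages_to_image`, the square row-major layout, and the zero padding are illustrative assumptions, not the procedure the application proposes to standardize.

```python
import math

import numpy as np


def dosages_to_image(dosages, side=None):
    """Arrange one individual's allele dosages into a square grayscale image.

    Assumptions (hypothetical, for illustration only):
      * each dosage is in [0, 2] (0 = no alternate alleles, 2 = two copies);
      * markers are laid out row by row in the order given;
      * unused trailing pixels are padded with 0.
    """
    d = np.asarray(dosages, dtype=float).ravel()
    if d.size == 0:
        raise ValueError("at least one marker is required")
    if np.any((d < 0) | (d > 2)):
        raise ValueError("dosages must lie in [0, 2]")

    if side is None:
        # Smallest square that holds every marker.
        side = math.isqrt(d.size - 1) + 1
    if side * side < d.size:
        raise ValueError("side is too small for the number of markers")

    pixels = np.zeros(side * side, dtype=np.uint8)
    # Linear map: dosage 0 -> intensity 0, dosage 2 -> intensity 255.
    pixels[: d.size] = np.rint(d / 2.0 * 255).astype(np.uint8)
    return pixels.reshape(side, side)
```

For example, `dosages_to_image([0, 1, 2, 0.5, 2])` returns a 3 x 3 array whose first row is 0, 128, 255. In practice the marker ordering matters, because a CNN exploits local structure: ordering by genomic position keeps markers in linkage disequilibrium close together, although within a row only.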
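
A minimal sketch of k-mer tokenization of a haplotype follows. It assumes a phased haplotype is written as a string of allele codes (0 = reference, 1 = alternate), slides a window of length k across it, and assigns each distinct block an integer ID. The names `kmer_tokens`, `build_vocab`, and `encode`, the default overlapping stride of 1, and the `<unk>` entry are illustrative assumptions; the abstract does not specify them.

```python
def kmer_tokens(haplotype, k=4, stride=1):
    """Split a haplotype string into blocks (k-mers) of length k.

    `haplotype` is assumed to be a string of allele codes such as "010011".
    A haplotype shorter than k yields an empty list.
    """
    if k <= 0 or stride <= 0:
        raise ValueError("k and stride must be positive")
    return [haplotype[i:i + k] for i in range(0, len(haplotype) - k + 1, stride)]


def build_vocab(token_lists):
    """Assign an integer ID to every distinct token; ID 0 is kept for unknowns."""
    vocab = {"<unk>": 0}
    for tokens in token_lists:
        for token in tokens:
            # setdefault only adds the token the first time it is seen.
            vocab.setdefault(token, len(vocab))
    return vocab


def encode(tokens, vocab):
    """Map tokens to IDs, sending unseen tokens to the <unk> ID."""
    return [vocab.get(token, vocab["<unk>"]) for token in tokens]
```

With `k=3`, the haplotype `"010011"` yields the tokens `010`, `100`, `001`, and `011`. The resulting ID sequences are the kind of input an NLP-style model would consume. A stride equal to k gives non-overlapping blocks instead of overlapping ones.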

Project outputs

Journal articles (42)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
DeepVISP: Deep Learning for Virus Site Integration Prediction and Motif Discovery.
TissGDB: tissue-specific gene database in cancer.
  • DOI: 10.1093/nar/gkx850
  • Published: 2018-01-04
  • Impact factor: 14.9
  • Authors: Kim P; Park A; Han G; Sun H; Jia P; Zhao Z
  • Corresponding author: Zhao Z
Federated learning based futuristic biomedical big-data analysis and standardization.
  • DOI: 10.1371/journal.pone.0291631
  • Published: 2023
  • Impact factor: 3.7
  • Authors: Fathima, Afifa Salsabil; Basha, Syed Muzamil; Ahmed, Syed Thouheed; Mathivanan, Sandeep Kumar; Rajendran, Sukumar; Mallik, Saurav; Zhao, Zhongming
  • Corresponding author: Zhao, Zhongming
KinaseMD: kinase mutations and drug response database.
  • DOI: 10.1093/nar/gkaa945
  • Published: 2021-01-08
  • Impact factor: 14.9
  • Authors: Hu R; Xu H; Jia P; Zhao Z
  • Corresponding author: Zhao Z
Splicing QTL of human adipose-related traits.
  • DOI: 10.1038/s41598-017-18767-z
  • Published: 2018-01-10
  • Impact factor: 4.6
  • Authors: Ma L; Jia P; Zhao Z
  • Corresponding author: Zhao Z

Other grants by Zhongming Zhao

Constructing A Transcriptomic Atlas of Retrotransposon in Alzheimer's Disease
  • Grant number: 10431366
  • Fiscal year: 2022
  • Funding amount: $306,100
Deep learning methods to predict the function of genetic variants in orofacial clefts
  • Grant number: 9764346
  • Fiscal year: 2018
  • Funding amount: $306,100
Predicting Phenotype by Deep Learning Heterogeneous Multi-Omics Data
  • Grant number: 10318084
  • Fiscal year: 2017
  • Funding amount: $306,100
Predicting Phenotype by Using Transcriptomic Alteration as Endophenotype
  • Grant number: 9980998
  • Fiscal year: 2017
  • Funding amount: $306,100
Predicting Phenotype by Deep Learning Heterogeneous Multi-Omics Data
  • Grant number: 10640868
  • Fiscal year: 2017
  • Funding amount: $306,100
Predicting Phenotype by Deep Learning Heterogeneous Multi-Omics Data
  • Grant number: 10449376
  • Fiscal year: 2017
  • Funding amount: $306,100
Predicting Phenotype by Using Transcriptomic Alteration as Endophenotype
  • Grant number: 9750105
  • Fiscal year: 2017
  • Funding amount: $306,100
Mapping the Genetic Architecture of Complex Disease via RNA-seq and GWAS
  • Grant number: 9212507
  • Fiscal year: 2016
  • Funding amount: $306,100
MicroRNA and Transcription Factor Co-regulation in Cancer
  • Grant number: 9329385
  • Fiscal year: 2016
  • Funding amount: $306,100
MicroRNA and Transcription Factor Co-regulation in Cancer
  • Grant number: 9093087
  • Fiscal year: 2016
  • Funding amount: $306,100
