Incorporating molecular network knowledge into predictive data-driven models

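Basic Information

Grant number:
10506964
Principal investigator:
Christopher Andrew Mancuso
Amount:
$2,500
Host institution:
MICHIGAN STATE UNIVERSITY
Host institution country:
United States
Project category:
Fiscal year:
2019
Funding country:
United States
Project period:
2019-09-01 to 2022-08-31
Project status:
Completed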

Source:
https://reporter.nih.gov/project-details/10506964
Keywords:
Accounting Address Age Architecture Back Binding Biological Biological Models Biological Process Biology Cell physiology Complex Computational Technique Data Data Analyses Data Collection Development Plans Dimensions Disease Engineering Ethnic Origin Fellowship Follow-Up Studies Freedom Gene Expression Gene Proteins Genes Genetic Goals Human Joints Knowledge Label Machine Learning Measures Metadata Methods Modeling Modernization Molecular Nature Noise Pathway interactions Patients Pattern Phenotype Play Precision Medicine Initiative Process Research Research Personnel Role Sampling Source Structure Techniques Tissues Translating base data to knowledge data-driven model deep learning deep learning model design disease phenotype driving force experimental study functional genomics gene function gene network genetic association genetic signature genomic data genomic profiles human data interest machine learning method mathematical sciences novel predictive modeling professor sex statistical and machine learning tool trait transcriptome

Project Summary

Modern computational techniques based on machine learning (ML) and, more recently, deep learning (DL) are playing a critical role in realizing the Precision Medicine Initiative. However, there is a critical need to systematically combine these powerful data-driven techniques with prior molecular network knowledge, both to build more accurate predictive models and to explain their predictions satisfactorily in terms of the mechanisms underlying complex traits and diseases. I propose to use domain-specific knowledge from biology and computing to tackle three outstanding problems: 1) how can we predict the missing labels associated with millions of publicly available samples? 2) what molecular/cellular functions can be attached to these samples? and 3) how can we translate findings from human data to a model species and back?

Network-constrained Deep Learning for Metadata Imputation: Most multifactorial phenotypes are tissue dependent and manifest differently depending on age, sex, and ethnicity. However, the majority of publicly available genomic data lack these labels. I will develop a network-guided approach that predicts the missing metadata of samples from their expression profiles, by designing novel data-driven models in which the model architecture and/or the structure of the input data is constrained by an underlying gene network.

Network-guided Functional Analysis of Genomic Data: High-throughput experiments often generate lists of genes of interest that are hard to interpret. Functional enrichment analysis (FEA) is a powerful tool that attaches functional meaning to an experimental set of genes by summarizing them into sets of pathways/processes. However, standard FEA is limited by incomplete knowledge of gene function, by its lack of context from the underlying gene network, and by noise in expression data.
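For reference, a common form of the standard FEA mentioned above is over-representation analysis with a one-sided hypergeometric test. The summary does not say which FEA variant is meant, so this is an assumption; the sketch only illustrates the general idea. It scores each pathway purely by set overlap with the experimental gene set, which makes the stated limitation concrete: gene-gene interactions play no role at all. The function names `hypergeom_enrichment_p` and `enrichment` and the toy gene/pathway names are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(n_universe: int, n_pathway: int, n_query: int, n_overlap: int) -> float:
    """One-sided over-representation p-value, P(X >= n_overlap).

    X ~ Hypergeometric(population=n_universe, successes=n_pathway, draws=n_query):
    the chance that a random gene set of size n_query shares at least n_overlap
    genes with a pathway of size n_pathway, drawn from a universe of n_universe genes.
    """
    upper = min(n_pathway, n_query)
    tail = sum(
        comb(n_pathway, k) * comb(n_universe - n_pathway, n_query - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / comb(n_universe, n_query)


def enrichment(query: set[str], pathways: dict[str, set[str]], universe: set[str]) -> dict[str, float]:
    """Return an uncorrected enrichment p-value for each pathway (set overlap only, no network)."""
    query = query & universe  # only genes that could have been measured count
    results = {}
    for name, members in pathways.items():
        members = members & universe
        overlap = len(query & members)
        results[name] = hypergeom_enrichment_p(len(universe), len(members), len(query), overlap)
    return results


if __name__ == "__main__":
    # Hypothetical toy data: 10 measured genes, two annotated pathways.
    universe = {f"g{i}" for i in range(10)}
    pathways = {"pathway_A": {"g0", "g1", "g2", "g3"}, "pathway_B": {"g7", "g8", "g9"}}
    query = {"g0", "g1", "g2"}
    for name, p in enrichment(query, pathways, universe).items():
        print(name, round(p, 4))
```

In the toy run all three query genes fall in pathway_A, giving p = 4/120 (about 0.0333), while pathway_B shares no genes with the query and gets p = 1.0. A real analysis would also correct the per-pathway p-values for multiple testing (for example Benjamini-Hochberg), which this sketch omits.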
I will address these limitations by developing a network-guided approach that jointly embeds genes, their interactions, and their known biological pathways/processes into a common, low-dimensional space, where biological meaning can be derived by measuring the distance between the experimental gene set and the pathway/process of interest.

Joint Multi-Species Genomic Data Analysis and Knowledge Transfer: Finding the optimal model system for a follow-up study based on genetic signatures derived from human experiments is challenging, because genetic networks can differ substantially from species to species. I propose to use data-driven models to embed heterogeneous networks composed of human genes and model-species genes into a common, low-dimensional space, so that genetic signatures can be compared more directly between two (or even more) species.

I will apply these methods to three specific tasks, but I emphasize that the results of this study will be transferable to any other biological problem in which complex gene/protein interactions are a major component. I have surrounded myself with a great support team and developed a strong professional development plan. The freedom and support provided by the F32 fellowship will be instrumental in achieving my goal of becoming a professor with an independent research group.
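The network-guided functional analysis aim compares an experimental gene set with a pathway by their distance in a shared low-dimensional space. The sketch below illustrates only that distance idea, under simplifying assumptions that are not taken from the proposal. The space is a plain spectral embedding of a single connected, unweighted gene network: the leading non-trivial eigenvectors of the normalized adjacency matrix, equivalently the smallest non-trivial eigenvectors of the normalized graph Laplacian, computed here by orthogonal iteration. Each gene set is summarized by the centroid of its members' coordinates. The proposed method also embeds pathways/processes jointly with genes, and the multi-species aim uses heterogeneous networks; neither is attempted here. All function names and the toy network are hypothetical.

```python
import math
import random


def normalized_adjacency(genes, edges):
    """Return S = D^{-1/2} A D^{-1/2} for an undirected, unweighted gene network."""
    index = {g: i for i, g in enumerate(genes)}
    n = len(genes)
    adj = [[0.0] * n for _ in range(n)]
    for u, v in edges:
        i, j = index[u], index[v]
        adj[i][j] = adj[j][i] = 1.0
    deg = [sum(row) for row in adj]
    return [[adj[i][j] / math.sqrt(deg[i] * deg[j]) if adj[i][j] else 0.0
             for j in range(n)] for i in range(n)]


def spectral_embedding(genes, edges, dims=2, iters=300, seed=0):
    """Embed genes with the leading non-trivial eigenvectors of S (assumes a connected network)."""
    s = normalized_adjacency(genes, edges)
    n = len(genes)
    # Shift to (I + S) / 2 so every eigenvalue lies in [0, 1] and the largest ones dominate.
    m = [[(s[i][j] + (1.0 if i == j else 0.0)) / 2.0 for j in range(n)] for i in range(n)]
    rng = random.Random(seed)
    basis = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(dims + 1)]
    for _ in range(iters):
        basis = [[sum(m[i][j] * vec[j] for j in range(n)) for i in range(n)] for vec in basis]
        # Gram-Schmidt keeps the vectors orthonormal and ordered by eigenvalue.
        ortho = []
        for vec in basis:
            for q in ortho:
                dot = sum(a * b for a, b in zip(vec, q))
                vec = [a - dot * b for a, b in zip(vec, q)]
            norm = math.sqrt(sum(a * a for a in vec))
            ortho.append([a / norm for a in vec])
        basis = ortho
    # Drop the first (trivial, degree-proportional) eigenvector.
    return {g: [vec[i] for vec in basis[1:]] for i, g in enumerate(genes)}


def centroid(points):
    return [sum(c) / len(points) for c in zip(*points)]


def set_distance(embedding, gene_set, pathway):
    """Euclidean distance between the centroids of two gene sets in the embedding."""
    a = centroid([embedding[g] for g in gene_set])
    b = centroid([embedding[g] for g in pathway])
    return math.dist(a, b)


if __name__ == "__main__":
    # Hypothetical toy network: two triangles joined by the edge a1-b1.
    genes = ["a1", "a2", "a3", "b1", "b2", "b3"]
    edges = [("a1", "a2"), ("a1", "a3"), ("a2", "a3"),
             ("b1", "b2"), ("b1", "b3"), ("b2", "b3"), ("a1", "b1")]
    emb = spectral_embedding(genes, edges)
    query = {"a2", "a3"}
    for name, members in {"pathway_A": {"a1", "a2", "a3"}, "pathway_B": {"b1", "b2", "b3"}}.items():
        print(name, round(set_distance(emb, query, members), 3))
```

Unlike the overlap test, this distance can relate a gene set to a pathway it shares few or no genes with, as long as they sit in the same region of the network. A query drawn from one triangle lands much closer to that triangle's pathway than to the other one.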

基于机器学习（ML）和最近的深度学习（DL）的现代计算技术正在在实现精准医疗倡议方面发挥着关键作用。然而，迫切需要系统地将这些强大的数据驱动技术与先前的分子网络知识联合收割机相结合，更准确的预测模型，同时也令人满意地解释了他们的预测机制潜在的复杂特征和疾病。我建议使用生物学领域的特定知识，计算来解决三个突出的问题：1）如何预测与数百万个公开的样品？2)什么样的分子/细胞功能可以附着到这些样品上，以及3）如何附着到这些样品上我们能否将人类数据的发现转化为模式物种的研究结果？网络约束深度学习元数据插补：大多数多因素表型是组织依赖性和明显的根据年龄、性别和种族而有所不同。然而，大多数公开的基因组数据缺乏这些标签。我将开发一种网络引导的方法，根据样本的通过设计新颖的数据驱动模型，其中表达谱的模型架构和/或结构输入数据受到底层基因网络的约束。网络引导的基因组功能分析数据：高通量实验通常会产生难以解释的感兴趣基因列表。功能丰富分析（FEA）是一种强大的工具，它将功能意义附加到实验通过将它们总结成一组途径/过程来分析一组基因。然而，标准的有限元分析是有限的由于对基因功能的不完全了解，缺乏潜在基因网络的背景，以及表达数据。我将通过开发一种网络引导的方法来解决这些限制，基因，它们的相互作用，以及它们已知的生物学途径/过程，通过比较实验基因之间的距离，设置和感兴趣的途径/过程。联合多物种基因组数据分析和知识转移：特别是，找到最佳模型系统，用于基于遗传学的后续研究。来自人类实验的签名具有挑战性，因为基因网络可能截然不同从一个物种到另一个物种我建议使用数据驱动的模型来嵌入异构网络，将人类基因和模式物种基因放入一个公共的低维空间中，以更好地比较遗传两个（甚至多个）物种之间的签名。我将把这些方法应用于三个具体任务，但我我强调，这项研究的结果将转移到任何其他生物学问题，其中复杂的基因/蛋白质相互作用是主要组成部分。我身边有一个很棒的支持团队，制定了强有力的专业发展计划。F32奖学金提供的自由和支持将有助于实现我的目标，成为一名教授，拥有一个独立的研究小组。