Discovering interpretable mechanisms explaining high dimensional biomolecular data

Basic information

Grant number:
10711988
Principal investigator:
Milo Lin
Amount:
About $410,000
Host institution:
UT SOUTHWESTERN MEDICAL CENTER
Host institution country:
United States
Project category:
Fiscal year:
2023
Funding country:
United States
Project period:
2023-09-01 to 2028-07-31
Project status:
Ongoing

Source:
https://reporter.nih.gov/project-details/10711988
Keywords:
Acceleration; Address; Amino Acid Sequence; Amyloid beta-Protein; Antibiotic Resistance; Antibiotics; Automobile Driving; Behavior; Biological Assay; Biology; Cell Fate Control; Collaborations; Complex; Computing Methodologies; Data; Data Set; Dimensions; Directed Molecular Evolution; Disease; Foundations; Future; Goals; Health; Human; Informatics; Intuition; Learning; Libraries; Liquid substance; Methods; Modeling; Network-based; Neurobiology; Neurodegenerative Disorders; Patients; Pattern; Peptide Library; Peptides; Pharmaceutical Preparations; Physics; Proteins; RNA; RNA Sequences; Resources; Running; Sampling; Science; Structure; Time; Training; Work; artificial neural network; beta-Lactamase; deep learning; disease diagnosis; experimental study; high dimensionality; insight; invention; large datasets; molecular dynamics; monomer; mutant; neural network; neurotoxic; protein aggregation; self assembly; simulation; tau Proteins; tau aggregation

Project summary

How protein and RNA sequences encode folding, aggregation, and function is a fundamental question with wide-ranging implications for human health. Discovering predictive principles for this encoding requires computational approaches that offer mechanistic insight, especially for the large fraction of intrinsically disordered proteins for which experimental structural information is limited. Yet the complexity and dimensionality of this problem pose fundamental challenges to existing computational methods. The axiomatic approach, which models behavior from first principles, is limited by simulation runtime and by unknown context-dependent parameters. Informatics-based approaches such as deep learning could potentially discover principles by integrating large datasets across scales and complexity. However, these models produce “black box” predictions that i) are difficult to understand and ii) generalize poorly beyond their training data (i.e., beyond the well-understood regime). My lab has developed methods to overcome the limitations of both types of approaches. (1) Axiomatic: we developed a statistical physics method to exponentially enhance sampling of protein self-assembly from structurally heterogeneous monomers in molecular dynamics simulations. (2) Informatic: we invented essence neural networks (ENNs) based on neurobiological principles and demonstrated that they overcome the above limitations of deep learning on a wide range of learning tasks, including sequence-to-function prediction.

Using both axiomatic and informatic approaches, in the next five years my lab will tackle three instances of the sequence-structure-function problem: 1) use enhanced-sampling molecular dynamics simulations to discover transition states of neurotoxic oligomer and fibril formation from Abeta and tau peptide monomers; 2) use ENNs, trained on tau protein and colocalized RNA sequence datasets, to discover the RNA sequence rules driving RNA-associated tau fibril aggregation in neurodegenerative disease; 3) use ENNs to distill the sequence rules determining whether a strain or mutant of beta-lactamase protein can neutralize each antibiotic within a diverse drug panel, and to identify potential future antibiotic-resistant mutants. Our long-term goal is to develop an ENN-based platform for automated transformation of data into axioms. Leveraging well-established collaborations with colleagues of wide expertise, we will pursue these goals by combining our unique computational approaches with experimental resources, including time-resolved protein aggregation assays, patient-derived tau fibrils co-localized with sequence-specific RNA, high-throughput liquid culture antibiotic screens, multiplexed directed evolution experiments of antibiotic resistance, and large in-house peptide and RNA mutant libraries. This work lays the foundation for transforming large datasets into human-understandable rules that connect sequence to function, and for relating these rules to physical mechanisms of structural dynamics. This in turn could accelerate disease diagnosis and treatment.
