Rapid response for pandemics: single cell sequencing and deep learning to predict antibody sequences against an emerging antigen


Basic information

Project abstract

One of the "holy grails" of immunology is the ability to predict, directly in silico, tight-binding antibody variable chain sequences against foreign or non-self ("antigenic") proteins. Immunoglobulin chain rearrangement can potentially encode approximately 10^16 different variants of antibody heavy and light chain sequences. However, only a small fraction of this sequence space is generally accessed when antibodies evolve against foreign proteins. The computational challenge is to go from a model of the structure of an antigen to a predicted set of antibody chain sequences that can bind tightly to that antigen. If this problem were solved, it might be possible to move in less than 24 hours from the first cryo-electron microscopy structure of a novel viral protein to a set of potent antibody-like molecular candidates for testing. Towards solving this problem, this project aims to develop a deep learning architecture that takes as input thermodynamic, quantum mechanical (density functional), and local structure-based network topographical features of antigens and their cognate antibodies, and outputs their respective binding affinity constants.

We will design a generative adversarial network (GAN), which we think is uniquely suited for regression-based ML approaches for the immune system, to discover associations between epitope features and variable chain features. This approach requires a large data stream of antigen and cognate antibody sequences, which until recently was difficult to obtain. A recently described single B-cell receptor (BCR)-specific tagging method coupled with single-cell deep sequencing ("linking B cell receptor to antigen specificity through sequencing", or LIBRA-seq) can rapidly isolate and sequence the BCR variable chain coding regions that bind with high selectivity to antigenic epitopes.

Towards the specific project goals, in Task 1, LIBRA-seq will be used to rapidly identify and generate candidate immunoglobulin coding sequences in response to specific linear and nonlinear epitopes (against controls). The epitopes will be chosen through computational and molecular modeling, prioritizing SARS-CoV-2 Spike protein epitopes (but not restricted to these), and injected into a mouse model to generate large training sets. In Task 2, these training sets, along with other data sets already available in public databases, will be used to generate the series of structural features described above, which will in turn be used to train the GAN. In Task 3, the predicted epitope-antibody interactions will be validated by direct experiments with synthetic antibody and phage-display systems. The proposed strategy thus applies foundational principles from evolutionary biology, genomics, structural chemistry, and computer science to a general biological engineering problem.

Results from this project are expected to lay the foundations for a rigorously tested and fully automated machine-learning system that could rapidly generate synthetic antibody candidates from the structure of a novel virus protein, enhancing the ability to respond rapidly to a future pandemic. If the project succeeds, it would also dramatically enhance the ability to develop targeted antibody therapies against non-infectious or chronic diseases and to produce antibody-based industrial enzymes.

The team: The team leads of this multi-institutional research project comprise a computer scientist, a protein crystallographer, an immunologist, and a molecular biologist.

Project outcomes

Journal articles (1)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)


Other grants by Jeniffer Bertha Hernandez

Rapid response for pandemics: single cell sequencing and deep learning to predict antibody sequences against an emerging antigen
  • Grant number: 10274223
  • Fiscal year: 2021
  • Funding amount: $1.2199 million

Similar overseas grants

Cerebral infarction treatment strategy using collagen-like "triple helix peptide" containing functional amino acid sequence
  • Grant number: 23K06972
  • Fiscal year: 2023
  • Funding amount: $1.2199 million
  • Project category: Grant-in-Aid for Scientific Research (C)

Establishment of a screening method for functional microproteins independent of amino acid sequence conservation
  • Grant number: 23KJ0939
  • Fiscal year: 2023
  • Funding amount: $1.2199 million
  • Project category: Grant-in-Aid for JSPS Fellows

Effects of amino acid sequence and lipids on the structure and self-association of transmembrane helices
  • Grant number: 19K07013
  • Fiscal year: 2019
  • Funding amount: $1.2199 million
  • Project category: Grant-in-Aid for Scientific Research (C)

Construction of electron-transfer amino acid sequence probe with an interaction for protein and cell
  • Grant number: 16K05820
  • Fiscal year: 2016
  • Funding amount: $1.2199 million
  • Project category: Grant-in-Aid for Scientific Research (C)

Development of artificial antibody of anti-bitter taste receptor using random amino acid sequence library
  • Grant number: 16K08426
  • Fiscal year: 2016
  • Funding amount: $1.2199 million
  • Project category: Grant-in-Aid for Scientific Research (C)

The aa15-17 amino acid sequence in the terminal protein domain of HBV polymerase as a viral factor affecting in vivo as well as in vitro replication activity of the virus
  • Grant number: 25461010
  • Fiscal year: 2013
  • Funding amount: $1.2199 million
  • Project category: Grant-in-Aid for Scientific Research (C)

Amino acid sequence analysis of fossil proteins using mass spectrometry
  • Grant number: 23654177
  • Fiscal year: 2011
  • Funding amount: $1.2199 million
  • Project category: Grant-in-Aid for Challenging Exploratory Research

Precise hybrid synthesis of glycoprotein through amino acid sequence-specific introduction of oligosaccharide followed by enzymatic transglycosylation reaction
  • Grant number: 22550105
  • Fiscal year: 2010
  • Funding amount: $1.2199 million
  • Project category: Grant-in-Aid for Scientific Research (C)

Estimating selection on amino-acid sequence polymorphisms in Drosophila
  • Grant number: NE/D00232X/1
  • Fiscal year: 2006
  • Funding amount: $1.2199 million
  • Project category: Research Grant

Construction of a neural network for detecting novel domains from amino acid sequence information only
  • Grant number: 16500189
  • Fiscal year: 2004
  • Funding amount: $1.2199 million
  • Project category: Grant-in-Aid for Scientific Research (C)