Rapid response for pandemics: single cell sequencing and deep learning to predict antibody sequences against an emerging antigen

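Basic information

Grant number: 10845715
Principal investigator: Jeniffer Bertha Hernandez
Amount: $1,219,900
Host institution: KECK GRADUATE INST OF APPLIED LIFE SCIS
Host institution country: United States
Project category:
Fiscal year: 2021
Funding country: United States
Project period: 2021-09-16 to 2024-08-31
Project status: Completed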

Project abstract

One of the "holy grails" in immunology is to be able to directly predict, in silico, tight-binding variable chain antibody sequences against foreign or non-self "antigenic" proteins. Immunoglobulin chain rearrangement can potentially encode approximately 10^16 different variants of antibody heavy and light chain sequences. However, only a small fraction of this sequence space is generally accessed when antibodies evolve against foreign proteins. The computational challenge is to go from a model of the structure of an antigen to predicting a set of antibody chain sequences that can bind tightly to the antigen. If this problem were solved, it might be possible to move in less than 24 hours from the first cryo-electron-microscopic structure of a novel viral protein to a set of potent antibody-like molecular candidates for testing.

Towards solving this problem, this project aims to develop a deep learning architecture that will take as input thermodynamic, quantum mechanical (density functional), and local structure-based network topographical features of the antigens and their cognate antibodies, and will output their respective binding affinity constants. We will design a generative adversarial network (GAN), which we think is uniquely suited for regression-based ML approaches for the immune system, to discover associations between the epitope and the variable chain features. This approach requires a large data stream of antigen and cognate antibody sequences, which until recently was difficult to obtain. A recently described single B-cell receptor (BCR) specific tagging method coupled with single cell deep sequencing ("linking B cell receptor to antigen specificity through sequencing", or LIBRA-seq) can rapidly isolate and sequence the BCR variable chain coding regions that can bind with high selectivity to antigenic epitopes.

Towards the specific project goals, in Task 1, LIBRA-seq will be used to rapidly identify and generate candidate immunoglobulin coding sequences in response to specific linear and nonlinear epitopes (against controls), chosen through computational/molecular modeling and prioritized with SARS-CoV-2 Spike protein epitopes (but not restricted to these), injected into a mouse model, to generate large training sets; in Task 2, these training sets, along with other data sets already available in public databases, will be used to generate a series of structural features (described above), which will be used to train the GAN; in Task 3, the predicted epitope-antibody interactions will be validated by direct experiments with synthetic antibody and phage-display systems.

Thus, the proposed strategy applies foundational principles in evolutionary biology, genomics, structural chemistry, and computer science to the solution of a general biological engineering problem. Results from this project are expected to lay the foundations for a rigorously tested and fully automated machine-learning system that could rapidly generate synthetic antibody candidates from the structure of a novel virus protein, which can enhance the ability to respond rapidly to a future pandemic. The ability to develop targeted antibody therapy against non-infectious or chronic diseases, and to produce antibody-based industrial enzymes, will also be dramatically enhanced if this project is successful.

The team: The team leads of this multi-institutional research project comprise a computer scientist, a protein crystallographer, an immunologist, and a molecular biologist.
