Rapid response for pandemics: single cell sequencing and deep learning to predict antibody sequences against an emerging antigen
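Basic information
- Grant number: 10274223
- Principal investigator:
- Amount: $1.8516 million
- Host institution:
- Host institution country: United States
- Project category:
- Fiscal year: 2021
- Funding country: United States
- Project period: 2021-09-16 to 2024-08-31
- Project status: Completed
- Source: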
- Keywords: Affinity; Amino Acid Sequence; Antibodies; Antibody Formation; Antibody Specificity; Antibody Therapy; Antigen-Antibody Complex; Antigens; Architecture; B-Cell Antigen Receptor; B-Cell Receptor Binding; B-Lymphocytes; Base Sequence; Binding; Biological; Biology; Cells; Chronic Disease; Code; Computer Models; Computers; Computing Methodologies; Coupled; Data; Data Set; Databases; Degenerative Disorder; Development; Diagnosis; Economics; Electrons; Engineering; Enzymes; Epitopes; Equilibrium; Foundations; Future; Genes; Genomics; Goals; Hour; Immune system; Immunize; Immunoassay; Immunoglobulins; Immunologist; Immunology; Industrialization; Ligands; Light; Link; Machine Learning; Malignant Neoplasms; Measurable; Mechanics; Methods; Microscopic; Modeling; Molecular; Molecular Biology; Molecular Computations; Mus; Nature; Network-based; Neural Network Simulation; Output; Passive Immunotherapy; Phage Display; Phase; Play; Problem Solving; Process; Production; Proteins; Readiness; Reagent; Research Project Grants; SARS-CoV-2 antigen; SARS-CoV-2 spike protein; Savings; Scientist; Series; Specificity; Structural Chemistry; Structural Models; Structure; Surface Plasmon Resonance; System; Testing; Therapeutic; Therapeutic antibodies; Thermodynamics; Time; Training; Vaccines; Validation; Variant; Viral; Viral Antigens; Viral Proteins; Work; base; combat; computer science; data streams; database structure; deep learning; deep neural network; deep sequencing; density; design; experimental study; high dimensionality; in silico; innovation; insight; large datasets; machine learning method; molecular modeling; mouse model; neutralizing antibody; novel; novel virus; pandemic disease; pandemic preparedness; pathogen; physical property; protein structure; quantum; response; scaffold; simulation; single cell sequencing; synthetic antibodies; therapeutic evaluation; three dimensional structure
Project abstract
ABSTRACT
One of the “holy grails” of immunology is the ability to directly predict, in silico, tight-binding variable chain antibody
sequences against foreign or non-self “antigenic” proteins. Immunoglobulin chain rearrangement can
potentially encode approximately 10^16 different variants of antibody heavy and light chain sequences. However,
only a small fraction of this sequence space is generally accessed when antibodies evolve against foreign proteins.
The computational challenge is to go from a structural model of an antigen to a predicted set of antibody
chain sequences that bind tightly to that antigen. If this is solved, it might be possible to move in less than 24 hours
from the first cryo-electron-microscopic structure of a novel viral protein to a set of potent antibody-like
molecular candidates for testing. Towards solving this problem, this project aims to develop a deep learning
architecture that takes as input thermodynamic, quantum mechanical (density functional), and local structure-based
network topographical features of the antigens and their cognate antibodies, and outputs their
respective binding affinity constants.
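The abstract does not say how the predicted binding affinity constants would be represented. As a minimal sketch, assuming affinity is expressed as a dissociation constant K_D in mol/L, the standard thermodynamic relation ΔG° = RT ln(K_D) links such a constant to a binding free energy, and a log-scaled value such as pK_D is a common regression target. The function names `binding_free_energy` and `pkd` are illustrative and do not come from the project.

```python
import math

R_KJ_PER_MOL_K = 8.314462618e-3  # gas constant in kJ/(mol*K)


def binding_free_energy(kd_molar, temp_k=298.15):
    """Standard binding free energy (kJ/mol) from a dissociation constant in mol/L."""
    if kd_molar <= 0:
        raise ValueError("K_D must be positive")
    # Delta G = R * T * ln(K_D / 1 M); tighter binding gives a more negative value.
    return R_KJ_PER_MOL_K * temp_k * math.log(kd_molar)


def pkd(kd_molar):
    """Negative base-10 log of K_D; an assumed, commonly used regression target."""
    return -math.log10(kd_molar)


if __name__ == "__main__":
    for kd in (1e-6, 1e-9):  # micromolar vs nanomolar binder
        print(f"K_D={kd:.0e} M  dG={binding_free_energy(kd):.2f} kJ/mol  pK_D={pkd(kd):.1f}")
```

Under these assumptions a 1 nM binder corresponds to roughly -51 kJ/mol at 25 °C, which shows why a log scale is convenient: each tenfold gain in affinity shifts the free energy by a constant amount.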
We will design a generative adversarial network (GAN), which we think is uniquely suited for regression-based
machine learning (ML) approaches to the immune system, to discover associations between the epitope and the variable chain
features. This approach requires a large data stream of antigen and cognate antibody sequences, which until
recently was difficult to obtain. A recently described single B-cell receptor (BCR) specific tagging method coupled
with single cell deep sequencing (“linking B cell receptor to antigen specificity through sequencing”, or LIBRA-seq)
can rapidly isolate and sequence the BCR variable chain coding regions that bind with high selectivity
to antigenic epitopes.
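The abstract does not describe how sequencing reads would be turned into antigen and antibody pairs. As a rough illustration only, assuming each sequenced cell comes with a BCR sequence and a count of reads for each antigen tag, a cell can be linked to the antigens whose normalized count stands out. The field names `bcr_cdr3` and `antigen_umis`, the made-up sequences, the centered log-ratio normalization, and the threshold are all assumptions chosen for this sketch, not the project's method.

```python
import math


def clr(counts, pseudocount=1.0):
    """Centered log-ratio transform of one cell's antigen-tag counts."""
    logs = {ag: math.log(c + pseudocount) for ag, c in counts.items()}
    mean_log = sum(logs.values()) / len(logs)
    return {ag: value - mean_log for ag, value in logs.items()}


def assign_specificity(cells, threshold=1.0):
    """Map each cell's BCR sequence to antigens whose CLR score exceeds the threshold."""
    result = {}
    for cell in cells:
        scores = clr(cell["antigen_umis"])
        result[cell["bcr_cdr3"]] = sorted(ag for ag, s in scores.items() if s > threshold)
    return result


# Hypothetical input: placeholder sequences and counts, not real data.
cells = [
    {"bcr_cdr3": "CARDYW", "antigen_umis": {"spike": 250, "control": 3}},
    {"bcr_cdr3": "CAKGGW", "antigen_umis": {"spike": 2, "control": 4}},
]

pairs = assign_specificity(cells)
print(pairs)
```

In this toy input only the first cell is assigned to the spike tag; the second has no tag that clears the threshold, so it would contribute no antigen and antibody pair to a training set.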
Towards the specific project goals, in Task 1, LIBRA-seq will be used to rapidly identify and generate candidate
immunoglobulin coding sequences in response to specific linear and nonlinear epitopes (against controls),
chosen through computational/molecular modeling and prioritized with SARS-CoV-2 Spike protein epitopes (but
not restricted to these), injected into a mouse model, to generate large training sets; in Task 2, these training
sets, along with other data sets already available in public databases, will generate a series of structural features
(described above), which will be used to train the GAN; in Task 3, the predicted epitope-antibody interactions
will be validated by direct experiments with synthetic antibody and phage-display systems. Thus, the proposed
strategy applies foundational principles from evolutionary biology, genomics, structural chemistry, and computer
science to the solution of a general biological engineering problem.
Results from this project are expected to lay the foundations for a rigorously tested and fully automated machine-learning
system that could rapidly generate synthetic antibody candidates from the structure of a novel virus
protein, which would enhance the ability to respond rapidly to a future pandemic. The ability to develop targeted
antibody therapies against non-infectious or chronic diseases, and to produce antibody-based industrial
enzymes, would also be dramatically enhanced if this project is successful.
The team: The team-leads of this multi-institutional research project comprise a computer scientist, a protein
crystallographer, an immunologist, and a molecular biologist.
Project outcomes
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other grants by Jeniffer Bertha Hernandez
Rapid response for pandemics: single cell sequencing and deep learning to predict antibody sequences against an emerging antigen
- Grant number: 10845715
- Fiscal year: 2021
- Funding amount: $1.8516 million
- Project category:
Similar overseas grants
Cerebral infarction treatment strategy using collagen-like "triple helix peptide" containing functional amino acid sequence
- Grant number: 23K06972
- Fiscal year: 2023
- Funding amount: $1.8516 million
- Project category: Grant-in-Aid for Scientific Research (C)
Establishment of a screening method for functional microproteins independent of amino acid sequence conservation
- Grant number: 23KJ0939
- Fiscal year: 2023
- Funding amount: $1.8516 million
- Project category: Grant-in-Aid for JSPS Fellows
Effects of amino acid sequence and lipids on the structure and self-association of transmembrane helices
- Grant number: 19K07013
- Fiscal year: 2019
- Funding amount: $1.8516 million
- Project category: Grant-in-Aid for Scientific Research (C)
Construction of electron-transfer amino acid sequence probe with an interaction for protein and cell
- Grant number: 16K05820
- Fiscal year: 2016
- Funding amount: $1.8516 million
- Project category: Grant-in-Aid for Scientific Research (C)
Development of artificial antibody of anti-bitter taste receptor using random amino acid sequence library
- Grant number: 16K08426
- Fiscal year: 2016
- Funding amount: $1.8516 million
- Project category: Grant-in-Aid for Scientific Research (C)
The aa15-17 amino acid sequence in the terminal protein domain of HBV polymerase as a viral factor affecting in vivo as well as in vitro replication activity of the virus.
- Grant number: 25461010
- Fiscal year: 2013
- Funding amount: $1.8516 million
- Project category: Grant-in-Aid for Scientific Research (C)
Amino acid sequence analysis of fossil proteins using mass spectrometry
- Grant number: 23654177
- Fiscal year: 2011
- Funding amount: $1.8516 million
- Project category: Grant-in-Aid for Challenging Exploratory Research
Precise hybrid synthesis of glycoprotein through amino acid sequence-specific introduction of oligosaccharide followed by enzymatic transglycosylation reaction
- Grant number: 22550105
- Fiscal year: 2010
- Funding amount: $1.8516 million
- Project category: Grant-in-Aid for Scientific Research (C)
Estimating selection on amino-acid sequence polymorphisms in Drosophila
- Grant number: NE/D00232X/1
- Fiscal year: 2006
- Funding amount: $1.8516 million
- Project category: Research Grant
Construction of a neural network for detecting novel domains from amino acid sequence information only
- Grant number: 16500189
- Fiscal year: 2004
- Funding amount: $1.8516 million
- Project category: Grant-in-Aid for Scientific Research (C)