A predictive model of mRNA stability and translation for variant interpretation and mRNA therapeutics
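Basic information
- Grant number: 9894822
- Principal investigator:
- Funding amount: $473,100
- Host institution:
- Host institution country: United States
- Project category:
- Fiscal year: 2018
- Funding country: United States
- Period: 2018-06-05 to 2021-03-31
- Project status: Completed
- Source: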
- Keywords: 3' Untranslated Regions; 5' Untranslated Regions; Address; Affect; Alternative Splicing; Binding Sites; Biological; Biological Assay; Biology; Biotechnology; Code; Computational Technique; DNA Library; Data; Data Set; Disease; Elements; Engineering; Expression Library; Gene Expression; Genes; Genetic Programming; Genetic Transcription; Genetic Translation; Genetic Variation; Genome; Genomics; Human; Human Engineering; Human Genetics; Human Genome; Image; In Vitro; Learning; Libraries; Machine Learning; Measures; Messenger RNA; Methods; Modeling; Neural Network Simulation; Performance; Polyribosomes; Production; Property; Proteins; RNA; RNA Splicing; RNA Stability; RNA-Binding Proteins; Randomized; Regulator Genes; Regulatory Element; Research Project Grants; Resolution; Ribosomes; Role; Site; Source; Structure; Techniques; Testing; Therapeutic; Thiouridine; Time; Training; Transcript; Translating; Translations; Untranslated Regions; Validation; Variant; Work; base; comparative; computer science; convolutional neural network; design; experimental study; genetic variant; mRNA Stability; machine vision; member; neural network; novel; polysome profiling; practical application; predictive modeling; protein expression; ribosome profiling; screening; stable cell line; statistical learning; synthetic construct; voice recognition
Project abstract
The leading and trailing untranslated regions (UTRs) of an mRNA, along with the coding sequence (CDS),
control protein production by modulating translation and mRNA stability. However, although we have identified
a vast number of regulatory features in these regions, we are still far from being able to predict, for example,
whether and how a sequence variant affects the levels of protein being made. Here, we propose to combine
high-throughput experimental characterization of protein expression in synthetic libraries with machine learning
to create predictive models of translation and mRNA stability, addressing an urgent need. Recent progress in
machine vision, voice recognition and other fields of computer science has been driven by the availability of
enormous data sets on which to train models. Machine learning approaches have also had remarkable impact
in biology, but biological data sets often are comparatively small, limiting the quality of models that can be
learned. For example, there are only around 20,000 genes in the human genome, a restrictively small set of
examples for training a predictive model that captures the full extent of the genome’s “regulatory code.” In this
proposal, we aim to overcome this data size limitation by training predictive models of protein expression on
data from millions of synthetic constructs -- a data set several orders of magnitude larger than the number of
genes in the genome. Specifically, we will create libraries of in vitro transcribed mRNA with targeted variation
in the UTRs and CDS and will assay protein expression of each library member by performing high-throughput
polysome profiling, ribosome profiling, and mRNA stability assays. We will then use neural network
approaches to learn predictive models of the relationship between mRNA sequence and levels of protein
production. We will apply our models to three applications of practical importance: first, we expect to uncover
novel biology, for example identifying regulatory sequence elements and interactions between them. Second,
we will validate our models through the de novo design and experimental testing of sequences that result in
higher levels of protein production than any of the millions of randomly generated members of the original
library or than the endogenous UTR sequences currently used in biotechnology. Such stable and highly
translating mRNA constructs would be of particular value for the field of mRNA therapeutics. Third, we will
predict the functional consequences of genetic variation in UTRs on protein production and we will validate
these predictions experimentally. We are far from understanding which genetic variants compromise gene
regulatory function in ways that may contribute to disease, making such a comprehensive and quantitative
analysis of variants valuable.
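As a rough illustration of the sequence-to-expression modeling described in the abstract, the sketch below one-hot encodes an mRNA sequence and scans it with a single convolutional filter, the basic operation inside a convolutional neural network. It is a minimal sketch and not the project's model: the function names, the hand-written filter `UAUG_FILTER` (a detector for an AUG start codon), and the example sequences are all hypothetical, and a real model would learn many such filters from the library measurements.

```python
from typing import List

ALPHABET = "ACGU"  # column order used for the one-hot encoding


def one_hot(seq: str) -> List[List[float]]:
    """Encode an RNA (or DNA) sequence as one 4-element one-hot row per base."""
    rows = []
    for base in seq.upper().replace("T", "U"):
        if base not in ALPHABET:
            raise ValueError(f"unexpected base: {base!r}")
        rows.append([1.0 if base == b else 0.0 for b in ALPHABET])
    return rows


def conv1d_max(encoded: List[List[float]], kernel: List[List[float]]) -> float:
    """Slide one filter along the sequence and return its maximum activation."""
    width = len(kernel)
    if len(encoded) < width:
        raise ValueError("sequence is shorter than the filter")
    best = float("-inf")
    for start in range(len(encoded) - width + 1):
        score = sum(
            encoded[start + i][j] * kernel[i][j]
            for i in range(width)
            for j in range(len(ALPHABET))
        )
        best = max(best, score)
    return best


# Hypothetical, hand-written filter that responds to the triplet AUG.
# The weights are illustrative only; they are not learned from data.
UAUG_FILTER = [
    [1.0, 0.0, 0.0, 0.0],  # position 1: A
    [0.0, 0.0, 0.0, 1.0],  # position 2: U
    [0.0, 0.0, 1.0, 0.0],  # position 3: G
]


def variant_effect(ref: str, alt: str, kernel: List[List[float]]) -> float:
    """Difference in maximum filter activation between variant and reference."""
    return conv1d_max(one_hot(alt), kernel) - conv1d_max(one_hot(ref), kernel)
```

Under these assumptions, `variant_effect("GGCACGGCC", "GGCAUGGCC", UAUG_FILTER)` returns a positive value, because the variant creates an AUG that the filter responds to. A trained network would apply the same comparison of reference and variant sequences using its learned filters and later layers.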
Project outcomes
Journal articles: 2
Monographs: 0
Research awards: 0
Conference papers: 0
Patents: 0
Other grants by Georg Seelig
Engineering cell type-specific splicing regulation
- Grant number: 10633765
- Fiscal year: 2023
- Funding amount: $473,100
- Project category:
Joint receptor and protein expression immunophenotyping through split-pool barcoding
- Grant number: 10625987
- Fiscal year: 2021
- Funding amount: $473,100
- Project category:
Joint receptor and protein expression immunophenotyping through split-pool barcoding
- Grant number: 10375354
- Fiscal year: 2021
- Funding amount: $473,100
- Project category:
High-resolution spatial transcriptomics through light patterning
- Grant number: 9886581
- Fiscal year: 2020
- Funding amount: $473,100
- Project category:
High-resolution spatial transcriptomics through light patterning
- Grant number: 10341212
- Fiscal year: 2020
- Funding amount: $473,100
- Project category:
A massively parallel reporter assay for measuring chromatin effects on alternative splicing
- Grant number: 10161803
- Fiscal year: 2020
- Funding amount: $473,100
- Project category:
A massively parallel reporter assay for measuring chromatin effects on alternative splicing
- Grant number: 9977420
- Fiscal year: 2020
- Funding amount: $473,100
- Project category:
High-resolution spatial transcriptomics through light patterning
- Grant number: 10112854
- Fiscal year: 2020
- Funding amount: $473,100
- Project category:
Predictive Modeling of Alternative Splicing and Polyadenylation from Millions of Random Sequences
- Grant number: 9306648
- Fiscal year: 2017
- Funding amount: $473,100
- Project category:
Similar overseas grants
Impact of alternative polyadenylation of 3'-untranslated regions in the PI3K/AKT cascade on microRNA
- Grant number: 573541-2022
- Fiscal year: 2022
- Funding amount: $473,100
- Project category: University Undergraduate Student Research Awards
How do untranslated regions of cannabinoid receptor type 1 mRNA determine receptor subcellular localisation and function?
- Grant number: 2744317
- Fiscal year: 2022
- Funding amount: $473,100
- Project category: Studentship
MICA:Synthetic untranslated regions for direct delivery of therapeutic mRNAs
- Grant number: MR/V010948/1
- Fiscal year: 2021
- Funding amount: $473,100
- Project category: Research Grant
Translational Control by 5'-untranslated regions
- Grant number: 10019570
- Fiscal year: 2019
- Funding amount: $473,100
- Project category:
Translational Control by 5'-untranslated regions
- Grant number: 10223370
- Fiscal year: 2019
- Funding amount: $473,100
- Project category:
Translational Control by 5'-untranslated regions
- Grant number: 10455108
- Fiscal year: 2019
- Funding amount: $473,100
- Project category:
Synergistic microRNA-binding sites, and 3' untranslated regions: a dialogue of silence
- Grant number: 255762
- Fiscal year: 2012
- Funding amount: $473,100
- Project category: Operating Grants
Analysis of long untranslated regions in Nipah virus genome
- Grant number: 20790351
- Fiscal year: 2008
- Funding amount: $473,100
- Project category: Grant-in-Aid for Young Scientists (B)
Search for mRNA elements involved in the compatibility between 5' untranslated regions and coding regions in chloroplast translation
- Grant number: 19370021
- Fiscal year: 2007
- Funding amount: $473,100
- Project category: Grant-in-Aid for Scientific Research (B)
Post-transcriptional Regulation of PPAR-g Expression by 5'-Untranslated Regions
- Grant number: 7131841
- Fiscal year: 2006
- Funding amount: $473,100
- Project category: