A predictive model of mRNA stability and translation for variant interpretation and mRNA therapeutics
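Basic information
- Grant number: 9894822
- Principal investigator:
- Amount: $473,100
- Host institution:
- Host institution country: United States
- Project category:
- Fiscal year: 2018
- Funding country: United States
- Project period: 2018-06-05 to 2021-03-31
- Project status: Completed
- Source: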
- Keywords: 3' Untranslated Regions; 5' Untranslated Regions; Address; Affect; Alternative Splicing; Binding Sites; Biological; Biological Assay; Biology; Biotechnology; Code; Computational Technique; DNA Library; Data; Data Set; Disease; Elements; Engineering; Expression Library; Gene Expression; Genes; Genetic Programming; Genetic Transcription; Genetic Translation; Genetic Variation; Genome; Genomics; Human; Human Engineering; Human Genetics; Human Genome; Image; In Vitro; Learning; Libraries; Machine Learning; Measures; Messenger RNA; Methods; Modeling; Neural Network Simulation; Performance; Polyribosomes; Production; Property; Proteins; RNA; RNA Splicing; RNA Stability; RNA-Binding Proteins; Randomized; Regulator Genes; Regulatory Element; Research Project Grants; Resolution; Ribosomes; Role; Site; Source; Structure; Techniques; Testing; Therapeutic; Thiouridine; Time; Training; Transcript; Translating; Translations; Untranslated Regions; Validation; Variant; Work; base; comparative; computer science; convolutional neural network; design; experimental study; genetic variant; mRNA Stability; machine vision; member; neural network; novel; polysome profiling; practical application; predictive modeling; protein expression; ribosome profiling; screening; stable cell line; statistical learning; synthetic construct; voice recognition
Project abstract
The leading and trailing untranslated regions (UTRs) of an mRNA, along with the coding sequence (CDS),
control protein production by modulating translation and mRNA stability. However, although we have identified
a vast number of regulatory features in these regions, we are still far from being able to predict, for example,
whether and how a sequence variant affects the levels of protein being made. Here, we propose to combine
high-throughput experimental characterization of protein expression in synthetic libraries with machine learning
to create predictive models of translation and mRNA stability, addressing an urgent need. Recent progress in
machine vision, voice recognition and other fields of computer science has been driven by the availability of
enormous data sets on which to train models. Machine learning approaches have also had remarkable impact
in biology, but biological data sets often are comparatively small, limiting the quality of models that can be
learned. For example, there are only around 20,000 genes in the human genome, a restrictively small set of
examples for training a predictive model that captures the full extent of the genome’s “regulatory code.” In this
proposal, we aim to overcome this data size limitation by training predictive models of protein expression on
data from millions of synthetic constructs -- a data set several orders of magnitude larger than the number of
genes in the genome. Specifically, we will create libraries of in vitro transcribed mRNA with targeted variation
in the UTRs and CDS and will assay protein expression of each library member by performing high-throughput
polysome profiling, ribosome profiling, and mRNA stability assays. We will then use neural network
approaches to learn predictive models of the relationship between mRNA sequence and levels of protein
production. We will apply our models to three applications of practical importance: first, we expect to uncover
novel biology, for example identifying regulatory sequence elements and interactions between them. Second,
we will validate our models through the de novo design and experimental testing of sequences that result in
higher levels of protein production than any of the millions of randomly generated members of the original
library or than the endogenous UTR sequences currently used in biotechnology. Such stable and highly
translating mRNA constructs would be of particular value for the field of mRNA therapeutics. Third, we will
predict the functional consequences of genetic variation in UTRs on protein production and we will validate
these predictions experimentally. We are far from understanding which genetic variants compromise gene
regulatory function in ways that may contribute to disease, making such a comprehensive and quantitative
analysis of variants valuable.
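
The abstract mentions neural network approaches that map mRNA sequence to protein production, and the keywords list convolutional neural networks. As a rough illustration only, and not the project's actual model, the following Python sketch shows the two basic ingredients such a model typically starts from: one-hot encoding of a sequence and a single convolutional filter scanned along it with max-pooling. The filter used here (a detector for the start codon AUG) and the example sequences are hypothetical; in a trained network the filter weights would be learned from the library measurements.

```python
from typing import List

ALPHABET = "ACGU"  # column order of the one-hot encoding


def one_hot(seq: str) -> List[List[float]]:
    """Encode an RNA (or DNA) sequence as rows of 4 values in A, C, G, U order."""
    seq = seq.upper().replace("T", "U")
    rows = []
    for base in seq:
        row = [0.0] * len(ALPHABET)
        if base in ALPHABET:  # unknown bases such as N stay all-zero
            row[ALPHABET.index(base)] = 1.0
        rows.append(row)
    return rows


def conv1d_max(encoded: List[List[float]], kernel: List[List[float]]) -> float:
    """Slide one filter along the sequence; return the largest non-negative activation."""
    width = len(kernel)
    best = 0.0  # starting at 0 acts like a ReLU before max-pooling
    for start in range(len(encoded) - width + 1):
        score = sum(
            encoded[start + i][j] * kernel[i][j]
            for i in range(width)
            for j in range(len(ALPHABET))
        )
        best = max(best, score)
    return best


# Hypothetical hand-written filter that responds most strongly to "AUG".
AUG_FILTER = one_hot("AUG")

with_uaug = "GGGACAUGGCU"     # made-up 5' UTR fragment containing AUG
without_uaug = "GGGACCCGGCU"  # same fragment with the AUG disrupted

print(conv1d_max(one_hot(with_uaug), AUG_FILTER))
print(conv1d_max(one_hot(without_uaug), AUG_FILTER))
```

A real model would stack many learned filters and further layers, and would be fitted to the measured expression of each library member; the sketch only shows how a sequence becomes numeric input and how one filter scores it.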
Project outcomes
Journal articles (2)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other grants by Georg Seelig
Engineering cell type-specific splicing regulation
- Grant number: 10633765
- Fiscal year: 2023
- Amount: $473,100
- Project category:
Joint receptor and protein expression immunophenotyping through split-pool barcoding
- Grant number: 10625987
- Fiscal year: 2021
- Amount: $473,100
- Project category:
Joint receptor and protein expression immunophenotyping through split-pool barcoding
- Grant number: 10375354
- Fiscal year: 2021
- Amount: $473,100
- Project category:
High-resolution spatial transcriptomics through light patterning
- Grant number: 9886581
- Fiscal year: 2020
- Amount: $473,100
- Project category:
High-resolution spatial transcriptomics through light patterning
- Grant number: 10341212
- Fiscal year: 2020
- Amount: $473,100
- Project category:
A massively parallel reporter assay for measuring chromatin effects on alternative splicing
- Grant number: 10161803
- Fiscal year: 2020
- Amount: $473,100
- Project category:
A massively parallel reporter assay for measuring chromatin effects on alternative splicing
- Grant number: 9977420
- Fiscal year: 2020
- Amount: $473,100
- Project category:
High-resolution spatial transcriptomics through light patterning
- Grant number: 10112854
- Fiscal year: 2020
- Amount: $473,100
- Project category:
Predictive Modeling of Alternative Splicing and Polyadenylation from Millions of Random Sequences
- Grant number: 9306648
- Fiscal year: 2017
- Amount: $473,100
- Project category:
Similar overseas grants
Impact of alternative polyadenylation of 3'-untranslated regions in the PI3K/AKT cascade on microRNA
- Grant number: 573541-2022
- Fiscal year: 2022
- Amount: $473,100
- Project category: University Undergraduate Student Research Awards
How do untranslated regions of cannabinoid receptor type 1 mRNA determine receptor subcellular localisation and function?
- Grant number: 2744317
- Fiscal year: 2022
- Amount: $473,100
- Project category: Studentship
MICA: Synthetic untranslated regions for direct delivery of therapeutic mRNAs
- Grant number: MR/V010948/1
- Fiscal year: 2021
- Amount: $473,100
- Project category: Research Grant
Translational Control by 5'-untranslated regions
- Grant number: 10019570
- Fiscal year: 2019
- Amount: $473,100
- Project category:
Translational Control by 5'-untranslated regions
- Grant number: 10223370
- Fiscal year: 2019
- Amount: $473,100
- Project category:
Translational Control by 5'-untranslated regions
- Grant number: 10455108
- Fiscal year: 2019
- Amount: $473,100
- Project category:
Synergistic microRNA-binding sites, and 3' untranslated regions: a dialogue of silence
- Grant number: 255762
- Fiscal year: 2012
- Amount: $473,100
- Project category: Operating Grants
Analysis of long untranslated regions in Nipah virus genome
- Grant number: 20790351
- Fiscal year: 2008
- Amount: $473,100
- Project category: Grant-in-Aid for Young Scientists (B)
Search for mRNA elements involved in the compatibility between 5' untranslated regions and coding regions in chloroplast translation
- Grant number: 19370021
- Fiscal year: 2007
- Amount: $473,100
- Project category: Grant-in-Aid for Scientific Research (B)
Post-transcriptional Regulation of PPAR-g Expression by 5'-Untranslated Regions
- Grant number: 7131841
- Fiscal year: 2006
- Amount: $473,100
- Project category:













