
A predictive model of mRNA stability and translation for variant interpretation and mRNA therapeutics


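Basic information

Grant number:
9894822
Principal investigator:
Georg Seelig
Amount:
approx. $473,100
Host institution:
UNIVERSITY OF WASHINGTON
Host institution country:
United States
Project category:
Fiscal year:
2018
Funding country:
United States
Project period:
2018-06-05 to 2021-03-31
Project status:
Completed

Source: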
https://reporter.nih.gov/project-details/9894822
Keywords:
3' Untranslated Regions; 5' Untranslated Regions; Address; Affect; Alternative Splicing; Binding Sites; Biological; Biological Assay; Biology; Biotechnology; Code; Computational Technique; DNA Library; Data; Data Set; Disease; Elements; Engineering; Expression Library; Gene Expression; Genes; Genetic Programming; Genetic Transcription; Genetic Translation; Genetic Variation; Genome; Genomics; Human; Human Engineering; Human Genetics; Human Genome; Image; In Vitro; Learning; Libraries; Machine Learning; Measures; Messenger RNA; Methods; Modeling; Neural Network Simulation; Performance; Polyribosomes; Production; Property; Proteins; RNA; RNA Splicing; RNA Stability; RNA-Binding Proteins; Randomized; Regulator Genes; Regulatory Element; Research Project Grants; Resolution; Ribosomes; Role; Site; Source; Structure; Techniques; Testing; Therapeutic; Thiouridine; Time; Training; Transcript; Translating; Translations; Untranslated Regions; Validation; Variant; Work; base; comparative; computer science; convolutional neural network; design; experimental study; genetic variant; mRNA Stability; machine vision; member; neural network; novel; polysome profiling; practical application; predictive modeling; protein expression; ribosome profiling; screening; stable cell line; statistical learning; synthetic construct; voice recognition

Project abstract

The leading and trailing untranslated regions (UTRs) of an mRNA, along with the coding sequence (CDS), control protein production by modulating translation and mRNA stability. However, although we have identified a vast number of regulatory features in these regions, we are still far from being able to predict, for example, whether and how a sequence variant affects the levels of protein being made. Here, we propose to combine high-throughput experimental characterization of protein expression in synthetic libraries with machine learning to create predictive models of translation and mRNA stability, addressing an urgent need. Recent progress in machine vision, voice recognition and other fields of computer science has been driven by the availability of enormous data sets on which to train models. Machine learning approaches have also had remarkable impact in biology, but biological data sets often are comparatively small, limiting the quality of models that can be learned. For example, there are only around 20,000 genes in the human genome, a restrictively small set of examples for training a predictive model that captures the full extent of the genome’s “regulatory code.” In this proposal, we aim to overcome this data size limitation by training predictive models of protein expression on data from millions of synthetic constructs -- a data set several orders of magnitude larger than the number of genes in the genome. Specifically, we will create libraries of in vitro transcribed mRNA with targeted variation in the UTRs and CDS and will assay protein expression of each library member by performing high-throughput polysome profiling, ribosome profiling, and mRNA stability assays. We will then use neural network approaches to learn predictive models of the relationship between mRNA sequence and levels of protein production. 
We will apply our models to three applications of practical importance: first, we expect to uncover novel biology, for example identifying regulatory sequence elements and interactions between them. Second, we will validate our models through the de novo design and experimental testing of sequences that result in higher levels of protein production than any of the millions of randomly generated members of the original library or than the endogenous UTR sequences currently used in biotechnology. Such stable and highly translating mRNA constructs would be of particular value for the field of mRNA therapeutics. Third, we will predict the functional consequences of genetic variation in UTRs on protein production and we will validate these predictions experimentally. We are far from understanding which genetic variants compromise gene regulatory function in ways that may contribute to disease, making such a comprehensive and quantitative analysis of variants valuable.
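
As a rough illustration of how a neural network can relate mRNA sequence to a numeric signal, the sketch below one-hot encodes an RNA sequence and slides a single convolutional filter along it. This is not the project's model: the function names, the example sequence, and the hand-set filter (an arbitrary triplet chosen only for illustration) are all assumptions, and in a trained network the filter weights would be learned from the library measurements rather than fixed by hand.

```python
# Illustrative sketch only: names, sequence, and filter are hypothetical,
# not taken from the project described above.
BASES = "ACGU"


def one_hot(seq):
    """Encode an RNA sequence as rows of four 0/1 values (A, C, G, U)."""
    seq = seq.upper().replace("T", "U")
    return [[1.0 if base == b else 0.0 for b in BASES] for base in seq]


def conv1d_scan(encoded, kernel):
    """Slide a (width x 4) kernel along the sequence, one score per start position."""
    width = len(kernel)
    scores = []
    for start in range(len(encoded) - width + 1):
        score = 0.0
        for offset in range(width):
            for channel in range(4):
                score += encoded[start + offset][channel] * kernel[offset][channel]
        scores.append(score)
    return scores


def relu(values):
    """Rectified linear activation, applied element-wise."""
    return [max(0.0, v) for v in values]


# Hand-set filter matching an arbitrary triplet; a trained model would learn
# its filter weights from data instead.
kernel = one_hot("AUG")

utr = "GGCAUGCCAUGG"  # made-up example sequence
activations = relu(conv1d_scan(one_hot(utr), kernel))

# Positions where all three bases match the filter score 3.0.
full_matches = [i for i, a in enumerate(activations) if a == 3.0]
print(activations)
print(full_matches)
```

A convolutional network stacks many such filters and feeds their activations into further layers that output a prediction, for example a translation or stability measurement; the scan above shows only the first step of that pipeline.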
