Predictive Modeling of Alternative Splicing and Polyadenylation from Millions of Random Sequences

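Basic Information

  • Grant number:
    9306648
  • Principal investigator:
  • Amount:
    $596,600
  • Host institution:
  • Host institution country:
    United States
  • Project category:
  • Fiscal year:
    2017
  • Funding country:
    United States
  • Project period:
    2017-04-21 to 2021-01-31
  • Project status:
    Completed

Project Abstract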

The proportion of the human genome that underlies gene regulation dwarfs the proportion that encodes proteins. However, we remain poorly equipped for identifying which genetic variants compromise gene regulatory function in ways that may contribute to risk for both rare and common human diseases. Understanding how non-coding sequences regulate gene expression, as well as being able to predict the functional consequences of genetic variation for gene regulation, are paramount challenges for the field. Here, we propose to combine synthetic biology, massively parallel functional assays, and machine learning to profoundly advance our understanding of the "regulatory code" of the human genome. While challenging, the task of unravelling complex codes from large amounts of empirical data is not without precedent. For example, over the past decade, computer scientists working in natural language processing have made immense progress, driven in large part by a combination of algorithmic and computational improvements and enormously larger training datasets than were available to the previous generations of scientists working in this area. Inspired by the revolutionizing impact of "big data" for traditional problems in machine learning, we propose to model gene regulatory phenomena using training datasets with several orders of magnitude more examples than naturally exist in the human genome. We predict that the models learned from massive numbers of synthetic examples will strongly outperform models learned from the small number of natural examples. We will demonstrate our approach by developing comprehensive, quantitative, and predictive models for alternative splicing and alternative polyadenylation, two widespread regulatory mechanisms by which a single gene can code for multiple transcripts and proteins.
However, we anticipate that this basic paradigm – specifically, the massively parallel measurement of the functional behavior of extremely large numbers of synthetic sequences followed by quantitative modeling of sequence-function relationships – can be generalized to advance our understanding of diverse forms of gene regulation.

Project Outcomes

Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)

Other grants by Georg Seelig

Engineering cell type-specific splicing regulation
  • Grant number:
    10633765
  • Fiscal year:
    2023
  • Funding amount:
    $596,600
  • Project category:
Joint receptor and protein expression immunophenotyping through split-pool barcoding
  • Grant number:
    10625987
  • Fiscal year:
    2021
  • Funding amount:
    $596,600
  • Project category:
Joint receptor and protein expression immunophenotyping through split-pool barcoding
  • Grant number:
    10375354
  • Fiscal year:
    2021
  • Funding amount:
    $596,600
  • Project category:
High-resolution spatial transcriptomics through light patterning
  • Grant number:
    9886581
  • Fiscal year:
    2020
  • Funding amount:
    $596,600
  • Project category:
High-resolution spatial transcriptomics through light patterning
  • Grant number:
    10341212
  • Fiscal year:
    2020
  • Funding amount:
    $596,600
  • Project category:
A massively parallel reporter assay for measuring chromatin effects on alternative splicing
  • Grant number:
    10161803
  • Fiscal year:
    2020
  • Funding amount:
    $596,600
  • Project category:
A massively parallel reporter assay for measuring chromatin effects on alternative splicing
  • Grant number:
    9977420
  • Fiscal year:
    2020
  • Funding amount:
    $596,600
  • Project category:
High-resolution spatial transcriptomics through light patterning
  • Grant number:
    10112854
  • Fiscal year:
    2020
  • Funding amount:
    $596,600
  • Project category:
A predictive model of mRNA stability and translation for variant interpretation and mRNA therapeutics
  • Grant number:
    9894822
  • Fiscal year:
    2018
  • Funding amount:
    $596,600
  • Project category:

Similar overseas grants

DMS-EPSRC: Asymptotic Analysis of Online Training Algorithms in Machine Learning: Recurrent, Graphical, and Deep Neural Networks
  • Grant number:
    EP/Y029089/1
  • Fiscal year:
    2024
  • Funding amount:
    $596,600
  • Project category:
    Research Grant
CAREER: Blessing of Nonconvexity in Machine Learning - Landscape Analysis and Efficient Algorithms
  • Grant number:
    2337776
  • Fiscal year:
    2024
  • Funding amount:
    $596,600
  • Project category:
    Continuing Grant
CAREER: From Dynamic Algorithms to Fast Optimization and Back
  • Grant number:
    2338816
  • Fiscal year:
    2024
  • Funding amount:
    $596,600
  • Project category:
    Continuing Grant
CAREER: Structured Minimax Optimization: Theory, Algorithms, and Applications in Robust Learning
  • Grant number:
    2338846
  • Fiscal year:
    2024
  • Funding amount:
    $596,600
  • Project category:
    Continuing Grant
CRII: SaTC: Reliable Hardware Architectures Against Side-Channel Attacks for Post-Quantum Cryptographic Algorithms
  • Grant number:
    2348261
  • Fiscal year:
    2024
  • Funding amount:
    $596,600
  • Project category:
    Standard Grant
CRII: AF: The Impact of Knowledge on the Performance of Distributed Algorithms
  • Grant number:
    2348346
  • Fiscal year:
    2024
  • Funding amount:
    $596,600
  • Project category:
    Standard Grant
CRII: CSR: From Bloom Filters to Noise Reduction Streaming Algorithms
  • Grant number:
    2348457
  • Fiscal year:
    2024
  • Funding amount:
    $596,600
  • Project category:
    Standard Grant
EAGER: Search-Accelerated Markov Chain Monte Carlo Algorithms for Bayesian Neural Networks and Trillion-Dimensional Problems
  • Grant number:
    2404989
  • Fiscal year:
    2024
  • Funding amount:
    $596,600
  • Project category:
    Standard Grant
CAREER: Efficient Algorithms for Modern Computer Architecture
  • Grant number:
    2339310
  • Fiscal year:
    2024
  • Funding amount:
    $596,600
  • Project category:
    Continuing Grant
CAREER: Improving Real-world Performance of AI Biosignal Algorithms
  • Grant number:
    2339669
  • Fiscal year:
    2024
  • Funding amount:
    $596,600
  • Project category:
    Continuing Grant