Predictive Modeling of Alternative Splicing and Polyadenylation from Millions of Random Sequences
Basic Information
- Award number: 9306648
- Principal investigator: Georg Seelig
- Amount: $596,600
- Host institution country: United States
- Fiscal year: 2017
- Funding country: United States
- Project period: 2017-04-21 to 2021-01-31
- Project status: Completed
- Keywords: Adopted, Algorithms, Alternative Splicing, Area, Basic Science, Behavior, Big Data, Biological Assay, Biological Phenomena, CRISPR/Cas technology, Clinical Medicine, Code, Complex, Computers, DNA Sequence, Data, Data Set, Databases, Dependency, Disease, Gene Expression, Gene Expression Regulation, Generations, Genes, Genetic, Genetic Polymorphism, Genetic Variation, Genome, Genomics, Haplotypes, Human, Human Genome, Lead, Learning, Libraries, Machine Learning, Measurement, Measures, Mediating, Mendelian disorder, Modeling, Mutation, Natural Language Processing, Nucleotides, Polyadenylation, Protein Isoforms, Proteins, Publishing, RNA Splicing, RNA-Binding Proteins, Regulation, Regulator Genes, Reporter, Research, Risk, Scientist, Shapes, Specific qualifier value, Testing, Training, Transcript, Untranslated RNA, Validation, Variant, Work, base, clinically relevant, data modeling, disease-causing mutation, exon skipping, experimental study, genetic variant, human disease, knock-down, novel strategies, predictive modeling, repaired, synthetic biology, synthetic construct
Project Summary
The proportion of the human genome that underlies gene regulation dwarfs the proportion that encodes
proteins. However, we remain poorly equipped to identify which genetic variants compromise gene
regulatory function in ways that may contribute to risk for both rare and common human diseases.
Understanding how non-coding sequences regulate gene expression, and predicting the functional
consequences of genetic variation for gene regulation, are paramount challenges for the field. Here,
we propose to combine synthetic biology, massively parallel functional assays, and machine learning to
profoundly advance our understanding of the “regulatory code” of the human genome. While challenging, the
task of unravelling complex codes from large amounts of empirical data is not without precedent. For example,
over the past decade, computer scientists working in natural language processing have made immense
progress, driven in large part by a combination of algorithmic and computational improvements and
enormously larger training datasets than were available to the previous generations of scientists working in this
area. Inspired by the revolutionary impact of “big data” on traditional problems in machine learning, we
propose to model gene regulatory phenomena using training datasets with several orders of magnitude more
examples than naturally exist in the human genome. We predict that the models learned from massive
numbers of synthetic examples will strongly outperform models learned from the small number of natural
examples. We will demonstrate our approach by developing comprehensive, quantitative, and predictive
models for alternative splicing and alternative polyadenylation, two widespread regulatory mechanisms by
which a single gene can code for multiple transcripts and proteins. However, we anticipate that this basic
paradigm – specifically, the massively parallel measurement of the functional behavior of extremely large
numbers of synthetic sequences followed by quantitative modeling of sequence-function relationships – can be
generalized to advance our understanding of diverse forms of gene regulation.
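The paradigm described above, measuring the behavior of many synthetic sequences and then modeling sequence-function relationships quantitatively, can be illustrated with a toy example. The sketch below is not the project's model or data: it assumes an additive k-mer model with a logistic link, simulates a hypothetical assay readout (the fraction of reads using an alternative splice site or polyadenylation site) from made-up motif effects, and fits the model by stochastic gradient descent. The helper names (`simulate_library`, `fit`), the motifs, the read depth, and all other parameters are hypothetical choices for illustration.

```python
# Toy illustration only: not the project's model or data.
import math
import random
from itertools import product

K = 3  # k-mer length; an illustrative choice, not taken from the project
KMERS = ["".join(p) for p in product("ACGT", repeat=K)]
INDEX = {kmer: i for i, kmer in enumerate(KMERS)}


def kmer_features(seq):
    """Sparse overlapping k-mer counts: {feature index: count}."""
    feats = {}
    for i in range(len(seq) - K + 1):
        j = INDEX[seq[i:i + K]]
        feats[j] = feats.get(j, 0) + 1
    return feats


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def logit(w, b, feats):
    return b + sum(w[j] * c for j, c in feats.items())


def simulate_library(n, length, true_w, rng, reads=100):
    """Hypothetical stand-in for an assay readout: random sequences, each with
    the fraction of `reads` that used the alternative isoform."""
    data = []
    for _ in range(n):
        seq = "".join(rng.choice("ACGT") for _ in range(length))
        feats = kmer_features(seq)
        p = sigmoid(logit(true_w, 0.0, feats))
        hits = sum(rng.random() < p for _ in range(reads))
        data.append((feats, hits / reads))
    return data


def fit(data, rng, epochs=20, lr=0.05):
    """Logistic-link additive model fit by SGD on binomial cross-entropy
    with fractional targets."""
    w = [0.0] * len(KMERS)
    b = 0.0
    order = list(range(len(data)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            feats, y = data[i]
            err = sigmoid(logit(w, b, feats)) - y  # gradient of the loss w.r.t. the logit
            b -= lr * err
            for j, c in feats.items():
                w[j] -= lr * err * c
    return w, b


def pearson(xs, ys):
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(0)
true_w = [0.0] * len(KMERS)
for motif, effect in {"GGG": 1.0, "TTT": -1.0, "ACA": 0.5}.items():  # hypothetical motifs
    true_w[INDEX[motif]] = effect

library = simulate_library(2000, 30, true_w, rng)
train, test = library[:1500], library[1500:]
w, b = fit(train, rng)
preds = [sigmoid(logit(w, b, feats)) for feats, _ in test]
r = pearson(preds, [y for _, y in test])
print(f"held-out Pearson r = {r:.2f}")
```

In this toy setting the random library samples every k-mer in many sequence contexts, so each weight is estimated from hundreds of examples; a real analysis would replace the simulated readout with measured isoform fractions and would likely need a richer model than independent additive k-mer effects.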
Project Outcomes
- Journal articles: 0
- Monographs: 0
- Research awards: 0
- Conference papers: 0
- Patents: 0
Other grants by Georg Seelig
Engineering cell type-specific splicing regulation
- Award number: 10633765
- Fiscal year: 2023
- Amount: $596,600
Joint receptor and protein expression immunophenotyping through split-pool barcoding
- Award number: 10625987
- Fiscal year: 2021
- Amount: $596,600
Joint receptor and protein expression immunophenotyping through split-pool barcoding
- Award number: 10375354
- Fiscal year: 2021
- Amount: $596,600
High-resolution spatial transcriptomics through light patterning
- Award number: 9886581
- Fiscal year: 2020
- Amount: $596,600
High-resolution spatial transcriptomics through light patterning
- Award number: 10341212
- Fiscal year: 2020
- Amount: $596,600
A massively parallel reporter assay for measuring chromatin effects on alternative splicing
- Award number: 10161803
- Fiscal year: 2020
- Amount: $596,600
A massively parallel reporter assay for measuring chromatin effects on alternative splicing
- Award number: 9977420
- Fiscal year: 2020
- Amount: $596,600
High-resolution spatial transcriptomics through light patterning
- Award number: 10112854
- Fiscal year: 2020
- Amount: $596,600
A predictive model of mRNA stability and translation for variant interpretation and mRNA therapeutics
- Award number: 9894822
- Fiscal year: 2018
- Amount: $596,600
Similar Overseas Grants
DMS-EPSRC: Asymptotic Analysis of Online Training Algorithms in Machine Learning: Recurrent, Graphical, and Deep Neural Networks
- Award number: EP/Y029089/1
- Fiscal year: 2024
- Amount: $596,600
- Project category: Research Grant
CAREER: Blessing of Nonconvexity in Machine Learning - Landscape Analysis and Efficient Algorithms
- Award number: 2337776
- Fiscal year: 2024
- Amount: $596,600
- Project category: Continuing Grant
CAREER: From Dynamic Algorithms to Fast Optimization and Back
- Award number: 2338816
- Fiscal year: 2024
- Amount: $596,600
- Project category: Continuing Grant
CAREER: Structured Minimax Optimization: Theory, Algorithms, and Applications in Robust Learning
- Award number: 2338846
- Fiscal year: 2024
- Amount: $596,600
- Project category: Continuing Grant
CRII: SaTC: Reliable Hardware Architectures Against Side-Channel Attacks for Post-Quantum Cryptographic Algorithms
- Award number: 2348261
- Fiscal year: 2024
- Amount: $596,600
- Project category: Standard Grant
CRII: AF: The Impact of Knowledge on the Performance of Distributed Algorithms
- Award number: 2348346
- Fiscal year: 2024
- Amount: $596,600
- Project category: Standard Grant
CRII: CSR: From Bloom Filters to Noise Reduction Streaming Algorithms
- Award number: 2348457
- Fiscal year: 2024
- Amount: $596,600
- Project category: Standard Grant
EAGER: Search-Accelerated Markov Chain Monte Carlo Algorithms for Bayesian Neural Networks and Trillion-Dimensional Problems
- Award number: 2404989
- Fiscal year: 2024
- Amount: $596,600
- Project category: Standard Grant
CAREER: Efficient Algorithms for Modern Computer Architecture
- Award number: 2339310
- Fiscal year: 2024
- Amount: $596,600
- Project category: Continuing Grant
CAREER: Improving Real-world Performance of AI Biosignal Algorithms
- Award number: 2339669
- Fiscal year: 2024
- Amount: $596,600
- Project category: Continuing Grant