Predictive Modeling of Alternative Splicing and Polyadenylation from Millions of Random Sequences
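Basic Information
- Grant number: 9306648
- Principal investigator:
- Amount: $596,600
- Host institution:
- Host institution country: United States
- Project category:
- Fiscal year: 2017
- Funding country: United States
- Duration: 2017-04-21 to 2021-01-31
- Project status: Completed
- Source: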
- Keywords: Adopted, Algorithms, Alternative Splicing, Area, Basic Science, Behavior, Big Data, Biological Assay, Biological Phenomena, CRISPR/Cas technology, Clinical Medicine, Code, Complex, Computers, DNA Sequence, Data, Data Set, Databases, Dependency, Disease, Gene Expression, Gene Expression Regulation, Generations, Genes, Genetic, Genetic Polymorphism, Genetic Variation, Genome, Genomics, Haplotypes, Human, Human Genome, Lead, Learning, Libraries, Machine Learning, Measurement, Measures, Mediating, Mendelian disorder, Modeling, Mutation, Natural Language Processing, Nucleotides, Polyadenylation, Protein Isoforms, Proteins, Publishing, RNA Splicing, RNA-Binding Proteins, Regulation, Regulator Genes, Reporter, Research, Risk, Scientist, Shapes, Specific qualifier value, Testing, Training, Transcript, Untranslated RNA, Validation, Variant, Work, base, clinically relevant, data modeling, disease-causing mutation, exon skipping, experimental study, genetic variant, human disease, knock-down, novel strategies, predictive modeling, repaired, synthetic biology, synthetic construct
Project Abstract
The proportion of the human genome that underlies gene regulation dwarfs the proportion that encodes
proteins. However, we remain poorly equipped for identifying which genetic variants compromise gene
regulatory function in ways that may contribute to risk for both rare and common human diseases.
Understanding how non-coding sequences regulate gene expression, as well as being able to predict the
functional consequences of genetic variation for gene regulation, are paramount challenges for the field. Here,
we propose to combine synthetic biology, massively parallel functional assays, and machine learning to
profoundly advance our understanding of the 'regulatory code' of the human genome. While challenging, the
task of unravelling complex codes from large amounts of empirical data is not without precedent. For example,
over the past decade, computer scientists working in natural language processing have made immense
progress, driven in large part by a combination of algorithmic and computational improvements and
enormously larger training datasets than were available to the previous generations of scientists working in this
area. Inspired by the revolutionizing impact of “big data” for traditional problems in machine learning, we
propose to model gene regulatory phenomena using training datasets with several orders of magnitude more
examples than naturally exist in the human genome. We predict that the models learned from massive
numbers of synthetic examples will strongly outperform models learned from the small number of natural
examples. We will demonstrate our approach by developing comprehensive, quantitative, and predictive
models for alternative splicing and alternative polyadenylation, two widespread regulatory mechanisms by
which a single gene can code for multiple transcripts and proteins. However, we anticipate that this basic
paradigm – specifically, the massively parallel measurement of the functional behavior of extremely large
numbers of synthetic sequences followed by quantitative modeling of sequence-function relationships – can be
generalized to advance our understanding of diverse forms of gene regulation.
Project Outcomes
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other grants by Georg Seelig
Engineering cell type-specific splicing regulation
- Grant number: 10633765
- Fiscal year: 2023
- Funding amount: $596,600
- Project category:
Joint receptor and protein expression immunophenotyping through split-pool barcoding
- Grant number: 10625987
- Fiscal year: 2021
- Funding amount: $596,600
- Project category:
Joint receptor and protein expression immunophenotyping through split-pool barcoding
- Grant number: 10375354
- Fiscal year: 2021
- Funding amount: $596,600
- Project category:
High-resolution spatial transcriptomics through light patterning
- Grant number: 9886581
- Fiscal year: 2020
- Funding amount: $596,600
- Project category:
High-resolution spatial transcriptomics through light patterning
- Grant number: 10341212
- Fiscal year: 2020
- Funding amount: $596,600
- Project category:
A massively parallel reporter assay for measuring chromatin effects on alternative splicing
- Grant number: 10161803
- Fiscal year: 2020
- Funding amount: $596,600
- Project category:
A massively parallel reporter assay for measuring chromatin effects on alternative splicing
- Grant number: 9977420
- Fiscal year: 2020
- Funding amount: $596,600
- Project category:
High-resolution spatial transcriptomics through light patterning
- Grant number: 10112854
- Fiscal year: 2020
- Funding amount: $596,600
- Project category:
A predictive model of mRNA stability and translation for variant interpretation and mRNA therapeutics
- Grant number: 9894822
- Fiscal year: 2018
- Funding amount: $596,600
- Project category:
Similar overseas grants
CAREER: Blessing of Nonconvexity in Machine Learning - Landscape Analysis and Efficient Algorithms
- Grant number: 2337776
- Fiscal year: 2024
- Funding amount: $596,600
- Project category: Continuing Grant
CAREER: From Dynamic Algorithms to Fast Optimization and Back
- Grant number: 2338816
- Fiscal year: 2024
- Funding amount: $596,600
- Project category: Continuing Grant
CAREER: Structured Minimax Optimization: Theory, Algorithms, and Applications in Robust Learning
- Grant number: 2338846
- Fiscal year: 2024
- Funding amount: $596,600
- Project category: Continuing Grant
CRII: SaTC: Reliable Hardware Architectures Against Side-Channel Attacks for Post-Quantum Cryptographic Algorithms
- Grant number: 2348261
- Fiscal year: 2024
- Funding amount: $596,600
- Project category: Standard Grant
CRII: AF: The Impact of Knowledge on the Performance of Distributed Algorithms
- Grant number: 2348346
- Fiscal year: 2024
- Funding amount: $596,600
- Project category: Standard Grant
CRII: CSR: From Bloom Filters to Noise Reduction Streaming Algorithms
- Grant number: 2348457
- Fiscal year: 2024
- Funding amount: $596,600
- Project category: Standard Grant
EAGER: Search-Accelerated Markov Chain Monte Carlo Algorithms for Bayesian Neural Networks and Trillion-Dimensional Problems
- Grant number: 2404989
- Fiscal year: 2024
- Funding amount: $596,600
- Project category: Standard Grant
CAREER: Efficient Algorithms for Modern Computer Architecture
- Grant number: 2339310
- Fiscal year: 2024
- Funding amount: $596,600
- Project category: Continuing Grant
CAREER: Improving Real-world Performance of AI Biosignal Algorithms
- Grant number: 2339669
- Fiscal year: 2024
- Funding amount: $596,600
- Project category: Continuing Grant
DMS-EPSRC: Asymptotic Analysis of Online Training Algorithms in Machine Learning: Recurrent, Graphical, and Deep Neural Networks
- Grant number: EP/Y029089/1
- Fiscal year: 2024
- Funding amount: $596,600
- Project category: Research Grant


















