Predictive Modeling of Alternative Splicing and Polyadenylation from Millions of Random Sequences
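Basic information
- Grant number: 9306648
- Principal investigator:
- Amount: $596,600
- Host institution:
- Host institution country: United States
- Project type:
- Fiscal year: 2017
- Funding country: United States
- Project period: 2017-04-21 to 2021-01-31
- Project status: Completed
- Source: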
- Keywords: Adopted; Algorithms; Alternative Splicing; Area; Basic Science; Behavior; Big Data; Biological Assay; Biological Phenomena; CRISPR/Cas technology; Clinical Medicine; Code; Complex; Computers; DNA Sequence; Data; Data Set; Databases; Dependency; Disease; Gene Expression; Gene Expression Regulation; Generations; Genes; Genetic; Genetic Polymorphism; Genetic Variation; Genome; Genomics; Haplotypes; Human; Human Genome; Lead; Learning; Libraries; Machine Learning; Measurement; Measures; Mediating; Mendelian disorder; Modeling; Mutation; Natural Language Processing; Nucleotides; Polyadenylation; Protein Isoforms; Proteins; Publishing; RNA Splicing; RNA-Binding Proteins; Regulation; Regulator Genes; Reporter; Research; Risk; Scientist; Shapes; Specific qualifier value; Testing; Training; Transcript; Untranslated RNA; Validation; Variant; Work; base; clinically relevant; data modeling; disease-causing mutation; exon skipping; experimental study; genetic variant; human disease; knock-down; novel strategies; predictive modeling; repaired; synthetic biology; synthetic construct
Project abstract
The proportion of the human genome that underlies gene regulation dwarfs the proportion that encodes
proteins. However, we remain poorly equipped for identifying which genetic variants compromise gene
regulatory function in ways that may contribute to risk for both rare and common human diseases.
Understanding how non-coding sequences regulate gene expression, as well as being able to predict the
functional consequences of genetic variation for gene regulation, are paramount challenges for the field. Here,
we propose to combine synthetic biology, massively parallel functional assays, and machine learning to
profoundly advance our understanding of the ‘regulatory code’ of the human genome. While challenging, the
task of unravelling complex codes from large amounts of empirical data is not without precedent. For example,
over the past decade, computer scientists working in natural language processing have made immense
progress, driven in large part by a combination of algorithmic and computational improvements and
enormously larger training datasets than were available to the previous generations of scientists working in this
area. Inspired by the revolutionizing impact of “big data” for traditional problems in machine learning, we
propose to model gene regulatory phenomena using training datasets with several orders of magnitude more
examples than naturally exist in the human genome. We predict that the models learned from massive
numbers of synthetic examples will strongly outperform models learned from the small number of natural
examples. We will demonstrate our approach by developing comprehensive, quantitative, and predictive
models for alternative splicing and alternative polyadenylation, two widespread regulatory mechanisms by
which a single gene can code for multiple transcripts and proteins. However, we anticipate that this basic
paradigm – specifically, the massively parallel measurement of the functional behavior of extremely large
numbers of synthetic sequences followed by quantitative modeling of sequence-function relationships – can be
generalized to advance our understanding of diverse forms of gene regulation.
Project outputs
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other grants by Georg Seelig
Engineering cell type-specific splicing regulation
- Grant number: 10633765
- Fiscal year: 2023
- Funding amount: $596,600
- Project type:
Joint receptor and protein expression immunophenotyping through split-pool barcoding
- Grant number: 10625987
- Fiscal year: 2021
- Funding amount: $596,600
- Project type:
Joint receptor and protein expression immunophenotyping through split-pool barcoding
- Grant number: 10375354
- Fiscal year: 2021
- Funding amount: $596,600
- Project type:
High-resolution spatial transcriptomics through light patterning
- Grant number: 9886581
- Fiscal year: 2020
- Funding amount: $596,600
- Project type:
High-resolution spatial transcriptomics through light patterning
- Grant number: 10341212
- Fiscal year: 2020
- Funding amount: $596,600
- Project type:
A massively parallel reporter assay for measuring chromatin effects on alternative splicing
- Grant number: 10161803
- Fiscal year: 2020
- Funding amount: $596,600
- Project type:
A massively parallel reporter assay for measuring chromatin effects on alternative splicing
- Grant number: 9977420
- Fiscal year: 2020
- Funding amount: $596,600
- Project type:
High-resolution spatial transcriptomics through light patterning
- Grant number: 10112854
- Fiscal year: 2020
- Funding amount: $596,600
- Project type:
A predictive model of mRNA stability and translation for variant interpretation and mRNA therapeutics
- Grant number: 9894822
- Fiscal year: 2018
- Funding amount: $596,600
- Project type:
Similar overseas grants
DMS-EPSRC: Asymptotic Analysis of Online Training Algorithms in Machine Learning: Recurrent, Graphical, and Deep Neural Networks
- Grant number: EP/Y029089/1
- Fiscal year: 2024
- Funding amount: $596,600
- Project type: Research Grant
CAREER: Blessing of Nonconvexity in Machine Learning - Landscape Analysis and Efficient Algorithms
- Grant number: 2337776
- Fiscal year: 2024
- Funding amount: $596,600
- Project type: Continuing Grant
CAREER: From Dynamic Algorithms to Fast Optimization and Back
- Grant number: 2338816
- Fiscal year: 2024
- Funding amount: $596,600
- Project type: Continuing Grant
CAREER: Structured Minimax Optimization: Theory, Algorithms, and Applications in Robust Learning
- Grant number: 2338846
- Fiscal year: 2024
- Funding amount: $596,600
- Project type: Continuing Grant
CRII: SaTC: Reliable Hardware Architectures Against Side-Channel Attacks for Post-Quantum Cryptographic Algorithms
- Grant number: 2348261
- Fiscal year: 2024
- Funding amount: $596,600
- Project type: Standard Grant
CRII: AF: The Impact of Knowledge on the Performance of Distributed Algorithms
- Grant number: 2348346
- Fiscal year: 2024
- Funding amount: $596,600
- Project type: Standard Grant
CRII: CSR: From Bloom Filters to Noise Reduction Streaming Algorithms
- Grant number: 2348457
- Fiscal year: 2024
- Funding amount: $596,600
- Project type: Standard Grant
EAGER: Search-Accelerated Markov Chain Monte Carlo Algorithms for Bayesian Neural Networks and Trillion-Dimensional Problems
- Grant number: 2404989
- Fiscal year: 2024
- Funding amount: $596,600
- Project type: Standard Grant
CAREER: Efficient Algorithms for Modern Computer Architecture
- Grant number: 2339310
- Fiscal year: 2024
- Funding amount: $596,600
- Project type: Continuing Grant
CAREER: Improving Real-world Performance of AI Biosignal Algorithms
- Grant number: 2339669
- Fiscal year: 2024
- Funding amount: $596,600
- Project type: Continuing Grant


















