Compressive genomics for large omics data sets: Algorithms, applications & tools
Basic information
- Grant number: 8599836
- Amount: $217,900
- Host institution country: United States
- Fiscal year: 2013
- Funding country: United States
- Project period: 2013-09-05 to 2016-05-31
- Project status: Completed
- Keywords: Acceleration, Algorithms, Amino Acid Sequence, Area, Bioinformatics, Biological, Biology, Complex, Computer software, Computing Methodologies, Data, Data Set, Disease, Disease Management, Drug Design, Future, Gene Expression, Genome, Genomics, Goals, Growth, Lead, Molecular, Neurodevelopmental Disorder, Parkinson Disease, Patients, Peptide Sequence Determination, Retrieval, Techniques, Technology, Work, autism spectrum disorder, base, computerized tools, design, empowered, high throughput analysis, innovation, interest, meetings, next generation sequencing, novel, tool
Project abstract
DESCRIPTION (provided by applicant): High-throughput experimental technologies are generating increasingly massive and complex genomic sequence data sets. While these data hold the promise of uncovering entirely new biology, their sheer enormity threatens to make their interpretation computationally infeasible. The goal of this project is to design and develop innovative compression-based algorithmic techniques and publicly-available software for large-scale genomic sequence data sets. The key underlying observation is that most genomes currently being sequenced share much similarity with genomes that have already been collected. Thus, the amount of new sequence information is growing much more slowly than the total size of genomic sequence data sets. In very recent work, we have provided a proof-of-concept that this redundancy can be exploited by compressing sequence data in such a way as to allow direct computation on the compressed data, a methodological paradigm we term "compressive genomics."

In this proposal we broaden the framework of compressive genomics to several additional application areas in which algorithmic advances are urgently needed in order to keep pace with the growth in both genomic and protein sequencing data. In particular, we will build a novel comprehensive framework for compressive representation and highly efficient downstream analysis of large-scale next-generation sequencing (NGS) data sets; this will significantly advance the state of the art and scale over existing algorithms as the volume of genomic data grows, thus meeting the challenge of the expected future acceleration of sequencing technologies. Additionally, we will develop advanced, compressively-accelerated algorithms and software for specific applications of current interest in bioinformatics and apply them to real large-scale 'omics' data sets to accelerate data analytics and lead to novel biological discoveries. Namely, we will collaborate with the Kohane lab on analysis of high-throughput gene expression and NGS data sets from patients with neurodevelopmental disorders, including Autism Spectrum Disorder and Parkinson's; the broad, long-term goal is to apply our compressive approach to such massive data sets to elucidate the still obscure molecular landscape of these diseases.

Understanding massive 'omics' data from patients will empower both rational, targeted drug design and more intelligent disease management, yet their sheer enormity threatens to make the arising problems computationally infeasible. Here, we develop computational methods and tools that will fundamentally advance the state-of-the-art in storage, retrieval and analysis of these rapidly expanding data sets.
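The abstract does not spell out an algorithm, so the following is only a toy sketch of the general idea of computing directly on a redundancy-reduced representation. It assumes the simplest possible case, exact duplicate reads, whereas the project targets sequences that are merely similar. The names `compress` and `search` and the sample reads are hypothetical, chosen for illustration.

```python
from collections import defaultdict


def compress(reads):
    """Store each distinct sequence once, with the ids of every read that carries it.

    Toy assumption: only exact duplicates are merged.
    """
    index = defaultdict(list)
    for read_id, seq in reads:
        index[seq].append(read_id)
    return dict(index)


def search(compressed, pattern):
    """Scan each distinct sequence once, then expand hits to the original read ids."""
    hits = []
    for seq, read_ids in compressed.items():
        if pattern in seq:
            hits.extend(read_ids)
    return sorted(hits)


# Hypothetical sample data: five reads, only two distinct sequences.
reads = [
    ("r1", "ACGTACGTGA"),
    ("r2", "ACGTACGTGA"),
    ("r3", "TTGACCA"),
    ("r4", "ACGTACGTGA"),
    ("r5", "TTGACCA"),
]

compressed = compress(reads)
print(len(reads), "reads stored as", len(compressed), "distinct sequences")
print(search(compressed, "GTGA"))
```

In this sketch the search cost depends on the number of distinct sequences rather than the number of reads, which loosely mirrors the observation that new sequence information grows more slowly than total data volume.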
Project outcomes
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other grants by BONNIE BERGER
Manifold representations and active learning for 21st century biology
- Grant number: 10401890
- Fiscal year: 2021
- Funding amount: $217,900
Manifold representations and active learning for 21st century biology
- Grant number: 10207091
- Fiscal year: 2021
- Funding amount: $217,900
Manifold representations and active learning for 21st century biology
- Grant number: 10670057
- Fiscal year: 2021
- Funding amount: $217,900
Developing high-throughput genetic perturbation strategies for single cells in cancer organoids
- Grant number: 10004966
- Fiscal year: 2020
- Funding amount: $217,900
Developing high-throughput genetic perturbation strategies for single cells in cancer organoids
- Grant number: 10212991
- Fiscal year: 2020
- Funding amount: $217,900
Compressive Genomics for Large Omics Data Sets: Algorithms, Applications and Tools
- Grant number: 9546755
- Fiscal year: 2013
- Funding amount: $217,900
Compressive genomics for large omics data sets: Algorithms, applications & tools
- Grant number: 8849927
- Fiscal year: 2013
- Funding amount: $217,900
Similar overseas grants
CAREER: Blessing of Nonconvexity in Machine Learning - Landscape Analysis and Efficient Algorithms
- Grant number: 2337776
- Fiscal year: 2024
- Funding amount: $217,900
- Project category: Continuing Grant
CAREER: From Dynamic Algorithms to Fast Optimization and Back
- Grant number: 2338816
- Fiscal year: 2024
- Funding amount: $217,900
- Project category: Continuing Grant
CAREER: Structured Minimax Optimization: Theory, Algorithms, and Applications in Robust Learning
- Grant number: 2338846
- Fiscal year: 2024
- Funding amount: $217,900
- Project category: Continuing Grant
CRII: SaTC: Reliable Hardware Architectures Against Side-Channel Attacks for Post-Quantum Cryptographic Algorithms
- Grant number: 2348261
- Fiscal year: 2024
- Funding amount: $217,900
- Project category: Standard Grant
CRII: AF: The Impact of Knowledge on the Performance of Distributed Algorithms
- Grant number: 2348346
- Fiscal year: 2024
- Funding amount: $217,900
- Project category: Standard Grant
CRII: CSR: From Bloom Filters to Noise Reduction Streaming Algorithms
- Grant number: 2348457
- Fiscal year: 2024
- Funding amount: $217,900
- Project category: Standard Grant
EAGER: Search-Accelerated Markov Chain Monte Carlo Algorithms for Bayesian Neural Networks and Trillion-Dimensional Problems
- Grant number: 2404989
- Fiscal year: 2024
- Funding amount: $217,900
- Project category: Standard Grant
CAREER: Efficient Algorithms for Modern Computer Architecture
- Grant number: 2339310
- Fiscal year: 2024
- Funding amount: $217,900
- Project category: Continuing Grant
CAREER: Improving Real-world Performance of AI Biosignal Algorithms
- Grant number: 2339669
- Fiscal year: 2024
- Funding amount: $217,900
- Project category: Continuing Grant
DMS-EPSRC: Asymptotic Analysis of Online Training Algorithms in Machine Learning: Recurrent, Graphical, and Deep Neural Networks
- Grant number: EP/Y029089/1
- Fiscal year: 2024
- Funding amount: $217,900
- Project category: Research Grant