Compressive genomics for large omics data sets: Algorithms, applications & tools


Basic information

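  • Grant number:
    8849927
  • Principal investigator:
    BONNIE BERGER
  • Amount:
    $209.4K
  • Host institution:
  • Host institution country:
    United States
  • Project type:
  • Fiscal year:
    2013
  • Funding country:
    United States
  • Project period:
    2013-09-05 to 2016-05-31
  • Project status:
    Completed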

Project abstract

DESCRIPTION (provided by applicant): High-throughput experimental technologies are generating increasingly massive and complex genomic sequence data sets. While these data hold the promise of uncovering entirely new biology, their sheer enormity threatens to make their interpretation computationally infeasible. The goal of this project is to design and develop innovative compression-based algorithmic techniques and publicly available software for large-scale genomic sequence data sets. The key underlying observation is that most genomes currently being sequenced are highly similar to genomes that have already been collected. Thus, the amount of new sequence information is growing much more slowly than the total size of genomic sequence data sets. In very recent work, we have provided a proof of concept that this redundancy can be exploited by compressing sequence data in such a way as to allow direct computation on the compressed data, a methodological paradigm we term "compressive genomics."

In this proposal, we broaden the framework of compressive genomics to several additional application areas in which algorithmic advances are urgently needed to keep pace with the growth in both genomic and protein sequencing data. In particular, we will build a novel, comprehensive framework for compressive representation and highly efficient downstream analysis of large-scale next-generation sequencing (NGS) data sets; this will significantly advance the state of the art and scale better than existing algorithms as the volume of genomic data grows, thus meeting the challenge of the expected future acceleration of sequencing technologies. Additionally, we will develop advanced, compressively accelerated algorithms and software for specific applications of current interest in bioinformatics and apply them to real large-scale 'omics' data sets to accelerate data analytics and lead to novel biological discoveries.

Namely, we will collaborate with the Kohane lab on analysis of high-throughput gene expression and NGS data sets from patients with neurodevelopmental disorders, including Autism Spectrum Disorder and Parkinson's disease; the broad, long-term goal is to apply our compressive approach to such massive data sets to elucidate the still obscure molecular landscape of these diseases.

Understanding massive 'omics' data from patients will empower both rational, targeted drug design and more intelligent disease management, yet the sheer enormity of these data threatens to make the arising problems computationally infeasible. Here, we develop computational methods and tools that will fundamentally advance the state of the art in storage, retrieval and analysis of these rapidly expanding data sets.

Project outputs

Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)

Other grants by BONNIE BERGER

Manifold representations and active learning for 21st century biology
  • Grant number:
    10401890
  • Fiscal year:
    2021
  • Funding amount:
    $209.4K
  • Project type:
Manifold representations and active learning for 21st century biology
  • Grant number:
    10207091
  • Fiscal year:
    2021
  • Funding amount:
    $209.4K
  • Project type:
Manifold representations and active learning for 21st century biology
  • Grant number:
    10670057
  • Fiscal year:
    2021
  • Funding amount:
    $209.4K
  • Project type:
Developing high-throughput genetic perturbation strategies for single cells in cancer organoids
  • Grant number:
    10004966
  • Fiscal year:
    2020
  • Funding amount:
    $209.4K
  • Project type:
Privacy-preserving genomic medicine at scale
  • Grant number:
    10266081
  • Fiscal year:
    2020
  • Funding amount:
    $209.4K
  • Project type:
Privacy-preserving genomic medicine at scale
  • Grant number:
    10459604
  • Fiscal year:
    2020
  • Funding amount:
    $209.4K
  • Project type:
Privacy-preserving genomic medicine at scale
  • Grant number:
    10662349
  • Fiscal year:
    2020
  • Funding amount:
    $209.4K
  • Project type:
Developing high-throughput genetic perturbation strategies for single cells in cancer organoids
  • Grant number:
    10212991
  • Fiscal year:
    2020
  • Funding amount:
    $209.4K
  • Project type:
Compressive Genomics for Large Omics Data Sets: Algorithms, Applications and Tools
  • Grant number:
    9546755
  • Fiscal year:
    2013
  • Funding amount:
    $209.4K
  • Project type:
Compressive genomics for large omics data sets: Algorithms, applications & tools
  • Grant number:
    8599836
  • Fiscal year:
    2013
  • Funding amount:
    $209.4K
  • Project type:

Similar overseas grants

Medcircuit, the algorithmic software reducing waiting times in emergency department and general practice waiting rooms.
  • Grant number:
    133416
  • Fiscal year:
    2018
  • Funding amount:
    $209.4K
  • Project type:
    Feasibility Studies
SHF: Small: Programming Abstractions for Algorithmic Software Synthesis
  • Grant number:
    0916351
  • Fiscal year:
    2009
  • Funding amount:
    $209.4K
  • Project type:
    Standard Grant