Compressive Genomics for Large Omics Data Sets: Algorithms, Applications and Tools
Basic information
- Grant number: 9546755
- Principal investigator:
- Amount: $350,200
- Host institution:
- Host institution country: United States
- Project category:
- Fiscal year: 2013
- Funding country: United States
- Project period: 2013-09-05 to 2020-08-31
- Project status: Completed
- Source:
- Keywords: Acceleration; Address; Adoption; Age; Algorithms; Autistic Disorder; Autoimmune Diseases; Bioinformatics; Biological; Biological Process; Biology; Biomedical Research; Clinical; Cloud Computing; Cohort Studies; Collaborations; Communities; Complex; Computer software; Computers; Computing Methodologies; DNA sequencing; Data; Data Compression; Data Files; Data Set; Development; Dimensions; Disease; Ensure; Exhibits; Foundations; Fractals; Genetic Variation; Genome; Genomics; Goals; Grant; Health; Human; Indiana; Individual; Industry; Informatics; Intuition; Investigation; Mainstreaming; Malignant Neoplasms; Maps; Metagenomics; Methodology; Molecular; New England; Patients; Pattern; Pharmacogenomics; Privacy; Process; Progress Reports; Research; Research Personnel; Savings; Secure; Security; Sequence Analysis; Structure; Techniques; Technology; The Cancer Genome Atlas; Therapeutic; Time; Transcript; Universities; Variant; Work; autism spectrum disorder; base; cloud platform; cohort; computer framework; computerized tools; cryptography; data exchange; data structure; design; encryption; file format; genomic data; human DNA; human RNA sequencing; human data; human disease; improved; innovation; insight; interest; microbiome; microbiome therapeutics; monomethoxypolyethylene glycol; next generation sequencing; novel; software development; tool; transmission process; whole genome
Project Summary
High-throughput experimental technologies are generating increasingly massive and complex genomic
sequence data sets. While these data hold the promise of uncovering entirely new biology, their sheer
enormity threatens to make their interpretation computationally infeasible. The continued goal of this
project is to design and develop innovative compression-based algorithmic techniques for efficiently
processing massive biological data. We will branch out beyond compressive search to address the
imminent need to securely store and process large-scale genomic data in the cloud, as well as to gain
insights from massive metagenomic data.
The key underlying observation is that genomic data is highly structured, exhibiting high degrees of
self-similarity. In our previous granting period, we exploited its high redundancy and low fractal
dimension to enable scalable compressive storage and acceleration for search of sequence data as
well as other biological data types relevant to structural bioinformatics and chemogenomics. In this
renewal, we will continue to capitalize on the structure (i.e., compressibility) of genomic data to: (i)
overcome privacy concerns that arise in sharing sensitive human data (e.g., in the cloud); (ii) address
new challenges, beyond search, with metagenomic data; and (iii) widen the adoption of the
previous and newly proposed compressive algorithms for industry, research, and clinical use. We will
demonstrate the utility of our compressive techniques for characterizing human genomic and
metagenomic variation.
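The redundancy argument above can be illustrated with a minimal Python sketch. It is not the project's compressive algorithms: it uses the general-purpose `zlib` compressor from the standard library as a stand-in, and the helper names (`compression_ratio`, `random_sequence`, `mutate`), the sequence length, the 1% substitution rate, and the collection size of 20 are all assumptions chosen only for illustration. The sketch compares how well a collection of near-identical toy sequences compresses against a collection of unrelated random sequences.

```python
import random
import zlib


def compression_ratio(data: bytes) -> float:
    """Compressed size divided by original size (lower means more redundancy)."""
    return len(zlib.compress(data, 9)) / len(data)


def random_sequence(rng: random.Random, length: int) -> str:
    """Draw a uniformly random DNA string (toy model, no biological structure)."""
    return "".join(rng.choice("ACGT") for _ in range(length))


def mutate(rng: random.Random, seq: str, rate: float) -> str:
    """Redraw each base with probability `rate` (hypothetical toy model of variation)."""
    return "".join(rng.choice("ACGT") if rng.random() < rate else base for base in seq)


rng = random.Random(0)  # fixed seed so the comparison is reproducible
reference = random_sequence(rng, 2000)

# 20 near-identical toy "genomes" derived from one reference.
similar = "".join(mutate(rng, reference, 0.01) for _ in range(20)).encode()

# 20 unrelated sequences of the same total length.
unrelated = "".join(random_sequence(rng, 2000) for _ in range(20)).encode()

similar_ratio = compression_ratio(similar)
unrelated_ratio = compression_ratio(unrelated)

print(f"near-identical collection: {similar_ratio:.3f}")
print(f"unrelated collection:      {unrelated_ratio:.3f}")
```

The near-identical collection should compress to a much smaller fraction of its original size, because each additional sequence differs from the earlier ones in only a few positions. That is the kind of structure the abstract proposes to exploit, here shown only at toy scale.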
We will collaborate with co-I Sahinalp's lab (Indiana University, Bloomington) on developing and
applying these tools to high-throughput data sets covering autism spectrum disorder (with Isaac
Kohane and Evan Eichler), cancer (with PCAWG, the Pan-Cancer Analysis of Whole Genomes), the
microbiome (with Eric Alm and Jian Peng), and human variation analysis (GATK, with Eric
Lander and Eric Banks). The broad, long-term goal is to apply our compressive approach to
massive biological data sets to elucidate the still-obscure molecular landscape of diseases.
Successful completion of these aims will result in computational methods and tools that will significantly
increase our ability to securely store, access and analyze massive data sets and will reveal
fundamental aspects of genetic variation, as well as testable hypotheses for experimental
investigations. Not only will all developed software be made publicly available, but as part of our
integration aim, we will also ensure that the research community can make use of our innovations with
minimal effort. Through our research collaborations, we will both build these tools and demonstrate
their relevance to the characterization of human health and disease.
Project outputs
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other grants by BONNIE BERGER
Manifold representations and active learning for 21st century biology
- Grant number: 10401890
- Fiscal year: 2021
- Funding amount: $350,200
- Project category:
Manifold representations and active learning for 21st century biology
- Grant number: 10207091
- Fiscal year: 2021
- Funding amount: $350,200
- Project category:
Manifold representations and active learning for 21st century biology
- Grant number: 10670057
- Fiscal year: 2021
- Funding amount: $350,200
- Project category:
Developing high-throughput genetic perturbation strategies for single cells in cancer organoids
- Grant number: 10004966
- Fiscal year: 2020
- Funding amount: $350,200
- Project category:
Developing high-throughput genetic perturbation strategies for single cells in cancer organoids
- Grant number: 10212991
- Fiscal year: 2020
- Funding amount: $350,200
- Project category:
Compressive genomics for large omics data sets: Algorithms applications & tools
- Grant number: 8849927
- Fiscal year: 2013
- Funding amount: $350,200
- Project category:
Compressive genomics for large omics data sets: Algorithms applications & tools
- Grant number: 8599836
- Fiscal year: 2013
- Funding amount: $350,200
- Project category:
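Similar grants from the National Natural Science Foundation of China (NSFC)
Research on neuromorphic vision object recognition algorithms driven by spatiotemporal sequences
- Grant number: 61906126
- Approval year: 2019
- Funding amount: CNY 240,000
- Project category: Young Scientists Fund
Ontology-driven spatial semantic modeling of address data and address matching methods
- Grant number: 41901325
- Approval year: 2019
- Funding amount: CNY 220,000
- Project category: Young Scientists Fund
Research on optimized design of address mapping tables and memory access optimization for large-capacity solid-state drives
- Grant number: 61802133
- Approval year: 2018
- Funding amount: CNY 230,000
- Project category: Young Scientists Fund
Research on IP-address-driven multipath routing and traffic transmission control
- Grant number: 61872252
- Approval year: 2018
- Funding amount: CNY 640,000
- Project category: General Program
Research on memory safety defense techniques for objects targeted by memory attacks
- Grant number: 61802432
- Approval year: 2018
- Funding amount: CNY 250,000
- Project category: Young Scientists Fund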
Similar overseas grants
The University of Miami AIDS Research Center on Mental Health and HIV/AIDS - Center for HIV & Research in Mental Health (CHARM) Research Core - EIS
- Grant number: 10686546
- Fiscal year: 2023
- Funding amount: $350,200
- Project category:
Core A: Administrative, Career Development, and Research Integration Core
- Grant number: 10630466
- Fiscal year: 2023
- Funding amount: $350,200
- Project category:
Extensible Open Source Zero-Footprint Web Viewer for Cancer Imaging Research
- Grant number: 10644112
- Fiscal year: 2023
- Funding amount: $350,200
- Project category:
Bioethical, Legal, and Anthropological Study of Technologies (BLAST)
- Grant number: 10831226
- Fiscal year: 2023
- Funding amount: $350,200
- Project category: