Meaningful Data Compression and Reduction of High-Throughput Sequencing Data
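Basic information
- Grant number: 9336154
- Principal investigator:
- Amount: $243,300
- Host institution:
- Host institution country: United States
- Project category:
- Fiscal year: 2015
- Funding country: United States
- Period: 2015-09-18 to 2018-08-31
- Status: Completed
- Source: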
- Keywords: Address; Algorithms; Area; Big Data; Biological; Biology; Biomedical Research; Clinical; Cohort Studies; Computer software; Consensus; Consensus Sequence; DNA Sequence; Data; Data Compression; Data Set; Development; Disease; Evolution; Funding; Future; Genes; Genetic Screening; Genetic Variation; Genetic screening method; Genome; Genomic approach; Genomics; Goals; Graph; Healthcare; High-Throughput Nucleotide Sequencing; Hybrids; Individual; Joints; Lead; Libraries; Life; Location; Malignant Neoplasms; Maps; Memory; Methods; Noise; Outcome; Positioning Attribute; Prevalence; Privacy; Research; Research Personnel; Savings; Scheme; Scientific Inquiry; Secure; Seeds; Sequence Alignment; Source; Standardization; Technology; Time; United States National Institutes of Health; Variant; anticancer research; application programming interface; base; big biomedical data; clinical care; cohort; computer infrastructure; computing resources; cost; data reduction; experimental study; flexibility; genetic risk factor; genetic variant; genomic data; genomic tools; high throughput analysis; improved; indexing; insight; middleware; open source; operation; personalized medicine; public health relevance; reference genome; tool; transmission process; virtual
Project abstract
DESCRIPTION (provided by applicant): High-throughput sequencing (HTS), a technology to unravel DNA sequences on a large scale, is pervasive in clinical and biological applications such as studying the spectrum of genetic variations and their relation to disease. As costs fall further, sequencing is expected to gain significant momentum: it will replace commonly used genetic tests in clinical care for life-threatening diseases such as cancer, and consequently produce enormous amounts of data. The rise of personalized medicine will eventually lead to the point where every individual can be routinely screened for genetic risk factors using HTS. The goal of the proposed research is to speed up the analysis of HTS data with a compressive genomics middleware that provides compressed, reduced representations of HTS data. The representations are meaningful in that sequence information likely to cover the same genomic location in the sequenced genome is brought together. Because existing and future methods and algorithms can operate directly on this representation, the proposal saves not only space and transmission time but also the CPU time needed for analysis. The project has three aims:
1) Develop a clustering algorithm for single and paired HTS read libraries that rapidly recognizes overlapping reads. Establish a lossless compression scheme based on clusters, which facilitates downstream computations directly on the compressed data without decompression. Extend the approach to joint compression of multiple HTS libraries.
2) Introduce meaningful reduced representations which further decrease memory demands by prioritizing sequence information likely to be correct and discarding information likely to be erroneous.
3) Adapt important HTS analysis tools to our compressive genomics approach, in particular read mapping, de novo genome assembly using cluster consensus sequences as virtual, elongated reads for a hybrid assembly scheme, and discovery of structural variants based on cluster mapping positions and ambiguities in the assignment of sequences to clusters.
Our results will aid in improving health care outcomes by increasing analysis quality, lowering costs and making the analysis of HTS data more widely accessible. This will impact areas of scientific inquiry from understanding genetic variations underlying disease to personal genomics.
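The abstract does not specify the clustering or encoding algorithms, so the following is only a toy sketch of the general idea behind aim 1: group reads that probably overlap, build a consensus per cluster, and store each read losslessly as its differences from that consensus. Everything in it is an assumption made for illustration, including the use of a shared minimizer (smallest k-mer) as the overlap signal, the function names, and the record layout. It ignores quality scores, paired reads and reverse complements, requires every read to be at least `k` bases long, and does not preserve the original read order across clusters.

```python
from collections import Counter, defaultdict


def minimizer(read, k):
    """Return (kmer, position) of the lexicographically smallest k-mer."""
    return min((read[i:i + k], i) for i in range(len(read) - k + 1))


def cluster_reads(reads, k=5):
    """Group reads sharing a minimizer (a crude, assumed proxy for overlap)."""
    clusters = defaultdict(list)
    for read in reads:
        kmer, pos = minimizer(read, k)
        clusters[kmer].append((read, pos))
    return clusters


def encode_cluster(members):
    """Encode reads as (start, length, diffs) against a majority consensus."""
    # Shift every read so that the shared minimizer lands in the same column.
    anchor = max(pos for _, pos in members)
    placed = [(anchor - pos, read) for read, pos in members]
    width = max(start + len(read) for start, read in placed)
    columns = [Counter() for _ in range(width)]
    for start, read in placed:
        for i, base in enumerate(read):
            columns[start + i][base] += 1
    consensus = "".join(col.most_common(1)[0][0] for col in columns)
    records = []
    for start, read in placed:
        # Keep only the positions where the read disagrees with the consensus.
        diffs = [(i, b) for i, b in enumerate(read) if consensus[start + i] != b]
        records.append((start, len(read), diffs))
    return consensus, records


def decode_cluster(consensus, records):
    """Rebuild the original reads from the consensus and the stored diffs."""
    reads = []
    for start, length, diffs in records:
        bases = list(consensus[start:start + length])
        for i, b in diffs:
            bases[i] = b
        reads.append("".join(bases))
    return reads


def roundtrip(reads, k=5):
    """Cluster, encode and decode; returns the reads grouped by cluster."""
    restored = []
    for members in cluster_reads(reads, k).values():
        consensus, records = encode_cluster(members)
        restored.extend(decode_cluster(consensus, records))
    return restored


if __name__ == "__main__":
    sample = ["ACGTACGTTAGC", "CGTACGTTAGCA", "ACGTACCTTAGC", "TTTTGGGGCCCC"]
    assert sorted(roundtrip(sample)) == sorted(sample)
```

Because the consensus is itself an ordinary sequence, a downstream tool could in principle work on one consensus per cluster instead of on every read, which is the kind of saving in analysis time the abstract describes; the sketch does not attempt that step.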
Project outcomes
Journal articles (3)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Fast Bayesian Inference of Copy Number Variants using Hidden Markov Models with Wavelet Compression.
- DOI: 10.1371/journal.pcbi.1004871
- Published: 2016-05
- Journal:
- Impact factor: 4.3
- Authors: Wiedenhoeft J; Brugel E; Schliep A
- Corresponding author: Schliep A
Change by challenge: A common genetic basis behind childhood cognitive development and cognitive training.
- DOI: 10.1038/s41539-021-00096-6
- Published: 2021-06-02
- Journal:
- Impact factor: 4.2
- Authors: Sauce B; Wiedenhoeft J; Judd N; Klingberg T
- Corresponding author: Klingberg T
Other publications by Martin Farach-Colton
Similar overseas grants
Approximate algorithms and architectures for area efficient system design
- Grant number: LP170100311
- Fiscal year: 2018
- Funding amount: $243,300
- Project category: Linkage Projects
AMPS: Rank Minimization Algorithms for Wide-Area Phasor Measurement Data Processing
- Grant number: 1736326
- Fiscal year: 2017
- Funding amount: $243,300
- Project category: Standard Grant
Low Power, Area Efficient, High Speed Algorithms and Architectures for Computer Arithmetic, Pattern Recognition and Cryptosystems
- Grant number: 1686-2013
- Fiscal year: 2017
- Funding amount: $243,300
- Project category: Discovery Grants Program - Individual
Rigorous simulation of speckle fields caused by large area rough surfaces using fast algorithms based on higher order boundary element methods
- Grant number: 375876714
- Fiscal year: 2017
- Funding amount: $243,300
- Project category: Research Grants
Low Power, Area Efficient, High Speed Algorithms and Architectures for Computer Arithmetic, Pattern Recognition and Cryptosystems
- Grant number: 1686-2013
- Fiscal year: 2016
- Funding amount: $243,300
- Project category: Discovery Grants Program - Individual
Low Power, Area Efficient, High Speed Algorithms and Architectures for Computer Arithmetic, Pattern Recognition and Cryptosystems
- Grant number: 1686-2013
- Fiscal year: 2015
- Funding amount: $243,300
- Project category: Discovery Grants Program - Individual
Low Power, Area Efficient, High Speed Algorithms and Architectures for Computer Arithmetic, Pattern Recognition and Cryptosystems
- Grant number: 1686-2013
- Fiscal year: 2014
- Funding amount: $243,300
- Project category: Discovery Grants Program - Individual
AREA: Optimizing gene expression with mRNA free energy modeling and algorithms
- Grant number: 8689532
- Fiscal year: 2014
- Funding amount: $243,300
- Project category:
CPS: Synergy: Collaborative Research: Distributed Asynchronous Algorithms and Software Systems for Wide-Area Monitoring of Power Systems
- Grant number: 1329780
- Fiscal year: 2013
- Funding amount: $243,300
- Project category: Standard Grant
CPS: Synergy: Collaborative Research: Distributed Asynchronous Algorithms and Software Systems for Wide-Area Mentoring of Power Systems
- Grant number: 1329745
- Fiscal year: 2013
- Funding amount: $243,300
- Project category: Standard Grant