Complete genome de novo assembly software for the emerging long read sequencing era
Basic information
- Grant number: 9255092
- Amount: $749,800
- Host institution country: United States
- Fiscal year: 2017
- Funding country: United States
- Project period: 2017-03-01 to 2019-02-28
- Project status: Completed
- Keywords: Adoption; Algorithms; Alleles; Alternative Splicing; Awareness; Bacteria; Bacterial Genome; Bioinformatics; Biological Sciences; Chromosomes; Chromosomes, Human, Pair 2; Code; Computer software; Computers; Consensus; Consensus Sequence; DNA Sequence; Data; Data Set; Detection; Devices; Diploidy; Foundations; Genome; Genomics; Goals; Graph; Haploidy; Haplotypes; Hour; Human; Human Genome; Hybrids; Individual; Legal patent; Methods; Organism; Output; Performance; Phase; Protein Isoforms; Provider; Recording of previous events; Resources; Running; SWI1; Science; Sister; Software Design; Solid; Technology; Time; Transcript; Variant; Work; base; computing resources; design; experimental study; genome-wide; human disease; insertion/deletion mutation; instrument; microbial; microbial genome; nanopore; new technology; next generation; next generation sequencing; novel; pathogen; portability; programs; prototype; public health relevance; relative cost; sequencing platform; software development; success; tool; transcriptome sequencing; transcriptomics
Project abstract
Despite the tremendous success of short read next-generation sequencing (NGS) technologies, their inherent
inability to establish long range connectivity makes fundamental tasks such as genome closure, haplotype
phasing and alternatively spliced transcript characterization all but impossible. Now, two long read sequencing
providers, Pacific Biosciences (PacBio) and Oxford Nanopore Technologies (ONT), are producing data that
can overcome these critical shortcomings. PacBio is capable of producing 10-20kb reads and has seen
increased adoption for closing microbial genomes in particular, but also for eukaryotic genomics and
transcriptomics. ONT’s MinION device is a portable real-time sequencing platform capable of producing 100kb
reads and has already been successfully applied to microbial sequencing and pathogen identification. ONT’s
new high-throughput instrument, the PromethION, is being released in 2016 and will have sufficient output for
human genome scale experiments. The tremendous potential of both technologies is currently hampered by
high error rates (10-20%), which make assembly and consensus calling extremely computationally
challenging. Various command line software programs have been developed to tackle these challenges, but
they typically require substantial bioinformatic expertise and computing resources/savvy and do not address
the critical hurdles associated with diploid genomes. With long read sequencing poised to become a major
resource for genomics, there is clearly an urgent need for integrated easy-to-use assembly and analysis
software that can handle and exploit the unique aspects of this data. Toward that end, we have developed a
prototype de novo assembler based on our patented Disk Sort Alignment (DSA) algorithm that can assemble
an uncorrected bacterial genome data set into a single contig with >99.2% base accuracy on a standard
desktop computer in less than 3.5 hours. The assembler uses DSA-determined read overlaps to construct an
assembly string graph from which a layout is fed to a novel consensus generator designed to maximize
accuracy from this error-prone data. The overall goal of this direct-to-Phase II proposal is to transform the
prototype into a fully scalable long read de novo assembler for both haploid and diploid genomes. We will first
optimize the performance of the assembler components, building a solid foundation from which to incorporate
the essential diploid-aware capabilities of 1) identifying large structural variation between two sister
chromosomes, 2) adapting the consensus base caller to handle heterozygous SNVs and small indels and 3)
exploiting the long range connectivity of the data to properly phase the variants and produce accurate
haplotype sequences. Finally, we will leverage these tools to identify alternatively spliced transcripts and
allele-specific expression from long read RNA-Seq data. Consistent with DNASTAR’s 30-year history of delivering
easy-to-use expert-level software, this assembler will give any user access to these revolutionary long read
sequencing technologies and those to come.
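
The overlap, layout, and consensus flow described in the abstract can be illustrated with a toy sketch. It does not reproduce the patented DSA algorithm or the proposal's consensus generator, neither of which is specified here. As stand-ins, exact suffix-prefix matching replaces overlap detection, a greedy walk over the overlap graph replaces string graph layout, and a per-column majority vote replaces consensus calling. All function names are hypothetical, and the sketch assumes substitution errors only (no indels) and no repeated reads.

```python
from collections import Counter


def overlap_length(a, b, min_len=3):
    """Length of the longest suffix of `a` that equals a prefix of `b`.

    Returns 0 when no overlap of at least `min_len` bases exists.
    Exact matching is an assumption; real long reads need error-tolerant overlaps.
    """
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a[-n:] == b[:n]:
            return n
    return 0


def overlap_graph(reads, min_len=3):
    """Map read index -> {successor index: overlap length} for all read pairs."""
    edges = {}
    for i, a in enumerate(reads):
        for j, b in enumerate(reads):
            if i != j:
                n = overlap_length(a, b, min_len)
                if n:
                    edges.setdefault(i, {})[j] = n
    return edges


def layout(reads, edges):
    """Greedy layout: start at a read with no incoming edge, follow the longest overlap.

    Returns a list of (read index, offset in the assembled sequence).
    """
    has_incoming = {j for targets in edges.values() for j in targets}
    start = next((i for i in range(len(reads)) if i not in has_incoming), 0)
    placed, seen, current, offset = [(start, 0)], {start}, start, 0
    while True:
        candidates = {j: n for j, n in edges.get(current, {}).items() if j not in seen}
        if not candidates:
            break
        nxt = max(candidates, key=candidates.get)
        offset += len(reads[current]) - candidates[nxt]
        placed.append((nxt, offset))
        seen.add(nxt)
        current = nxt
    return placed


def consensus(reads, placed):
    """Majority vote per column over all reads covering that column."""
    columns = {}
    for idx, offset in placed:
        for k, base in enumerate(reads[idx]):
            columns.setdefault(offset + k, Counter())[base] += 1
    return "".join(columns[pos].most_common(1)[0][0] for pos in sorted(columns))


if __name__ == "__main__":
    genome = "ACGTTGCAAGGCTTAACCGT"  # made-up sequence for illustration
    reads = [genome[5:15], genome[0:10], genome[10:20]]  # shuffled, error-free reads
    placed = layout(reads, overlap_graph(reads))
    print(consensus(reads, placed) == genome)
```

The majority vote is what lets coverage compensate for per-read errors: three reads stacked at the same offset, one of which carries a substitution, still yield the correct base at that column. Handling heterozygous sites, as the proposal intends, would require keeping minority alleles rather than voting them away.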
Project outcomes
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other grants by TIMOTHY J DURFEE
Long read based sequencing software for the comprehensive analysis of clinical samples
- Grant number: 10009727
- Fiscal year: 2020
- Amount: $749,800
Scalable post-assembly editing software for finishing and annotating personal genomes
- Grant number: 9883809
- Fiscal year: 2018
- Amount: $749,800
Scalable post-assembly editing software for finishing and annotating personal genomes
- Grant number: 9767335
- Fiscal year: 2018
- Amount: $749,800
Complete genome de novo assembly software for the emerging long read sequencing era
- Grant number: 9747613
- Fiscal year: 2017
- Amount: $749,800
Association Analysis Software for Mining Clinical Next-Gen Sequencing Data
- Grant number: 8236680
- Fiscal year: 2012
- Amount: $749,800
Association Analysis Software for Mining Clinical Next-Gen Sequencing Data
- Grant number: 8727829
- Fiscal year: 2012
- Amount: $749,800
Association Analysis Software for Mining Clinical Next-Gen Sequencing Data
- Grant number: 8703156
- Fiscal year: 2012
- Amount: $749,800
Association Analysis Software for Mining Clinical Next-Gen Sequencing Data
- Grant number: 8624982
- Fiscal year: 2012
- Amount: $749,800
A Desktop Assembly and Analysis Pipeline for Next-gen Metagenomic Sequencing
- Grant number: 8200467
- Fiscal year: 2011
- Amount: $749,800
Integrated Assembly Software for Sanger and Next Generation Sequence Technologies
- Grant number: 8011298
- Fiscal year: 2007
- Amount: $749,800
Similar overseas grants
DMS-EPSRC: Asymptotic Analysis of Online Training Algorithms in Machine Learning: Recurrent, Graphical, and Deep Neural Networks
- Grant number: EP/Y029089/1
- Fiscal year: 2024
- Amount: $749,800
- Project category: Research Grant
CAREER: Blessing of Nonconvexity in Machine Learning - Landscape Analysis and Efficient Algorithms
- Grant number: 2337776
- Fiscal year: 2024
- Amount: $749,800
- Project category: Continuing Grant
CAREER: From Dynamic Algorithms to Fast Optimization and Back
- Grant number: 2338816
- Fiscal year: 2024
- Amount: $749,800
- Project category: Continuing Grant
CAREER: Structured Minimax Optimization: Theory, Algorithms, and Applications in Robust Learning
- Grant number: 2338846
- Fiscal year: 2024
- Amount: $749,800
- Project category: Continuing Grant
CRII: SaTC: Reliable Hardware Architectures Against Side-Channel Attacks for Post-Quantum Cryptographic Algorithms
- Grant number: 2348261
- Fiscal year: 2024
- Amount: $749,800
- Project category: Standard Grant
CRII: AF: The Impact of Knowledge on the Performance of Distributed Algorithms
- Grant number: 2348346
- Fiscal year: 2024
- Amount: $749,800
- Project category: Standard Grant
CRII: CSR: From Bloom Filters to Noise Reduction Streaming Algorithms
- Grant number: 2348457
- Fiscal year: 2024
- Amount: $749,800
- Project category: Standard Grant
EAGER: Search-Accelerated Markov Chain Monte Carlo Algorithms for Bayesian Neural Networks and Trillion-Dimensional Problems
- Grant number: 2404989
- Fiscal year: 2024
- Amount: $749,800
- Project category: Standard Grant
CAREER: Efficient Algorithms for Modern Computer Architecture
- Grant number: 2339310
- Fiscal year: 2024
- Amount: $749,800
- Project category: Continuing Grant
CAREER: Improving Real-world Performance of AI Biosignal Algorithms
- Grant number: 2339669
- Fiscal year: 2024
- Amount: $749,800
- Project category: Continuing Grant