Scalable post-assembly editing software for finishing and annotating personal genomes
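Basic information
- Grant number: 9883809
- Principal investigator:
- Amount: $750,000
- Host institution:
- Host institution country: United States
- Project category:
- Fiscal year: 2018
- Funding country: United States
- Project period: 2018-09-01 to 2022-02-28
- Project status: Completed
- Source: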
- Keywords: Address; Algorithms; Alleles; Automated Annotation; Awareness; Bacterial Genome; Base Sequence; Biological Markers; Cataloging; Catalogs; Chromosomes; Complement; Complex; Computer software; Computers; Consensus Sequence; DNA Resequencing; DNA sequencing; Data; Diagnosis; Diploidy; Disease susceptibility; Foundations; Generations; Genes; Genetic; Genetic Variation; Genome; Genomics; Glean; Goals; Haplotypes; Hour; Human Genome; Individual; Manuals; Maps; Performance; Persons; Phase; Phenotype; Polishes; Population; Proteins; Recording of previous events; Resources; Running; Technology; Variant; Visualization; Writing; base; causal variant; cohort; contig; cost; design; experience; file format; genome annotation; graphical user interface; human disease; improved; knowledge base; next generation sequencing; open source; personalized medicine; programs; prototype; reference genome; scaffold; success; tool; whole genome
Project abstract
We are entering a new era of personal genomics where an individual's genome sequence will be used to
identify disease susceptibility, improve diagnosis and better treat illnesses as well as be combined across
cohorts and populations to identify new biomarkers and causal mutations underlying any phenotype. Despite
the tremendous success of mapping short-read next-generation sequencing (NGS) data onto a reference
genome (resequencing) in identifying genetic variation in a new genome, the inherent lack of long-range
connectivity, together with reference-induced biases, makes obtaining complete haplotype-phased genomes
exceedingly difficult. Emerging long-read technologies are beginning to address this critical shortcoming by
enabling direct de novo assembly of an individual's genome. However, initial de novo assemblies typically consist of
many thousands of unordered contigs that require extensive post-assembly processing to produce finished
sequences that can be effectively mined for genetic content and variation. Thus, there is an urgent need for
integrated, scalable post-assembly software that 1) automatically organizes, joins and phases the initial contigs
into complete haplotype sequences, 2) supports optional NGS and/or manual polishing and 3) provides initial
automated annotation of those sequences. Currently, such software does not exist and instead users must
cobble together a confusing array of difficult-to-use, task-specific open-source programs.
DNASTAR's post-assembly editing program, SeqMan Pro (SMP), has a proven history in finishing bacterial-sized
genomes, although it currently lacks the scalability and some of the functionality needed to tackle
human-genome-sized problems. The primary goal of this Fast Track proposal is to create a fully scalable version of
SMP for the automated finishing and annotation of de novo assembled large eukaryotic genomes while also
providing a manual editing platform when needed. During Phase I, we will develop two key prototypes: 1) a
new assembly file format, eBAM, which is interconvertible with the BAM format, but also is editable like our
SQD files and 2) a rapid reference-assisted contig scaffolding tool adapted from our proprietary Disk Sort
Alignment (DSA) algorithm. With that foundation, we will complete the transformation of SMP in Phase II by: 1)
refining the eBAM format for optimal editing performance, 2) building a new 64-bit version of the SMP editing
engine that incorporates the additional functionality necessary for post-assembly finishing of large eukaryotic
genomes, including automated DSA-based scaffolding and phase-aware gap filling, contig joining and
haplotype refinement, 3) creating a new DSA-based genome aligner for rapidly aligning a finished sequence to
an annotated reference genome and 4) adding a new feature transfer and analysis module. Together, the aligner
and the module will permit initial annotation of the finished genome along with a cataloging of variants and
their impact in both native and reference coordinates. Including the reference coordinates allows variants in
the new genome to be easily associated with the wealth of information available through the numerous online
knowledgebase resources.
Project outcomes
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other grants by TIMOTHY J DURFEE
Long read based sequencing software for the comprehensive analysis of clinical samples
- Grant number: 10009727
- Fiscal year: 2020
- Funding amount: $750,000
- Project category:
Scalable post-assembly editing software for finishing and annotating personal genomes
- Grant number: 9767335
- Fiscal year: 2018
- Funding amount: $750,000
- Project category:
Complete genome de novo assembly software for the emerging long read sequencing era
- Grant number: 9255092
- Fiscal year: 2017
- Funding amount: $750,000
- Project category:
Complete genome de novo assembly software for the emerging long read sequencing era
- Grant number: 9747613
- Fiscal year: 2017
- Funding amount: $750,000
- Project category:
Association Analysis Software for Mining Clinical Next-Gen Sequencing Data
- Grant number: 8236680
- Fiscal year: 2012
- Funding amount: $750,000
- Project category:
Association Analysis Software for Mining Clinical Next-Gen Sequencing Data
- Grant number: 8703156
- Fiscal year: 2012
- Funding amount: $750,000
- Project category:
Association Analysis Software for Mining Clinical Next-Gen Sequencing Data
- Grant number: 8727829
- Fiscal year: 2012
- Funding amount: $750,000
- Project category:
Association Analysis Software for Mining Clinical Next-Gen Sequencing Data
- Grant number: 8624982
- Fiscal year: 2012
- Funding amount: $750,000
- Project category:
A Desktop Assembly and Analysis Pipeline for Next-gen Metagenomic Sequencing
- Grant number: 8200467
- Fiscal year: 2011
- Funding amount: $750,000
- Project category:
Integrated Assembly Software for Sanger and Next Generation Sequence Technologies
- Grant number: 8011298
- Fiscal year: 2007
- Funding amount: $750,000
- Project category:
Similar overseas grants
CAREER: Blessing of Nonconvexity in Machine Learning - Landscape Analysis and Efficient Algorithms
- Grant number: 2337776
- Fiscal year: 2024
- Funding amount: $750,000
- Project category: Continuing Grant
CAREER: From Dynamic Algorithms to Fast Optimization and Back
- Grant number: 2338816
- Fiscal year: 2024
- Funding amount: $750,000
- Project category: Continuing Grant
CAREER: Structured Minimax Optimization: Theory, Algorithms, and Applications in Robust Learning
- Grant number: 2338846
- Fiscal year: 2024
- Funding amount: $750,000
- Project category: Continuing Grant
CRII: SaTC: Reliable Hardware Architectures Against Side-Channel Attacks for Post-Quantum Cryptographic Algorithms
- Grant number: 2348261
- Fiscal year: 2024
- Funding amount: $750,000
- Project category: Standard Grant
CRII: AF: The Impact of Knowledge on the Performance of Distributed Algorithms
- Grant number: 2348346
- Fiscal year: 2024
- Funding amount: $750,000
- Project category: Standard Grant
CRII: CSR: From Bloom Filters to Noise Reduction Streaming Algorithms
- Grant number: 2348457
- Fiscal year: 2024
- Funding amount: $750,000
- Project category: Standard Grant
EAGER: Search-Accelerated Markov Chain Monte Carlo Algorithms for Bayesian Neural Networks and Trillion-Dimensional Problems
- Grant number: 2404989
- Fiscal year: 2024
- Funding amount: $750,000
- Project category: Standard Grant
CAREER: Efficient Algorithms for Modern Computer Architecture
- Grant number: 2339310
- Fiscal year: 2024
- Funding amount: $750,000
- Project category: Continuing Grant
CAREER: Improving Real-world Performance of AI Biosignal Algorithms
- Grant number: 2339669
- Fiscal year: 2024
- Funding amount: $750,000
- Project category: Continuing Grant
DMS-EPSRC: Asymptotic Analysis of Online Training Algorithms in Machine Learning: Recurrent, Graphical, and Deep Neural Networks
- Grant number: EP/Y029089/1
- Fiscal year: 2024
- Funding amount: $750,000
- Project category: Research Grant