
Adapting Phred/Phrap/Consed for NextGen Sequencing


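Basic information

Grant number: 7847401
Principal investigator: PHILIP P GREEN
Amount: $591,300
Host institution: UNIVERSITY OF WASHINGTON
Host institution country: United States
Project category:
Fiscal year: 2010
Funding country: United States
Project period: 2010-09-18 to 2013-06-30
Project status: Completed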

Project abstract

DESCRIPTION (provided by applicant): Adapting Phred/Phrap/Consed to Next-Generation Sequencing. New methods for DNA sequencing are allowing the production of much more data at a fraction of the cost of traditional technologies, such that DNA sequencing is now being used more than ever before in biomedical research. However, software to analyze the output from these new technologies could be significantly improved. This proposal is to upgrade the widely used phred/phrap/consed package for these "next-generation" sequencers.

We have developed a new base-calling and image analysis program, next_phred, for the Illumina sequencer which gives 80%-90% more reads than the Illumina software and 50% fewer base-calling errors, thus significantly reducing sequencing costs and allowing more confident detection of sequence variants. We will make further performance improvements and investigate whether changes to the Illumina experimental protocol can increase yield still further. We will also calibrate the error probabilities for the base-callers of other next-generation sequencers.

We will enable consed (the visualization, finishing, and analysis tool) to nimbly handle assemblies of up to several billion reads, a large reference sequence, and high depth of coverage; to detect structural variants and determine SNPs using a probabilistic model; to directly read the output of assemblers commonly used with next-generation data; and to perform batch correction of erroneous assemblies and consensus bases.

We will further improve cross_match (the flexible sequence alignment program which is part of phred/phrap/consed) and our new ultrafast aligner phaster for mapping large numbers of genomic or RNA-Seq reads to a reference genome. Both programs will be given speed and functionality enhancements, including the capability to handle paired reads and to output alignments in a more compact file format.

We will create a bioinformatics environment allowing even small labs to manage the massive amounts of data from next-generation sequencers. This will include the implementation of compact file formats, prescriptions for data storage, generation of files usable in a variety of applications, and pipelines for Illumina and 454 data processing.

PUBLIC HEALTH RELEVANCE: New DNA sequencing technologies are vastly increasing the amount of data available to decipher the genetic basis of human disease. Software able to fully exploit this data is currently lacking. Our software, commonly used for older types of sequencing machines, will be improved to meet this challenge and to significantly lower sequencing costs.
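
The abstract mentions calibrating the error probabilities reported by base-callers. As a rough illustration only, the sketch below uses the standard Phred quality definition, Q = -10 log10(P), and compares predicted error probabilities with error rates observed against a trusted reference. The function names and the input format (pairs of predicted quality and an error flag) are assumptions made for this example; they are not taken from phred, next_phred, or the proposal itself.

```python
import math
from collections import defaultdict


def phred_to_error_prob(q):
    """Convert a Phred quality score Q to an error probability P = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p):
    """Convert an error probability P (0 < P <= 1) to Q = -10 * log10(P)."""
    return -10 * math.log10(p)


def observed_error_rates(calls):
    """Tabulate observed error rates per predicted quality score.

    `calls` is an iterable of (predicted_q, is_error) pairs, where is_error
    is True when the called base disagrees with a trusted reference.
    (Hypothetical input format, chosen for this sketch.)

    Returns {predicted_q: (n_calls, n_errors, observed_rate)}.
    """
    totals = defaultdict(int)
    errors = defaultdict(int)
    for q, is_error in calls:
        totals[q] += 1
        if is_error:
            errors[q] += 1
    return {q: (totals[q], errors[q], errors[q] / totals[q]) for q in sorted(totals)}


if __name__ == "__main__":
    # Synthetic data: 1000 calls at Q20 with 10 errors, 1000 calls at Q30 with 2 errors.
    calls = [(20, i < 10) for i in range(1000)] + [(30, i < 2) for i in range(1000)]
    for q, (n, e, rate) in observed_error_rates(calls).items():
        print(f"Q{q}: predicted P={phred_to_error_prob(q):.4f}, "
              f"observed {e}/{n} = {rate:.4f}")
```

A base-caller is well calibrated when the observed rate in each bin is close to the predicted probability. In the synthetic data above, the Q20 bin matches its prediction, while the Q30 bin shows twice the predicted error rate.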
