A CRITICAL EVALUATION AND COMPARISON OF COMPUTERIZED SEQUENCE ANALYSIS PROGRAMS

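Basic information

  • Grant number:
    3752825
  • Principal investigator:
  • Amount:
    --
  • Host institution:
  • Host institution country:
    United States
  • Project category:
  • Fiscal year:
  • Funding country:
    United States
  • Start and end dates:
  • Project status:
    Not concluded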

Project abstract

In collaboration with Dr. M. Miller, NCI, a critical, quantitative analysis of several commercial sequence assembly and analysis packages was carried out. A fundamental problem in contemporary molecular biology is the determination and interpretation of DNA sequences. Because of the limitations of current sequencing technology, sequence determination entails piecing together short, overlapping sequence fragments into a single, long contiguous sequence. A number of commercial computer programs have been marketed to automate this process. While reviews of individual packages have been published, this is the first known study that critically compares the accuracy of assembly by these programs. Eleven programs were selected, primarily on the basis of their availability on the NIH campus.

Sequence data is not random, but contains ordered repeated sequences. Likewise, errors in sequence determination are not randomly distributed. To provide a controlled and realistic dataset for measuring performance and accuracy, a known sequence, the rat multidrug resistance gene (RATMDRM, 5254 base pairs, accession number M62425), was split into 58 random overlapping fragments of 200 to 400 base pairs in length. These were then randomly seeded with 0 to 15% error based on the error distribution of the fragments originally used to determine the sequence. Errors took the form of miscalled bases, deleted bases, or added bases.

The programs tested fell into three general groups based on accuracy. To rule out conditions unique to the chosen test sequence, the tests were repeated with four other sequences of between 4500 and 4600 base pairs. With one exception, the error rates were comparable to those encountered using RATMDRM. Additionally, some programs were tested with different permutations of RATMDRM to ascertain their capacity to assemble the sequence properly regardless of the order in which the fragments were input. Ease of editing the assembled sequences was also compared.

Results of this study were accepted for publication by the Journal of Biological Computation.
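
The test-set construction described in the abstract (58 random overlapping fragments of 200 to 400 base pairs, each seeded with 0 to 15% error as miscalled, deleted, or added bases) can be illustrated with a minimal Python sketch. This is not the study's actual procedure. The names `make_fragments` and `seed_errors` are hypothetical, the reference is a synthetic random sequence rather than RATMDRM, and errors are drawn uniformly at random, whereas the study based them on the error distribution of the original sequencing fragments. Pinning one fragment to each end and rejecting layouts with gaps is likewise an assumption made here to guarantee full coverage.

```python
import random

BASES = "ACGT"


def make_fragments(seq, n_fragments=58, min_len=200, max_len=400,
                   rng=None, max_tries=10_000):
    """Return sorted (start, fragment) pairs that overlap into one cover of seq."""
    rng = rng or random.Random()
    for _attempt in range(max_tries):
        # Assumption: pin one fragment to each end so both ends are covered.
        first_len = rng.randint(min_len, max_len)
        last_len = rng.randint(min_len, max_len)
        spans = [(0, first_len), (len(seq) - last_len, len(seq))]
        for _ in range(n_fragments - 2):
            length = rng.randint(min_len, max_len)
            start = rng.randint(0, len(seq) - length)
            spans.append((start, start + length))
        spans.sort()
        # Accept the layout only if every fragment starts inside the region
        # already covered, so there are no gaps and neighbours overlap.
        covered_to = 0
        for start, end in spans:
            if start > 0 and start >= covered_to:
                break  # gap (or merely abutting fragments): draw again
            covered_to = max(covered_to, end)
        else:
            return [(start, seq[start:end]) for start, end in spans]
    raise RuntimeError("no gap-free layout found; try more fragments")


def seed_errors(fragment, rate, rng):
    """Corrupt each base with probability `rate` (uniform model, an assumption)."""
    out = []
    for base in fragment:
        if rng.random() >= rate:
            out.append(base)
            continue
        kind = rng.choice(("miscall", "delete", "add"))
        if kind == "miscall":
            out.append(rng.choice([b for b in BASES if b != base]))
        elif kind == "add":
            out.append(base)
            out.append(rng.choice(BASES))
        # "delete": emit nothing for this base
    return "".join(out)


rng = random.Random(0)
# Synthetic stand-in of the same length as RATMDRM (5254 bp), not the real gene.
reference = "".join(rng.choice(BASES) for _ in range(5254))
fragments = make_fragments(reference, rng=rng)
# Each fragment gets its own error rate between 0 and 15%.
noisy = [seed_errors(frag, rng.uniform(0.0, 0.15), rng) for _, frag in fragments]
```

Shuffling `noisy` before handing it to an assembler would correspond to the input-order permutation tests mentioned in the abstract.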

Project outputs

Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)

Other grants by J I POWELL

UTILIZATION OF SPECIALIZED HARDWARE FOR DNA SEQUENCE ANALYSIS
  • Grant number:
    5201629
  • Fiscal year:
  • Funding amount:
    --
  • Project category:
LABORATORY ANALYSIS PACKAGE
  • Grant number:
    3838532
  • Fiscal year:
  • Funding amount:
    --
  • Project category:
UTILIZATION OF SPECIALIZED HARDWARE FOR DNA SEQUENCE ANALYSIS
  • Grant number:
    3774984
  • Fiscal year:
  • Funding amount:
    --
  • Project category:
UTILIZATION OF SPECIALIZED HARDWARE FOR DNA SEQUENCE ANALYSIS
  • Grant number:
    3752823
  • Fiscal year:
  • Funding amount:
    --
  • Project category:
LABORATORY ANALYSIS PACKAGE
  • Grant number:
    3853630
  • Fiscal year:
  • Funding amount:
    --
  • Project category:
DISCOVERY OF NOVEL HUMAN GENES BY AUTOMATED SEQUENCING OF CDNA LIBRARIES
  • Grant number:
    3752824
  • Fiscal year:
  • Funding amount:
    --
  • Project category:
COMPUTER SUPPORT FOR MOLECULAR SEQUENCING AND GENETIC MAPPING
  • Grant number:
    3838531
  • Fiscal year:
  • Funding amount:
    --
  • Project category:
COMPUTATIONAL RESOURCES FOR AUTOMATED DNA SEQUENCING OF SUBTRACTED CDNA LIBRARIES
  • Grant number:
    3774983
  • Fiscal year:
  • Funding amount:
    --
  • Project category:
LABORATORY ANALYSIS PACKAGE
  • Grant number:
    3874838
  • Fiscal year:
  • Funding amount:
    --
  • Project category:
CSL SUPPORT FOR HIGH VOLUME DNA SEQUENCING
  • Grant number:
    3874837
  • Fiscal year:
  • Funding amount:
    --
  • Project category: