Mapping short reads in RY-space: a novel strategy for extending the phylogenetic range of Next Generation Sequence mapping algorithms

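Basic information

  • Grant number:
    BB/I02347X/1
  • Principal investigator:
  • Amount:
    $ 152,000
  • Host institution:
  • Host institution country:
    United Kingdom
  • Project category:
    Research Grant
  • Fiscal year:
    2012
  • Funding country:
    United Kingdom
  • Duration:
    2012 to (no data)
  • Project status:
    Completed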

Project abstract

Modern DNA sequencers can generate billions of short DNA fragments that will typically be only 50 to 100 nucleotides in length. To make sense of these fragments, or reads, it is usually necessary to compare them with previously sequenced DNA, often an entire genome, which may be several billion nucleotides in length. This is a difficult computational challenge. Although comparing one piece of DNA with another is a relatively simple process, the sheer number of fragments generated, combined with the often large size of the reference genome, means that without well-designed software it is hard, if not impossible, to process all the data produced by a single sequencing run within a practical timeframe using current computers. Fast and efficient software has duly been developed but, in order to achieve a reasonable mapping speed with the latest computers, compromises have to be made. Current software can accommodate only a few differences between short DNA sequences if a successful match is to be identified (often only two or three differences within a short stretch are possible). It is this limitation which makes it very difficult to compare DNA between species, where a high level of variation is to be expected, and so it is hard to make sense of short-read data if the organism in question has not been sequenced before. Unfortunately, most species of research, economic, and clinical importance have yet to be sequenced. Species that have not been sequenced before must go through a far more expensive process of long-read genome sequencing, and this means that the genomic analysis of many organisms remains financially unfeasible with current technologies. We propose a new way of mapping DNA that will, at least in part, overcome this difficulty. Our approach is based on the observation that evolutionary patterns existing within DNA sequences are more fully revealed when the information content is reduced and the sequence pattern simplified.
A stretch of DNA can be thought of as a complex pattern of four types of nucleotide: Adenine, Cytosine, Guanine and Thymine (A, C, G, and T for short). A and G are purine nucleotides, whilst C and T are pyrimidines. It has long been recognised that, as organisms evolve, the rate at which a purine mutates into another purine, or a pyrimidine into another pyrimidine, tends to be higher than the rate at which purines mutate into pyrimidines and vice versa. This imbalance in mutation rate creates patterns of purines and pyrimidines within DNA sequences that are more stable motifs of shared ancestry between species than the noisier nucleotide patterns. We will use this more robust pattern to match sequences from different species: we will develop software which will simplify DNA sequences down to their purine and pyrimidine content alone, compare them to identify similarities using approaches equivalent to those currently used with raw DNA sequences, then convert them back into their original nucleotides for subsequent analysis. Because the conversion from individual nucleotides to their purine/pyrimidine identities is simple, the speed with which translated reads can be mapped will be comparable to that achieved with raw DNA. Thus, using our strategy, mapping should be almost as quick as current methods and use similar levels of computer resources. However, the extent to which one species can be compared with another will be far greater, meaning that it will be possible to sequence more organisms with low-cost short-read sequencing technologies even when a reference sequence for that particular species is not available. As a result, lower-cost sequencing will become practical for a much wider range of organisms than is currently possible, ensuring that the new techniques currently being developed that rely on short-read sequencing can be applied in many more contexts.
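
The purine/pyrimidine simplification described in the abstract can be illustrated with a minimal Python sketch. It is not the project's software: the names `RY_CODE`, `to_ry` and `mismatches` are hypothetical, the example sequences are invented, and marking any non-ACGT character as `N` is an assumption made here for illustration. It shows only that purine-to-purine and pyrimidine-to-pyrimidine substitutions (transitions) disappear in RY-space while purine/pyrimidine swaps (transversions) remain visible.

```python
# Illustrative sketch only; names and sequences are hypothetical.
# A and G are purines (R); C and T are pyrimidines (Y).
RY_CODE = {"A": "R", "G": "R", "C": "Y", "T": "Y"}


def to_ry(seq: str) -> str:
    """Reduce a DNA string to its purine (R) / pyrimidine (Y) pattern.

    Characters other than A, C, G, T are written as N (an assumption).
    """
    return "".join(RY_CODE.get(base, "N") for base in seq.upper())


def mismatches(a: str, b: str) -> int:
    """Count positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


read = "ACGTTGCA"
ref_transitions = "GCATTACG"   # differs from read by four transitions
ref_transversion = "ACGTTGCT"  # differs from read by one transversion (A vs T)

# Nucleotide space: four differences would typically defeat a strict mapper.
nt_diff = mismatches(read, ref_transitions)
# RY space: the same pair is identical, because transitions are masked.
ry_diff = mismatches(to_ry(read), to_ry(ref_transitions))
# A transversion is still counted as a difference in RY space.
ry_tv_diff = mismatches(to_ry(read), to_ry(ref_transversion))

print(to_ry(read), nt_diff, ry_diff, ry_tv_diff)
```

Because the original read is kept alongside its RY pattern, the nucleotides can be restored after a match is found; the RY string itself is lossy and cannot be converted back on its own.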

Project outputs

Journal articles (2)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Insight into biases and sequencing errors for amplicon sequencing with the Illumina MiSeq platform.
  • DOI:
    10.1093/nar/gku1341
  • Publication date:
    2015-03-31
  • Journal:
  • Impact factor:
    14.9
  • Authors:
    Schirmer M; Ijaz UZ; D'Amore R; Hall N; Sloan WT; Quince C
  • Corresponding author:
    Quince C
Analysis of the bread wheat genome using whole-genome shotgun sequencing.
  • DOI:
    10.1038/nature11650
  • Publication date:
    2012-11-29
  • Journal:
  • Impact factor:
    64.8
  • Authors:
  • Corresponding author:

Other publications by Neil Hall

It’s only human
  • DOI:
    10.1186/gb4159
  • Publication date:
    2014-02-12
  • Journal:
  • Impact factor:
    9.400
  • Authors:
    Neil Hall
  • Corresponding author:
    Neil Hall
Evolutionary genomics reveals variation in structure and genetic content implicated in virulence and lifestyle in the genus Gaeumannomyces
  • DOI:
    10.1186/s12864-025-11432-0
  • Publication date:
    2025-03-12
  • Journal:
  • Impact factor:
    3.700
  • Authors:
    Rowena Hill; Michelle Grey; Mariano Olivera Fedi; Daniel Smith; Gail Canning; Sabrina J. Ward; Naomi Irish; Jade Smith; Vanessa E. McMillan; Jess Hammond; Sarah-Jane Osborne; Gillian Reynolds; Ellie Smith; Tania Chancellor; David Swarbreck; Neil Hall; Javier Palma-Guerrero; Kim E. Hammond-Kosack; Mark McMullan
  • Corresponding author:
    Mark McMullan
Pilot survey of expressed sequence tags (ESTs) from the asexual blood stages of Plasmodium vivax in human patients
  • DOI:
    10.1186/1475-2875-2-21
  • Publication date:
    2003-07-21
  • Journal:
  • Impact factor:
    3.000
  • Authors:
    Emilio F Merino; Carmen Fernandez-Becerra; Alda MBN Madeira; Ariane L Machado; Alan Durham; Arthur Gruber; Neil Hall; Hernando A del Portillo
  • Corresponding author:
    Hernando A del Portillo
Why science and synchronized swimming should not be Olympic sports
  • DOI:
    10.1186/gb-2012-13-9-171
  • Publication date:
    2012-09-01
  • Journal:
  • Impact factor:
    9.400
  • Authors:
    Neil Hall
  • Corresponding author:
    Neil Hall
Iron working in Anglo-Saxon England: New evidence to show fresh iron smelting of ironstone ores from the 6th–10th centuries CE
  • DOI:
    10.1016/j.jasrep.2018.02.019
  • Publication date:
    2018-06-01
  • Journal:
  • Impact factor:
  • Authors:
    Neil Hall
  • Corresponding author:
    Neil Hall

Other grants held by Neil Hall

Open Access Block Award 2024 - Earlham Institute
  • Grant number:
    EP/Z531492/1
  • Fiscal year:
    2024
  • Funding amount:
    $ 152,000
  • Project category:
    Research Grant
Open Access Block Award 2023 - Earlham Institute
  • Grant number:
    EP/Y529126/1
  • Fiscal year:
    2023
  • Funding amount:
    $ 152,000
  • Project category:
    Research Grant
Open Access Block Award 2022 - Earlham Institute
  • Grant number:
    EP/X526095/1
  • Fiscal year:
    2022
  • Funding amount:
    $ 152,000
  • Project category:
    Research Grant
ELIXIR-UK Coordination Office
  • Grant number:
    BB/X011100/1
  • Fiscal year:
    2022
  • Funding amount:
    $ 152,000
  • Project category:
    Research Grant
The Earlham Institute 2021 Flexible Talent Mobility Account
  • Grant number:
    BB/W510890/1
  • Fiscal year:
    2021
  • Funding amount:
    $ 152,000
  • Project category:
    Research Grant
Business Case for a Catalyst Partnership in Artificial Intelligence between the Alan Turing Institute and the Norwich Biosciences Institutes
  • Grant number:
    BB/V509267/1
  • Fiscal year:
    2020
  • Funding amount:
    $ 152,000
  • Project category:
    Research Grant
Development of single-cell sequencing technology for microbial populations
  • Grant number:
    BB/R022526/1
  • Fiscal year:
    2018
  • Funding amount:
    $ 152,000
  • Project category:
    Research Grant
Ultra High-Throughput Sequencing for Norwich Research Park and the UK National Capability in Genomics
  • Grant number:
    BB/R014329/1
  • Fiscal year:
    2018
  • Funding amount:
    $ 152,000
  • Project category:
    Research Grant
Earlham Institute UKRI Innovation Fellowships: BBSRC Flexible Talent Mobility Accounts
  • Grant number:
    BB/R50659X/1
  • Fiscal year:
    2017
  • Funding amount:
    $ 152,000
  • Project category:
    Research Grant
Wheat Pan-Genomics
  • Grant number:
    BB/P010768/1
  • Fiscal year:
    2017
  • Funding amount:
    $ 152,000
  • Project category:
    Research Grant

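Similar grants from the National Natural Science Foundation of China (NSFC)

Dissecting the molecular mechanism by which ESL1 (Erect and Short Leaf 1) regulates plant architecture in foxtail millet
  • Grant number:
    32301849
  • Year approved:
    2023
  • Funding amount:
    CNY 300,000
  • Project category:
    Young Scientists Fund
Effects and mechanisms of long-TSLP and short-TSLP adjuvants on the immune response to recombinant protein COVID-19 vaccines
  • Grant number:
  • Year approved:
    2021
  • Funding amount:
    CNY 580,000
  • Project category:
    General Program
Identification and functional study of IDD family genes associated with the SHORT-ROOT and SCARECROW developmental pathway
  • Grant number:
    31871493
  • Year approved:
    2018
  • Funding amount:
    CNY 600,000
  • Project category:
    General Program
Role and mechanism of long-TSLP and short-TSLP regulation of aerobic glycolysis in lung fibroblasts in asthmatic airway remodelling
  • Grant number:
    81700034
  • Year approved:
    2017
  • Funding amount:
    CNY 200,000
  • Project category:
    Young Scientists Fund
Molecular mechanism by which an imbalance of airway-epithelium-derived long-TSLP/short-TSLP affects fibroblast activation during airway remodelling in asthma
  • Grant number:
    81670026
  • Year approved:
    2016
  • Funding amount:
    CNY 600,000
  • Project category:
    General Program
Mechanism by which short-chain fatty acids up-regulate tight junction barrier function in the small intestinal epithelium
  • Grant number:
    31040041
  • Year approved:
    2010
  • Funding amount:
    CNY 100,000
  • Project category:
    Special Fund Program
Quantitative analysis of short-range interfacial interaction mechanisms in membrane fouling by soluble microbial products in MBRs
  • Grant number:
    50908133
  • Year approved:
    2009
  • Funding amount:
    CNY 200,000
  • Project category:
    Young Scientists Fund
Assembly of high-throughput DNA sequencing reads
  • Grant number:
    30871393
  • Year approved:
    2008
  • Funding amount:
    CNY 350,000
  • Project category:
    General Program
Mapping of novel causative genes for short QT syndrome
  • Grant number:
    30771183
  • Year approved:
    2007
  • Funding amount:
    CNY 80,000
  • Project category:
    General Program
Antitumour effects induced by tumour vaccines based on short-lived proteins, and their mechanisms
  • Grant number:
    30771999
  • Year approved:
    2007
  • Funding amount:
    CNY 330,000
  • Project category:
    General Program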

Similar overseas grants

Tuneable short-wavelength infrared mode-locked fibre lasers
  • Grant number:
    EP/Y001915/1
  • Fiscal year:
    2024
  • Funding amount:
    $ 152,000
  • Project category:
    Research Grant
The role of nigrostriatal and striatal cell subtype signaling in behavioral impairments related to schizophrenia
  • Grant number:
    10751224
  • Fiscal year:
    2024
  • Funding amount:
    $ 152,000
  • Project category:
A robust ensemble Kalman filter to innovate short-range severe weather prediction
  • Grant number:
    24K07131
  • Fiscal year:
    2024
  • Funding amount:
    $ 152,000
  • Project category:
    Grant-in-Aid for Scientific Research (C)
EAGER: IMPRESS-U: Gradient surface nanostructuring with short laser pulses
  • Grant number:
    2406599
  • Fiscal year:
    2024
  • Funding amount:
    $ 152,000
  • Project category:
    Standard Grant
Travel Support: A Short Course on The Polymer Physics of Additive Manufacturing; 2024 American Physical Society (APS) Meeting; Minneapolis, Minnesota; 2-3 March 2024
  • Grant number:
    2403712
  • Fiscal year:
    2024
  • Funding amount:
    $ 152,000
  • Project category:
    Standard Grant
Collaborative Research: Dynamics of Short Range Order in Multi-Principal Element Alloys
  • Grant number:
    2348956
  • Fiscal year:
    2024
  • Funding amount:
    $ 152,000
  • Project category:
    Standard Grant
Collaborative Research: Dynamics of Short Range Order in Multi-Principal Element Alloys
  • Grant number:
    2348955
  • Fiscal year:
    2024
  • Funding amount:
    $ 152,000
  • Project category:
    Standard Grant
Executive functions in urban Hispanic/Latino youth: exposure to mixture of arsenic and pesticides during childhood
  • Grant number:
    10751106
  • Fiscal year:
    2024
  • Funding amount:
    $ 152,000
  • Project category:
Gain-of-function toxicity in alpha-1 antitrypsin deficient type 2 alveolar epithelial cells
  • Grant number:
    10751760
  • Fiscal year:
    2024
  • Funding amount:
    $ 152,000
  • Project category:
Global Perspectives of the Short-term Association between Suicide and Nighttime Excess Heat during Tropical Nights
  • Grant number:
    24K10701
  • Fiscal year:
    2024
  • Funding amount:
    $ 152,000
  • Project category:
    Grant-in-Aid for Scientific Research (C)