BlobToolKit: Identification and analysis of non-target data in all Eukaryotic genome projects

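Basic information

  • Grant number:
    BB/P024238/2
  • Principal investigator:
  • Amount:
    $152,200
  • Host institution:
  • Host institution country:
    United Kingdom
  • Project type:
    Research Grant
  • Fiscal year:
    2019
  • Funding country:
    United Kingdom
  • Duration:
    2019 to (no data)
  • Project status:
    Completed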

Project abstract

Genomics has become one of the cornerstones of biology. Knowing an organism's genome sequence immediately allows us to work out what kinds of biology it is able to do, and acts as a platform upon which we can build experiments to test, for example, the dynamics of gene activity during stress or disease. If genomes are the cornerstones, genome databases are the libraries built from these data that allow science to collaborate and build upon its successes. Genome sequencing is getting easier, as technologies improve by leaps and bounds: new, high throughput sequencers and advanced computing. The human genome cost $3 billion to sequence the first time round: now it would cost about $15,000. This reduction in cost has opened up genome sequencing to many research projects on new species, and there are now about 30,000 bacterial genomes and 3,000 eukaryotic genomes in public databases.

When genomes are contaminated, the genome databases, the reference libraries, are also contaminated, and the scientific process becomes muddied: errors can be made that affect many later steps in understanding the natural world, or exploiting it for bioscience. Obviously no scientist knowingly submits contaminated genome data to the central databases, but as genome sequencing projects become more common, more and more contaminated data are getting into the databases of record.

How does contamination happen? Organisms live in environments with other species, and it is often not possible or not advisable to separate these before making DNA to be sequenced. For example, most animals have bacteria in their guts, and getting rid of these before extracting DNA from a whole specimen of a tiny species is difficult. Similarly, plants naturally have communities of fungi and bacteria growing in and on their leaves and roots. In the case of symbiotic organisms, where the interaction is very intimate, the specimen is indivisible. The genomes of the different contributing species will be mixed up in the raw sequence data generated from such samples.

We propose to build a set of computational tools, BlobToolKit, that will identify contaminants. BlobToolKit will be useful both during the process of making new genomes for the first time (where they will separate out the different organisms in the mix of raw sequence data), and during reanalyses of existing genome assemblies.

BlobToolKit will be made freely available as a standalone program, as a service on the internet, and as a system that will be plugged into the big public databases to report on possible contamination. The project, a collaboration between the University of Edinburgh and the European Bioinformatics Institute, aims, within 3 years, to have identified all the problems in "legacy" genomes already submitted to public databases, and to have in place a system that prevents further contamination happening.

BlobToolKit reports will be provided as part of the submission process to those scientists reporting genome assemblies, ensuring the exposure of our technology to its users. We will further promote BlobToolKit by publication of our results in open access journals, presentations and workshops at relevant meetings, discussion with standards organisations, delivering training workshops to interested groups of scientists, and maintaining a rich resource of training and tutorial materials on the web. Our aim is to steer the scientific community to a culture in which contamination in genome assembly is understood and expected, and freely available and versatile software tools are known that can assist in the flagging and prevention of contamination in the public record.

Project outcomes

Journal articles (10)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
The genome sequence of the Buff Ermine, Spilarctia lutea (Hufnagel, 1766).
  • DOI:
    10.12688/wellcomeopenres.19065.1
  • Published:
    2023
  • Journal:
  • Impact factor:
    0
  • Authors:
  • Corresponding author:
The genome sequence of the small tortoiseshell butterfly, Aglais urticae (Linnaeus, 1758)
  • DOI:
    10.12688/wellcomeopenres.17197.1
  • Published:
    2021
  • Journal:
  • Impact factor:
    0
  • Authors:
    Bishop G
  • Corresponding author:
    Bishop G
The complete genome sequence of Eimeria tenella (Tyzzer 1929), a common gut parasite of chickens.
  • DOI:
    10.12688/wellcomeopenres.17100.1
  • Published:
    2021
  • Journal:
  • Impact factor:
    0
  • Authors:
    Aunin E;Böhme U;Blake D;Dove A;Smith M;Corton C;Oliver K;Betteridge E;Quail MA;McCarthy SA;Wood J;Tracey A;Torrance J;Sims Y;Howe K;Challis R;Berriman M;Reid A
  • Corresponding author:
    Reid A
The genome sequence of the grey top shell, Steromphala cineraria (Linnaeus, 1758)
  • DOI:
    10.12688/wellcomeopenres.17677.1
  • Published:
    2022
  • Journal:
  • Impact factor:
    0
  • Authors:
    Adkins P
  • Corresponding author:
    Adkins P
The genome sequence of the Brown Scallop, Philereme vetulata (Denis and Schiffermüller, 1775)
  • DOI:
    10.12688/wellcomeopenres.18948.1
  • Published:
    2023
  • Journal:
  • Impact factor:
    0
  • Authors:
    Boyes D
  • Corresponding author:
    Boyes D

Other publications by Mark Blaxter

Imagining Sisyphus happy: DNA barcoding and the unnamed majority
  • DOI:
  • Published:
    2016
  • Journal:
  • Impact factor:
    0
  • Authors:
    Mark Blaxter
  • Corresponding author:
    Mark Blaxter
Duplication and divergence: the evolution of nematode globins.
  • DOI:
  • Published:
    2009
  • Journal:
  • Impact factor:
    1.3
  • Authors:
    Paul Hunt;Jody McNally;W. Barris;Mark Blaxter
  • Corresponding author:
    Mark Blaxter
Animal roots and shoots
  • DOI:
    10.1038/4341076a
  • Published:
    2005-04-27
  • Journal:
  • Impact factor:
    48.500
  • Authors:
    Martin Jones;Mark Blaxter
  • Corresponding author:
    Mark Blaxter
Two worms are better than one
  • DOI:
    10.1038/426395a
  • Published:
    2003-11-27
  • Journal:
  • Impact factor:
    48.500
  • Authors:
    Mark Blaxter
  • Corresponding author:
    Mark Blaxter
Sum of the arthropod parts
  • DOI:
    10.1038/35093191
  • Published:
    2001-09-13
  • Journal:
  • Impact factor:
    48.500
  • Authors:
    Mark Blaxter
  • Corresponding author:
    Mark Blaxter

Other grants held by Mark Blaxter

Genomics of Host-Parasite Coevolution: A Test of Arms Race and Red Queen Dynamics in a Wild Insect System
  • Grant number:
    NE/W001519/1
  • Fiscal year:
    2022
  • Funding amount:
    $152,200
  • Project type:
    Research Grant
BBR GenomeHubs - agile genome databasing for neglected organisms of agricultural, development and biodiversity importance
  • Grant number:
    BB/R015325/2
  • Fiscal year:
    2020
  • Funding amount:
    $152,200
  • Project type:
    Research Grant
BBR GenomeHubs - agile genome databasing for neglected organisms of agricultural, development and biodiversity importance
  • Grant number:
    BB/R015325/1
  • Fiscal year:
    2018
  • Funding amount:
    $152,200
  • Project type:
    Research Grant
BlobToolKit: Identification and analysis of non-target data in all Eukaryotic genome projects
  • Grant number:
    BB/P024238/1
  • Fiscal year:
    2017
  • Funding amount:
    $152,200
  • Project type:
    Research Grant
Building a genome analytic resource for the lepidopteran community
  • Grant number:
    BB/K020161/1
  • Fiscal year:
    2013
  • Funding amount:
    $152,200
  • Project type:
    Research Grant
Genetic basis of reproductive and plumage polymorphism in the ruff
  • Grant number:
    BB/J018791/1
  • Fiscal year:
    2012
  • Funding amount:
    $152,200
  • Project type:
    Research Grant
The evolutionary genomics of sexual recombination
  • Grant number:
    NE/J011355/1
  • Fiscal year:
    2012
  • Funding amount:
    $152,200
  • Project type:
    Research Grant
Future-Proofing the Sustainability of the MRC High Throughput Sequencing Hub in Scotland
  • Grant number:
    MR/K001744/1
  • Fiscal year:
    2012
  • Funding amount:
    $152,200
  • Project type:
    Research Grant
NextGenPartiGene: next generation transcriptome assembly annotation and exploitation toolkit
  • Grant number:
    BB/I023585/1
  • Fiscal year:
    2011
  • Funding amount:
    $152,200
  • Project type:
    Research Grant
Developing RAD markers as a resource for plant breeding
  • Grant number:
    BB/H023844/1
  • Fiscal year:
    2011
  • Funding amount:
    $152,200
  • Project type:
    Research Grant

Similar overseas grants

SCAnDi: Single-cell and single molecule analysis for DNA identification
  • Grant number:
    ES/Y010655/1
  • Fiscal year:
    2024
  • Funding amount:
    $152,200
  • Project type:
    Research Grant
Identification of Prospective Predictors of Alcohol Initiation During Early Adolescence
  • Grant number:
    10823917
  • Fiscal year:
    2024
  • Funding amount:
    $152,200
  • Project type:
Identification and analysis of a novel pattern-recognition receptor that senses intracellular bacterial components
  • Grant number:
    23H02715
  • Fiscal year:
    2023
  • Funding amount:
    $152,200
  • Project type:
    Grant-in-Aid for Scientific Research (B)
Identification of the place of origin for oriental bezoar by genetic analysis
  • Grant number:
    23K06198
  • Fiscal year:
    2023
  • Funding amount:
    $152,200
  • Project type:
    Grant-in-Aid for Scientific Research (C)
Identification and analysis of the three-dimensional solar plasma flows
  • Grant number:
    2884318
  • Fiscal year:
    2023
  • Funding amount:
    $152,200
  • Project type:
    Studentship
Locality identification of volcanic rock artifacts excavated from the Teotihuacan site by archaeological scientific analysis and LiDAR survey
  • Grant number:
    23K18710
  • Fiscal year:
    2023
  • Funding amount:
    $152,200
  • Project type:
    Grant-in-Aid for Research Activity Start-up
varCUT&Tag: A Method for Simultaneous Identification and Characterization of Sequence Variants in Regulatory Elements and Genes
  • Grant number:
    10662799
  • Fiscal year:
    2023
  • Funding amount:
    $152,200
  • Project type:
Identification of Genetic and Molecular Bases of Derived Phenotypes in Primate Brain Development
  • Grant number:
    10841947
  • Fiscal year:
    2023
  • Funding amount:
    $152,200
  • Project type:
Systematic identification of RNA sequences and protein components regulating circular RNA translation
  • Grant number:
    10816653
  • Fiscal year:
    2023
  • Funding amount:
    $152,200
  • Project type:
Towards equitable early identification of autism spectrum disorders in females
  • Grant number:
    10722011
  • Fiscal year:
    2023
  • Funding amount:
    $152,200
  • Project type: