The Terabase Search Engine

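Basic information

  • Grant number:
    8688406
  • Principal investigator:
  • Amount:
    $355,000
  • Host institution:
  • Host institution country:
    United States
  • Project category:
  • Fiscal year:
    2014
  • Funding country:
    United States
  • Project period:
    2014-07-01 to 2017-04-30
  • Project status:
    Completed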

Project abstract

DESCRIPTION (provided by applicant): We propose to create a new system, the Terabase Search Engine (TSE), that will make it possible for biomedical researchers to search all human DNA sequences that have been sequenced and deposited in public archives. The vast and growing resource of human DNA sequences provides a wealth of opportunities for scientific discovery and for validation of results, but the size of the data sets has already far exceeded the ability of most researchers to use them. For more than two decades, geneticists have relied on DNA sequence databases for a wide range of scientific endeavors, including the discovery of new genes and new mutations and the investigation of evolutionary changes within and between species, the forces affecting chromosomal structure and change, and many other molecular and evolutionary processes. The ability to search all known genes and genomes using BLAST and similar programs has long been taken for granted, and sequence search engines throughout the world provide this ability. However, the raw data pouring out of next-generation sequencing (NGS) projects has exceeded our ability to provide rapid access to it. A single NGS instrument can generate six billion reads encompassing 600 billion bases in a single run, and this capacity is still growing. Traditional alignment programs like BLAST cannot sort through this data in a reasonable amount of time. Newer programs such as Bowtie (developed by our group) align NGS reads to the genome far faster, but the data sets, now in excess of 1 trillion reads, are larger than most computers can store, and even the fastest alignment programs today could not search all this data in a reasonable amount of time. A new approach is required in order to serve up these huge and hugely valuable DNA sequences to the research community. The Terabase Search Engine will be a new, highly efficient system for searching trillions of bases in real time.
Using a hierarchical search strategy with extensive pre-processing to speed up response time, the TSE will allow a scientist to align any sequence, human or non-human, to all publicly available human sequence reads. Reads that match the human genome will be indexed and stored on very high-speed disks for rapid retrieval. Reads that match microbial sequences will be captured and stored separately for use in microbiome and infectious disease research. The system will be made available through a user-friendly web interface, and a local database will store each user's results for further analysis on the TSE site or for download to a local site. This system will make it possible, for the first time, for any scientist to align a sequence to the complete set of human DNA sequences and to retrieve everything that matches, without the need to write special-purpose programs or to use complex cloud-based software interfaces. All of the software for this project will be developed under an open-source model that will permit others to use, modify, share, and re-distribute the code without restriction.
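The abstract does not specify which data structures the TSE uses, so the following is only an illustrative sketch of how pre-processing plus a two-tier (hierarchical) lookup can avoid scanning every read at query time. It assumes a simple in-memory k-mer index; the names `build_index`, `search`, `K`, and `min_shared` are hypothetical and are not taken from the project.

```python
from collections import defaultdict

K = 4  # hypothetical seed length, kept tiny for the toy data below


def build_index(reads, k=K):
    """Pre-processing: map every k-mer to the ids of the reads containing it."""
    index = defaultdict(set)
    for read_id, read in enumerate(reads):
        for i in range(len(read) - k + 1):
            index[read[i:i + k]].add(read_id)
    return index


def search(query, reads, index, k=K, min_shared=2):
    """Two-tier lookup: cheap k-mer filter first, then exact verification."""
    # Tier 1: count how many distinct query k-mers each read shares.
    query_kmers = {query[i:i + k] for i in range(len(query) - k + 1)}
    shared = defaultdict(int)
    for kmer in query_kmers:
        for read_id in index.get(kmer, ()):
            shared[read_id] += 1
    candidates = [rid for rid, n in shared.items() if n >= min_shared]
    # Tier 2: keep only candidate reads that occur exactly inside the query.
    return sorted(rid for rid in candidates if reads[rid] in query)


# Toy usage: read 3 passes the k-mer filter but fails exact verification.
reads = ["ACGTAC", "TTTTGG", "GTACGG", "ACGTACTT"]
index = build_index(reads)
print(search("AACGTACGGA", reads, index))  # prints [0, 2]
```

The index is built once, so each query touches only the reads that share seeds with it. A production system would replace the exact-match check with a real aligner and keep the index on disk rather than in a Python dictionary.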

Project outputs

Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)

Other publications by Steven L. Salzberg

The 15th Genomic Standards Consortium meeting
  • DOI:
    10.4056/sigs.3457
  • Publication date:
    2013-01-01
  • Journal:
  • Impact factor:
    5.400
  • Authors:
    Lynn Schriml;Ilene Mizrachi;Peter Sterk;Dawn Field;Lynette Hirschman;Tatiana Tatusova;Susanna Sansone;Jack Gilbert;David Schindel;Neil Davies;Chris Meyer;Folker Meyer;George Garrity;Lita Proctor;M. H. Medema;Yemin Lan;Anna Klindworth;Frank Oliver Glöckner;Tonia Korves;Antonia Gonzalez;Peter Dwayndt;Markus Göker;Anjette Johnston;Evangelos Pafilis;Susanne Schneider;K. Baker;Cynthia Parr;G. Sutton;H. H. Creasy;Nikos Kyrpides;K. Eric Wommack;Patricia L. Whetzel;Daniel Nasko;Hilmar Lapp;Takamoto Fujisawa;Adam M. Phillippy;Renzo Kottman;Judith A. Blake;Junhua Li;Elizabeth M. Glass;Petra ten Hoopen;Rob Knight;Susan Holmes;Curtis Huttenhower;Steven L. Salzberg;Bing Ma;Owen White
  • Corresponding author:
    Owen White
C4.5: Programs for Machine Learning by J. Ross Quinlan. Morgan Kaufmann Publishers, Inc., 1993
  • DOI:
    10.1007/bf00993309
  • Publication date:
    1994-09-01
  • Journal:
  • Impact factor:
    2.900
  • Authors:
    Steven L. Salzberg
  • Corresponding author:
    Steven L. Salzberg
Reply to Austin and Korem, “Compositional transformations can reasonably introduce phenotype-associated values into sparse features”
  • DOI:
    10.1128/msystems.00248-25
  • Publication date:
    2025-04-30
  • Journal:
  • Impact factor:
    4.600
  • Authors:
    Steven L. Salzberg
  • Corresponding author:
    Steven L. Salzberg
Yeast rises again
  • DOI:
    10.1038/423233a
  • Publication date:
    2003-05-15
  • Journal:
  • Impact factor:
    48.500
  • Authors:
    Steven L. Salzberg
  • Corresponding author:
    Steven L. Salzberg
QUALITY ASSESSMENT OF SPLICE SITE ANNOTATION BASED ON CONSERVATION ACROSS MULTIPLE SPECIES
  • DOI:
  • Publication date:
  • Journal:
  • Impact factor:
    0
  • Authors:
    Ilia Minkin;Steven L. Salzberg
  • Corresponding author:
    Steven L. Salzberg

Other grants by Steven L. Salzberg

Comprehensive Human Expressed Sequences in Brain (CHESS-BRAIN) and their roles in neuropsychiatric illness
  • Grant number:
    10541887
  • Fiscal year:
    2021
  • Funding amount:
    $355,000
  • Project category:
Comprehensive Human Expressed Sequences in Brain (CHESS-BRAIN) and their roles in neuropsychiatric illness
  • Grant number:
    10362615
  • Fiscal year:
    2021
  • Funding amount:
    $355,000
  • Project category:
Comprehensive Human Expressed Sequences in Brain (CHESS-BRAIN) and their roles in neuropsychiatric illness
  • Grant number:
    10205617
  • Fiscal year:
    2021
  • Funding amount:
    $355,000
  • Project category:
Computational Methods for Microbial and Microbiome Sequence Analysis
  • Grant number:
    10331733
  • Fiscal year:
    2019
  • Funding amount:
    $355,000
  • Project category:
Computational Methods for Microbial and Microbiome Sequence Analysis
  • Grant number:
    10550160
  • Fiscal year:
    2019
  • Funding amount:
    $355,000
  • Project category:
Computational Methods for Microbial and Microbiome Sequence Analysis
  • Grant number:
    10083744
  • Fiscal year:
    2019
  • Funding amount:
    $355,000
  • Project category:
The Terabase Search Engine
  • Grant number:
    8882493
  • Fiscal year:
    2014
  • Funding amount:
    $355,000
  • Project category:
Computational Gene Modeling and Genome Sequence Assembly
  • Grant number:
    8329127
  • Fiscal year:
    2011
  • Funding amount:
    $355,000
  • Project category:
Alignment Software for Second-Generation Sequencing
  • Grant number:
    8068060
  • Fiscal year:
    2011
  • Funding amount:
    $355,000
  • Project category:
Alignment Software for Second-Generation Sequencing
  • Grant number:
    8464182
  • Fiscal year:
    2011
  • Funding amount:
    $355,000
  • Project category:

Similar overseas grants

How Does Particle Material Properties Insoluble and Partially Soluble Affect Sensory Perception Of Fat based Products
  • Grant number:
    BB/Z514391/1
  • Fiscal year:
    2024
  • Funding amount:
    $355,000
  • Project category:
    Training Grant
BRC-BIO: Establishing Astrangia poculata as a study system to understand how multi-partner symbiotic interactions affect pathogen response in cnidarians
  • Grant number:
    2312555
  • Fiscal year:
    2024
  • Funding amount:
    $355,000
  • Project category:
    Standard Grant
RII Track-4:NSF: From the Ground Up to the Air Above Coastal Dunes: How Groundwater and Evaporation Affect the Mechanism of Wind Erosion
  • Grant number:
    2327346
  • Fiscal year:
    2024
  • Funding amount:
    $355,000
  • Project category:
    Standard Grant
Graduating in Austerity: Do Welfare Cuts Affect the Career Path of University Students?
  • Grant number:
    ES/Z502595/1
  • Fiscal year:
    2024
  • Funding amount:
    $355,000
  • Project category:
    Fellowship
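Construction of Affect-X, an index of individual differences in sensibility, and establishment of a foundation for bespoke AI services
  • Grant number:
    23K24936
  • Fiscal year:
    2024
  • Funding amount:
    $355,000
  • Project category: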
    Grant-in-Aid for Scientific Research (B)
Insecure lives and the policy disconnect: How multiple insecurities affect Levelling Up and what joined-up policy can do to help
  • Grant number:
    ES/Z000149/1
  • Fiscal year:
    2024
  • Funding amount:
    $355,000
  • Project category:
    Research Grant
How does metal binding affect the function of proteins targeted by a devastating pathogen of cereal crops?
  • Grant number:
    2901648
  • Fiscal year:
    2024
  • Funding amount:
    $355,000
  • Project category:
    Studentship
Investigating how double-negative T cells affect anti-leukemic and GvHD-inducing activities of conventional T cells
  • Grant number:
    488039
  • Fiscal year:
    2023
  • Funding amount:
    $355,000
  • Project category:
    Operating Grants
New Tendencies of French Film Theory: Representation, Body, Affect
  • Grant number:
    23K00129
  • Fiscal year:
    2023
  • Funding amount:
    $355,000
  • Project category:
    Grant-in-Aid for Scientific Research (C)
The Protruding Void: Mystical Affect in Samuel Beckett's Prose
  • Grant number:
    2883985
  • Fiscal year:
    2023
  • Funding amount:
    $355,000
  • Project category:
    Studentship