The Terabase Search Engine
Basic information
- Grant number: 8882493
- Principal investigator:
- Amount: $346,100
- Host institution:
- Host institution country: United States
- Project category:
- Fiscal year: 2014
- Funding country: United States
- Project period: 2014-07-01 to 2017-04-30
- Project status: Completed
- Source:
- Keywords: Acceleration; Affect; Algorithms; Archives; Chromosome Structures; Code; Communities; Complex; Computational algorithm; Computer software; Computers; Coupled; DNA Sequence; DNA Sequence Databases; Data; Data Compression; Data Set; Databases; Deposition; Disease; Galaxy; Genes; Genome; Goals; Health; Human; Human Genetics; Human Genome; Infectious Diseases Research; Investigation; Modeling; Molecular; Mutation; Positioning Attribute; Process; Reaction Time; Reading; Real-Time Systems; Research; Research Personnel; Resources; Retrieval; Running; Scientist; Sequence Analysis; Services; Site; Solutions; Sorting - Cell Movement; Speed; System; Time; Validation; Variant; Writing; base; cloud based; design; human DNA sequencing; human disease; human genome sequencing; indexing; instrument; microbial; next generation sequencing; novel strategies; open source; programs; tool; trait; user-friendly; web interface
Project abstract
DESCRIPTION (provided by applicant): We propose to create a new system, the Terabase Search Engine (TSE), that will make it possible for biomedical researchers to search all human DNA sequences that have been sequenced and deposited in public archives. The vast and growing resource of human DNA sequences provides a wealth of opportunities for scientific discovery and for validation of results, but the size of the data sets has already far exceeded the ability of most researchers to use them. For more than two decades, geneticists have relied on DNA sequence databases for a wide range of scientific endeavors, including the discovery of new genes and new mutations, the investigation of evolutionary changes within and between species, the study of the forces affecting chromosomal structure and change, and many other molecular and evolutionary processes. The ability to search all known genes and genomes using BLAST and similar programs has long been assumed, and sequence search engines throughout the world provide this ability.
However, the raw data pouring out of next-generation sequencing (NGS) projects has exceeded our ability to provide rapid access to it. A single NGS instrument can generate six billion reads encompassing 600 billion bases in a single run, and this capacity is still growing. Traditional alignment programs like BLAST cannot sort through this data in a reasonable amount of time. Newer programs such as Bowtie (developed by our group) allow far faster alignment of NGS reads to the genome, but today the size of the data sets, now in excess of 1 trillion reads, far exceeds the ability of most computers to store them, and even the fastest alignment programs today could not search all this data in a reasonable amount of time. A new approach is required in order to serve up these huge and hugely valuable DNA sequences to the research community.
The Terabase Search Engine will be a new, highly efficient system for searching trillions of bases in real time. Using a hierarchical search strategy with extensive pre-processing to speed up response time, the TSE will allow a scientist to align any sequence, human or non-human, to all publicly available human sequence reads. Reads that match the human genome will be indexed and stored on very high-speed disks for rapid retrieval. Reads that match microbial sequences will be captured and stored separately for use in microbiome and infectious disease research. The system will be made available through a user-friendly web interface, and a local database will store each user's results for further analysis on the TSE site or for download to a local site. This system will make it possible, for the first time ever, for any scientist to align a sequence to the complete set of human DNA sequences and to retrieve everything that matches, without the need to write special-purpose programs or to use complex cloud-based software interfaces. All of the software for this project will be developed under an open-source model that will permit others to use, modify, share, and redistribute the code without restriction.
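The abstract does not specify how the hierarchical search with pre-processing would be implemented. As a rough illustration only, the sketch below shows one common way such a two-stage lookup can work: reads are pre-processed once into a k-mer index, a query is first filtered coarsely by shared k-mers, and only the surviving candidates are verified. The function names (`build_kmer_index`, `search`), the `min_shared` threshold, and the exact-substring verification step are all assumptions made for this example; they are not taken from the TSE project, and a real aligner would tolerate mismatches and gaps.

```python
from collections import defaultdict


def build_kmer_index(reads, k):
    """Pre-processing: map each k-mer to the ids of the reads that contain it."""
    index = defaultdict(set)
    for read_id, read in enumerate(reads):
        for i in range(len(read) - k + 1):
            index[read[i:i + k]].add(read_id)
    return index


def search(query, reads, index, k, min_shared=2):
    """Two-stage lookup (hypothetical): k-mer filter, then exact verification."""
    # Stage 1 (coarse): count, per read, the query k-mer positions it shares.
    hits = defaultdict(int)
    for i in range(len(query) - k + 1):
        for read_id in index.get(query[i:i + k], ()):
            hits[read_id] += 1
    candidates = [rid for rid, n in hits.items() if n >= min_shared]
    # Stage 2 (fine): keep only reads that occur verbatim inside the query.
    # This stands in for a real alignment step, which would allow mismatches.
    return sorted(rid for rid in candidates if reads[rid] in query)


# Toy data, invented for illustration.
reads = ["ACGTAC", "TTTTGG", "GTACGA", "CCCCCC"]
index = build_kmer_index(reads, k=4)
print(search("TTACGTACGATT", reads, index, k=4))  # prints [0, 2]
```

The point of the design is that the expensive index construction happens once, ahead of time, so that each query touches only the small set of reads sharing k-mers with it rather than the whole collection.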
Project outcomes
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other publications by Steven L. Salzberg
The 15th Genomic Standards Consortium meeting
- DOI: 10.4056/sigs.3457
- Published: 2013-01-01
- Journal:
- Impact factor: 5.400
- Authors: Lynn Schriml; Ilene Mizrachi; Peter Sterk; Dawn Field; Lynette Hirschman; Tatiana Tatusova; Susanna Sansone; Jack Gilbert; David Schindel; Neil Davies; Chris Meyer; Folker Meyer; George Garrity; Lita Proctor; M. H. Medema; Yemin Lan; Anna Klindworth; Frank Oliver Glöckner; Tonia Korves; Antonia Gonzalez; Peter Dwayndt; Markus Göker; Anjette Johnston; Evangelos Pafilis; Susanne Schneider; K. Baker; Cynthia Parr; G. Sutton; H. H. Creasy; Nikos Kyrpides; K. Eric Wommack; Patricia L. Whetzel; Daniel Nasko; Hilmar Lapp; Takamoto Fujisawa; Adam M. Phillippy; Renzo Kottman; Judith A. Blake; Junhua Li; Elizabeth M. Glass; Petra ten Hoopen; Rob Knight; Susan Holmes; Curtis Huttenhower; Steven L. Salzberg; Bing Ma; Owen White
- Corresponding author: Owen White
C4.5: Programs for Machine Learning by J. Ross Quinlan. Morgan Kaufmann Publishers, Inc., 1993
- DOI: 10.1007/bf00993309
- Published: 1994-09-01
- Journal:
- Impact factor: 2.900
- Authors: Steven L. Salzberg
- Corresponding author: Steven L. Salzberg
Reply to Austin and Korem, “Compositional transformations can reasonably introduce phenotype-associated values into sparse features”
- DOI: 10.1128/msystems.00248-25
- Published: 2025-04-30
- Journal:
- Impact factor: 4.600
- Authors: Steven L. Salzberg
- Corresponding author: Steven L. Salzberg
Yeast rises again
- DOI: 10.1038/423233a
- Published: 2003-05-15
- Journal:
- Impact factor: 48.500
- Authors: Steven L. Salzberg
- Corresponding author: Steven L. Salzberg
Quality Assessment of Splice Site Annotation Based on Conservation Across Multiple Species
- DOI:
- Published:
- Journal:
- Impact factor: 0
- Authors: Ilia Minkin; Steven L. Salzberg
- Corresponding author: Steven L. Salzberg
Other grants by Steven L. Salzberg
Comprehensive Human Expressed Sequences in Brain (CHESS-BRAIN) and their roles in neuropsychiatric illness
- Grant number: 10541887
- Fiscal year: 2021
- Amount: $346,100
- Project category:
Comprehensive Human Expressed Sequences in Brain (CHESS-BRAIN) and their roles in neuropsychiatric illness
- Grant number: 10362615
- Fiscal year: 2021
- Amount: $346,100
- Project category:
Comprehensive Human Expressed Sequences in Brain (CHESS-BRAIN) and their roles in neuropsychiatric illness
- Grant number: 10205617
- Fiscal year: 2021
- Amount: $346,100
- Project category:
Computational Methods for Microbial and Microbiome Sequence Analysis
- Grant number: 10331733
- Fiscal year: 2019
- Amount: $346,100
- Project category:
Computational Methods for Microbial and Microbiome Sequence Analysis
- Grant number: 10550160
- Fiscal year: 2019
- Amount: $346,100
- Project category:
Computational Methods for Microbial and Microbiome Sequence Analysis
- Grant number: 10083744
- Fiscal year: 2019
- Amount: $346,100
- Project category:
Computational Gene Modeling and Genome Sequence Assembly
- Grant number: 8329127
- Fiscal year: 2011
- Amount: $346,100
- Project category:
Alignment Software for Second-Generation Sequencing
- Grant number: 8068060
- Fiscal year: 2011
- Amount: $346,100
- Project category:
Alignment Software for Second-Generation Sequencing
- Grant number: 8464182
- Fiscal year: 2011
- Amount: $346,100
- Project category:
Similar overseas grants
RII Track-4:NSF: From the Ground Up to the Air Above Coastal Dunes: How Groundwater and Evaporation Affect the Mechanism of Wind Erosion
- Grant number: 2327346
- Fiscal year: 2024
- Amount: $346,100
- Project category: Standard Grant
BRC-BIO: Establishing Astrangia poculata as a study system to understand how multi-partner symbiotic interactions affect pathogen response in cnidarians
- Grant number: 2312555
- Fiscal year: 2024
- Amount: $346,100
- Project category: Standard Grant
How Does Particle Material Properties Insoluble and Partially Soluble Affect Sensory Perception Of Fat based Products
- Grant number: BB/Z514391/1
- Fiscal year: 2024
- Amount: $346,100
- Project category: Training Grant
Graduating in Austerity: Do Welfare Cuts Affect the Career Path of University Students?
- Grant number: ES/Z502595/1
- Fiscal year: 2024
- Amount: $346,100
- Project category: Fellowship
Insecure lives and the policy disconnect: How multiple insecurities affect Levelling Up and what joined-up policy can do to help
- Grant number: ES/Z000149/1
- Fiscal year: 2024
- Amount: $346,100
- Project category: Research Grant
Construction of the Affect-X index of individual differences in sensibility and establishment of a foundation for bespoke AI services
- Grant number: 23K24936
- Fiscal year: 2024
- Amount: $346,100
- Project category: Grant-in-Aid for Scientific Research (B)
How does metal binding affect the function of proteins targeted by a devastating pathogen of cereal crops?
- Grant number: 2901648
- Fiscal year: 2024
- Amount: $346,100
- Project category: Studentship
ERI: Developing a Trust-supporting Design Framework with Affect for Human-AI Collaboration
- Grant number: 2301846
- Fiscal year: 2023
- Amount: $346,100
- Project category: Standard Grant
Investigating how double-negative T cells affect anti-leukemic and GvHD-inducing activities of conventional T cells
- Grant number: 488039
- Fiscal year: 2023
- Amount: $346,100
- Project category: Operating Grants
How motor impairments due to neurodegenerative diseases affect masticatory movements
- Grant number: 23K16076
- Fiscal year: 2023
- Amount: $346,100
- Project category: Grant-in-Aid for Early-Career Scientists