Data Discovery: Computational Methods for Searching Short-Read Sequencing Experiments
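Basic Information
- Grant number: 9287168
- Principal investigator:
- Amount: $284,300
- Host institution:
- Host institution country: United States
- Project category:
- Fiscal year: 2017
- Funding country: United States
- Project period: 2017-05-01 to 2021-04-30
- Project status: Completed
- Source: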
- Keywords: Algorithms; Archives; Area; Basic Science; Biological; Cells; Code; Collection; Complex; Computing Methodologies; DNA Sequencing Facility; Darkness; Data; Data Discovery; Data Set; Databases; Disease Progression; Distributed Systems; Elements; Environment; Exhibits; Exons; Family; Foundations; Generations; Genes; Genetic Variation; Genomics; Goals; Healthcare; Hospitals; Human Microbiome; Individual; Investigation; Malignant Neoplasms; Metadata; Metagenomics; Methods; Microbe; Mutation; Organism; Pathway interactions; Pharmacologic Substance; Privatization; Protein Isoforms; Reproducibility; Research; Research Personnel; Resources; Sampling; Scheme; Silicon Dioxide; Somatic Mutation; Source; Speed; System; Techniques; Technology; Testing; The Cancer Genome Atlas; Time; Trees; United States National Institutes of Health; Variant; Vision; Work; base; cell type; data sharing; experimental study; fusion gene; gene function; genetic variant; genome sequencing; improved; indexing; insertion/deletion mutation; microbial community; novel; novel strategies; open source; petabyte; repository; transcriptome sequencing; transcriptomics; tumor; whole genome
Project Summary
PROJECT SUMMARY / ABSTRACT
This proposal aims to solve the sequencing experiment discovery problem. The data from hundreds of thousands
of short-read sequencing experiments are now publicly available, and private collections of sequencing
experiments are also growing rapidly. These experiments include hundreds of thousands of whole-genome
sequencing experiments, and tens of thousands of RNA-seq, metagenomic, and tumor sequencing samples.
However, these experiments are vastly underused, with few analyses making use of more than a handful of
experiments at a time and most analyses ignoring this collection of raw data entirely. One crucial reason for this is
that merely finding the appropriate experiments is a significant barrier to their use in downstream analyses. This
is due to the lack of a computational platform that can search for relevant short-read sequencing data sets by the
sequences they contain. It is not currently possible to find all the metagenomic experiments in which the genes
that form a particular pathway are present or to find all experiments in which a novel lncRNA is observed. The
experiment discovery problem is that of finding, on a global scale, those experiments that are relevant to an
isoform, variant, or species under study. By building on our existing work in large-scale sequence search, we
propose to develop a new distributed platform to index and search hundreds of thousands of raw short-read
sequencing data sets to enable researchers to quickly find experiments that contain their query sequences. We will
apply this system to searching RNA-seq, metagenomic, and cancer tumor samples. The research questions
we will address include how to improve the computational scaling, increase the types of biologically meaningful
queries that can be answered, and increase our ability to find relevant experiments in situations where
mutations are common. We will produce a high-quality open-source implementation of the developed computational
methods. The project will significantly expand the usefulness of large repositories of raw sequencing reads and
enable new approaches for large-scale reanalysis and reuse of short-read experiments. The system will unlock
a rich source of biological information for gene function prediction, for understanding microbial communities, and
for connecting genetic variation with disease progression.
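
To make the experiment discovery problem concrete, the following is a minimal sketch of searching experiments by the sequences they contain. It is an illustration only, not the platform described in the abstract: it assumes each experiment is summarized as an exact set of k-mers (length-k substrings of its reads), and it reports an experiment when at least a fraction `theta` of the query's k-mers appear in it. The function names, the toy reads, and the parameters `k` and `theta` are all hypothetical. Exact sets like these would likely not scale to the repository sizes the abstract describes, so a real index would need a far more compact representation.

```python
from typing import Dict, List, Set


def kmers(seq: str, k: int) -> Set[str]:
    """Return the set of all length-k substrings of seq (uppercased)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_index(experiments: Dict[str, List[str]], k: int) -> Dict[str, Set[str]]:
    """Map each experiment ID to the set of k-mers seen in any of its reads."""
    index: Dict[str, Set[str]] = {}
    for exp_id, reads in experiments.items():
        seen: Set[str] = set()
        for read in reads:
            seen |= kmers(read, k)
        index[exp_id] = seen
    return index


def search(index: Dict[str, Set[str]], query: str, k: int, theta: float = 0.8) -> List[str]:
    """Return sorted IDs of experiments holding at least a fraction theta of the query k-mers."""
    q = kmers(query, k)
    if not q:  # query shorter than k: nothing to match
        return []
    return sorted(e for e, ks in index.items() if len(q & ks) / len(q) >= theta)


# Toy data (hypothetical): two "experiments", each with two short reads.
experiments = {
    "exp_A": ["ACGTACGTGA", "GTGACCTTAG"],
    "exp_B": ["TTTTGGGGCC", "GGCCAATTAA"],
}
index = build_index(experiments, k=4)

exact_hits = search(index, "ACGTGACC", k=4, theta=0.8)     # query spans both reads of exp_A
strict_hits = search(index, "ACGTGTCC", k=4, theta=0.8)    # same query with one substitution
relaxed_hits = search(index, "ACGTGTCC", k=4, theta=0.3)   # lower threshold tolerates the mutation
```

Lowering `theta` is one simple way to keep finding an experiment when the query differs from the sequenced sample by a mutation, at the cost of more false matches; the abstract lists robustness to mutations as one of its open research questions.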
Project Outcomes
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other grants by Carleton Lee Kingsford
Improved genomic sketching for MUMmer and metagenomics
- Grant number: 10453031
- Fiscal year: 2022
- Funding amount: $284,300
- Project category:
Improved genomic sketching for MUMmer and metagenomics
- Grant number: 10670162
- Fiscal year: 2022
- Funding amount: $284,300
- Project category:
Data Discovery: Computational Methods for Searching Short-Read Sequencing Experiments - Administrative Supplement
- Grant number: 10393953
- Fiscal year: 2017
- Funding amount: $284,300
- Project category:
Algorithms for Managing Uncertainty in Chromosome Conformation Capture Data
- Grant number: 8739540
- Fiscal year: 2013
- Funding amount: $284,300
- Project category:
Algorithms for Managing Uncertainty in Chromosome Conformation Capture Data
- Grant number: 8579049
- Fiscal year: 2013
- Funding amount: $284,300
- Project category:
Fast k-mer Counting to Quantify Gene Expression and Improve Genome Assembly
- Grant number: 8642468
- Fiscal year: 2012
- Funding amount: $284,300
- Project category:
Fast k-mer Counting to Quantify Gene Expression and Improve Genome Assembly
- Grant number: 8518438
- Fiscal year: 2012
- Funding amount: $284,300
- Project category:
Accurate Computational Detection of Influenza Reassortments
- Grant number: 8072578
- Fiscal year: 2010
- Funding amount: $284,300
- Project category:
Accurate Computational Detection of Influenza Reassortments
- Grant number: 7772829
- Fiscal year: 2010
- Funding amount: $284,300
- Project category:
Similar overseas grants
Sediment Drilling Facility for environmental and genetic archives
- Grant number: LE240100064
- Fiscal year: 2024
- Funding amount: $284,300
- Project category: Linkage Infrastructure, Equipment and Facilities
Aerial Archives of Race and American-Occupied Japan
- Grant number: 24K03721
- Fiscal year: 2024
- Funding amount: $284,300
- Project category: Grant-in-Aid for Scientific Research (C)
CAREER: Understanding biosphere-geosphere coevolution through carbonate-associated phosphate, community archives, and open-access education in rural schools
- Grant number: 2338055
- Fiscal year: 2024
- Funding amount: $284,300
- Project category: Continuing Grant
Designing a Bridging Model Using Learning Content Information LOD to Link School Education and Digital Archives
- Grant number: 23H03695
- Fiscal year: 2023
- Funding amount: $284,300
- Project category: Grant-in-Aid for Scientific Research (B)
Doris Lessing's Archives: Communism, Decolonisation and Literary Practice
- Grant number: 2888789
- Fiscal year: 2023
- Funding amount: $284,300
- Project category: Studentship
Integrated High-Definition Visualization of Digital Archives for Borobudur Temple
- Grant number: 22KJ3026
- Fiscal year: 2023
- Funding amount: $284,300
- Project category: Grant-in-Aid for JSPS Fellows
Research on multilingual data integration for digital archives of Japanese culture
- Grant number: 23K11780
- Fiscal year: 2023
- Funding amount: $284,300
- Project category: Grant-in-Aid for Scientific Research (C)
Building a sustainable future for anthropology's archives: Researching primary source data lifecycles, infrastructures, and reuse
- Grant number: 2314762
- Fiscal year: 2023
- Funding amount: $284,300
- Project category: Standard Grant
A Preliminary Study for Constructing International Network of Image Archives on Afghan Cultural Heritages
- Grant number: 23K00915
- Fiscal year: 2023
- Funding amount: $284,300
- Project category: Grant-in-Aid for Scientific Research (C)
Reading Writing Lives: Publishing & Preserving Australian Literary Archives
- Grant number: DP230101797
- Fiscal year: 2023
- Funding amount: $284,300
- Project category: Discovery Projects