Novel Methods for Effective Analysis Assembly and Comparison of HMP Sequences
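Basic information
- Grant number: 8020878
- Principal investigator:
- Amount: $394,800
- Host institution:
- Host institution country: United States
- Project category:
- Fiscal year: 2010
- Funding country: United States
- Project period: 2010-09-27 to 2013-06-30
- Project status: Completed
- Source: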
- Keywords: Address; Algorithms; Blast Cell; Chimerism; Clinical; Computer software; Computers; Computing Methodologies; Consensus; Conserved Sequence; DNA; Data; Data Analyses; Data Set; Genes; Genome; Goals; Health; Homologous Gene; Hour; Human; Human Microbiome; Imagery; Individual; Informatics; Internet; Maps; Metagenomics; Methods; Modeling; Morphologic artifacts; Performance; Process; Protocols documentation; Reading; Recruitment Activity; Research Personnel; Resources; Ribosomal RNA; Running; Sampling; Solid; Speed; Statistical Methods; Technology; Testing; Time; Variant; Work; base; exhaust; gene discovery; heuristics; improved; metagenomic sequencing; microbiome; new technology; next generation; novel; open source; programs; public health relevance; tool
Project abstract
DESCRIPTION (provided by applicant): The human microbiota is thought to have a profound influence on human health. The goal of the Human Microbiome Project (HMP) is to expand our understanding of the human microbiome by generating reference microbiome genomes, identifying "core" genomes, studying their variation in relation to human health, and developing new technologies and informatics tools. Huge numbers of sequences have been generated in HMP using metagenomics and next-generation sequencing technologies, and it is becoming very challenging for existing resources and methods to manage and analyze the HMP data. The challenges are imposed not only by the huge volume but also by the great diversity and complexity of the sequence data. To address these challenges, we propose several new computational methods to rapidly and effectively analyze very large HMP datasets.
(1) Consensus-based meta-assembler and pre-assembly processing. The aim is to significantly improve the assembly of metagenomic sequences. Instead of developing another assembly program, we will build a meta-assembler on top of available assemblers. We will also develop a pre-assembly protocol to filter and handle extra redundant and problematic sequences.
(2) Fast fragment recruitment and large-scale clustering. We plan to develop a fast program to align raw metagenomic reads to reference or homologous genomes. It will fill the gap between very fast but very stringent mapping programs (e.g. Bowtie), very slow but very sensitive alignment programs (e.g. BLAST), and fast but less sensitive ones (e.g. BLAT). We also plan to enable our clustering program CD-HIT to handle very large next-generation sequence datasets.
(3) Dedicated utilities for annotation and comparison of metagenomes. In recent years, we developed an HMM-based method for identifying rRNAs from raw reads, a fast method to identify artificial 454 duplicates, an automated workflow for metagenome annotation, a rapid and reliable reciprocal sequence comparison protocol, and a statistical method to compare many metagenomes with a unique visualization interface. We plan to improve these metagenomics-specific tools to achieve much better speed, performance and capability.
The methods will be available as open source software, as web servers, or both. We have obtained very promising preliminary results. The proposed tools will effectively help researchers in HMP data analysis. Other HMP-related informatics tools for gene prediction, binning and assembly will greatly benefit from our proposed work.
PUBLIC HEALTH RELEVANCE: The large amount of sequence data from the Human Microbiome Project (HMP) creates great challenges in data analysis. This proposal aims to address these challenges by developing novel and effective computational methods for metagenome assembly, annotation and comparison. The proposed methods will help researchers carry out preliminary data analysis, annotation, clinical sample comparison, novel gene discovery and other analyses very rapidly.
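The abstract names large-scale clustering with CD-HIT as one of its goals but does not describe the procedure. As a rough illustration only, the following Python sketch shows greedy incremental clustering, the general strategy CD-HIT is known for: sequences are processed from longest to shortest, and each one either joins the first sufficiently similar representative or becomes a new representative. The similarity measure used here (the fraction of a sequence's k-mers that also occur in the representative) is a simplification swapped in for alignment-based identity, and the names `kmers` and `greedy_cluster`, the word size `k` and the `threshold` value are all hypothetical choices made for this example. It is not the applicant's implementation.

```python
from typing import List, Set


def kmers(seq: str, k: int) -> Set[str]:
    """Return the set of overlapping k-mers in seq (empty if seq is shorter than k)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def greedy_cluster(seqs: List[str], k: int = 4, threshold: float = 0.8) -> List[List[str]]:
    """Greedy incremental clustering (illustrative sketch, not CD-HIT itself).

    Sequences are visited from longest to shortest. Each sequence joins the
    first cluster whose representative shares at least `threshold` of the
    sequence's k-mers; otherwise it starts a new cluster. The first element
    of every returned cluster is its representative.

    Assumption: shared k-mer fraction stands in for alignment identity.
    Sequences shorter than k have no k-mers and always start their own cluster.
    """
    clusters: List[List[str]] = []
    rep_words: List[Set[str]] = []  # k-mer set of each cluster representative
    for seq in sorted(seqs, key=len, reverse=True):
        words = kmers(seq, k)
        for cluster, rwords in zip(clusters, rep_words):
            if words and len(words & rwords) / len(words) >= threshold:
                cluster.append(seq)
                break
        else:
            # No representative was similar enough: seq becomes a new representative.
            clusters.append([seq])
            rep_words.append(words)
    return clusters


if __name__ == "__main__":
    reads = ["ACGTACGTACGTAAAA", "ACGTACGTACGT", "TTTTGGGGCCCCTTTT", "GGGGCCCCTT"]
    for members in greedy_cluster(reads):
        print(members[0], "->", members[1:])
```

Comparing each sequence only against representatives, rather than against every other sequence, is what keeps this strategy tractable on large read sets; a production tool would add indexing and real alignment on top of this idea.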
Project outcomes
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other grants by Weizhong Li
A study of antibiotics usage on early gut microbiome colonization and establishment in young children
- Grant number: 10113538
- Fiscal year: 2020
- Funding amount: $394,800
- Project category:
Novel Methods for Effective Analysis Assembly and Comparison of HMP Sequences
- Grant number: 8294893
- Fiscal year: 2010
- Funding amount: $394,800
- Project category:
Novel Methods for Effective Analysis Assembly and Comparison of HMP Sequences
- Grant number: 8150493
- Fiscal year: 2010
- Funding amount: $394,800
- Project category:
CD-HIT: A Fast Program to Cluster and Compare Large Sets of Biological Sequences
- Grant number: 7892867
- Fiscal year: 2009
- Funding amount: $394,800
- Project category:
CD-HIT: A Fast Program to Cluster and Compare Large Sets of Biological Sequences
- Grant number: 7495498
- Fiscal year: 2008
- Funding amount: $394,800
- Project category:
CD-HIT: A Fast Program to Cluster and Compare Large Sets of Biological Sequences
- Grant number: 7682840
- Fiscal year: 2008
- Funding amount: $394,800
- Project category:
Similar overseas grants
DMS-EPSRC: Asymptotic Analysis of Online Training Algorithms in Machine Learning: Recurrent, Graphical, and Deep Neural Networks
- Grant number: EP/Y029089/1
- Fiscal year: 2024
- Funding amount: $394,800
- Project category: Research Grant
CAREER: Blessing of Nonconvexity in Machine Learning - Landscape Analysis and Efficient Algorithms
- Grant number: 2337776
- Fiscal year: 2024
- Funding amount: $394,800
- Project category: Continuing Grant
CAREER: From Dynamic Algorithms to Fast Optimization and Back
- Grant number: 2338816
- Fiscal year: 2024
- Funding amount: $394,800
- Project category: Continuing Grant
CAREER: Structured Minimax Optimization: Theory, Algorithms, and Applications in Robust Learning
- Grant number: 2338846
- Fiscal year: 2024
- Funding amount: $394,800
- Project category: Continuing Grant
CRII: SaTC: Reliable Hardware Architectures Against Side-Channel Attacks for Post-Quantum Cryptographic Algorithms
- Grant number: 2348261
- Fiscal year: 2024
- Funding amount: $394,800
- Project category: Standard Grant
CRII: AF: The Impact of Knowledge on the Performance of Distributed Algorithms
- Grant number: 2348346
- Fiscal year: 2024
- Funding amount: $394,800
- Project category: Standard Grant
CRII: CSR: From Bloom Filters to Noise Reduction Streaming Algorithms
- Grant number: 2348457
- Fiscal year: 2024
- Funding amount: $394,800
- Project category: Standard Grant
EAGER: Search-Accelerated Markov Chain Monte Carlo Algorithms for Bayesian Neural Networks and Trillion-Dimensional Problems
- Grant number: 2404989
- Fiscal year: 2024
- Funding amount: $394,800
- Project category: Standard Grant
CAREER: Efficient Algorithms for Modern Computer Architecture
- Grant number: 2339310
- Fiscal year: 2024
- Funding amount: $394,800
- Project category: Continuing Grant
CAREER: Improving Real-world Performance of AI Biosignal Algorithms
- Grant number: 2339669
- Fiscal year: 2024
- Funding amount: $394,800
- Project category: Continuing Grant