Novel Methods for Effective Analysis Assembly and Comparison of HMP Sequences
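Basic information
- Grant number: 8150493
- Principal investigator:
- Amount: $367,400
- Host institution:
- Host institution country: United States
- Project category:
- Fiscal year: 2010
- Funding country: United States
- Period: 2010-09-27 to 2013-06-30
- Project status: Completed
- Source: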
- Keywords: Address, Algorithms, Chimerism, Clinical, Computer software, Computers, Computing Methodologies, Consensus, Conserved Sequence, DNA, Data, Data Analyses, Data Set, Genes, Genome, Goals, Health, Homologous Gene, Hour, Human, Human Microbiome, Imagery, Individual, Informatics, Internet, Maps, Metagenomics, Methods, Modeling, Morphologic artifacts, Performance, Process, Protocols documentation, Reading, Recruitment Activity, Research Personnel, Resources, Ribosomal RNA, Running, Sampling, Speed, Statistical Methods, Technology, Testing, Time, Variant, Work, base, exhaust, gene discovery, heuristics, improved, metagenomic sequencing, microbiome, new technology, next generation, novel, open source, programs, public health relevance, tool
Project abstract
DESCRIPTION (provided by applicant): The human microbiota is thought to have a profound influence on human health. The goal of the Human Microbiome Project (HMP) is to expand our understanding of the human microbiome by generating reference microbiome genomes, identifying "core" genomes, studying their variation in relation to human health, and developing new technologies and informatics tools. Huge numbers of HMP sequences have been generated using metagenomics and next-generation sequencing technologies, and it is becoming very challenging for existing resources and methods to manage and analyze the HMP data. The challenges come not only from the sheer volume but also from the great diversity and complexity of the sequence data. To address them, we propose several new computational methods to analyze very large HMP datasets rapidly and effectively.
(1) Consensus-based meta-assembler and pre-assembly processing. The aim is to significantly improve the assembly of metagenomic sequences. Instead of developing another assembly program, we will build a meta-assembler on top of available assemblers. We will also develop a pre-assembly protocol to filter and handle redundant and problematic sequences.
(2) Fast fragment recruitment and large-scale clustering. We plan to develop a fast program to align raw metagenomic reads to reference or homologous genomes. It will fill the gaps between very fast but very stringent mapping programs (e.g. Bowtie), very slow but very sensitive alignment programs (e.g. BLAST), and fast but less sensitive ones (e.g. BLAT). We also plan to enable our clustering program CD-HIT to handle very large next-generation sequence datasets.
(3) Dedicated utilities for annotation and comparison of metagenomes. In recent years, we developed an HMM-based method for identifying rRNAs from raw reads, a fast method to identify artificial 454 duplicates, an automated workflow for metagenome annotation, a rapid and reliable reciprocal sequence comparison protocol, and a statistical method to compare many metagenomes with a unique visualization interface. We plan to improve these metagenomics-specific tools to achieve much better speed, performance and capability.
The methods will be available as open source software, as web servers, or both. We have obtained very promising preliminary results. The proposed tools will effectively help researchers in HMP data analysis. Other HMP-related informatics tools for gene prediction, binning and assembly will benefit greatly from our proposed work.
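The abstract names the clustering program CD-HIT without describing how it works. As an illustration only, the sketch below shows greedy incremental clustering, the general strategy CD-HIT is known for: sequences are processed from longest to shortest, and each one either joins the first existing representative it resembles or becomes a new representative. The function names, the `threshold` and `k` parameters, and the shared k-mer fraction used as a similarity score are all assumptions made for this example; they are not taken from the proposal, and the score is a crude stand-in for the alignment-based identity a real tool would compute.

```python
from typing import Dict, List, Set


def kmers(seq: str, k: int) -> Set[str]:
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def shared_kmer_fraction(query: str, rep: str, k: int) -> float:
    """Fraction of the query's k-mers that also occur in the representative.

    Hypothetical similarity score for this sketch only; it is not the
    identity measure that CD-HIT itself computes.
    """
    q = kmers(query, k)
    if not q:
        # Sequences shorter than k have no k-mers: match only exact copies.
        return float(query == rep)
    return len(q & kmers(rep, k)) / len(q)


def greedy_cluster(seqs: List[str], threshold: float = 0.9,
                   k: int = 4) -> Dict[str, List[str]]:
    """Greedy incremental clustering, longest sequences first.

    Each sequence joins the first representative it matches at or above
    the threshold; otherwise it becomes a new representative.
    """
    clusters: Dict[str, List[str]] = {}
    for seq in sorted(seqs, key=len, reverse=True):
        for rep in clusters:
            if shared_kmer_fraction(seq, rep, k) >= threshold:
                clusters[rep].append(seq)
                break
        else:
            # No representative was similar enough: start a new cluster.
            clusters[seq] = [seq]
    return clusters


reads = ["ACGTACGTACGT", "ACGTACGTACGTAC", "TTTTGGGGCCCC", "TTTTGGGGCCCCAA"]
clusters = greedy_cluster(reads)
for rep, members in clusters.items():
    print(rep, members)
```

Because each sequence is compared only against cluster representatives rather than against every other sequence, the number of comparisons grows with the number of clusters instead of the square of the input size, which is why this style of clustering is attractive for very large read sets.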
PUBLIC HEALTH RELEVANCE: The large amount of sequence data from the Human Microbiome Project (HMP) creates great challenges in data analysis. This proposal aims to address these challenges by developing novel and effective computational methods for metagenome assembly, annotation and comparison. The proposed methods will help researchers carry out preliminary data analysis, annotation, clinical sample comparison, novel gene discovery and other analyses very rapidly.
Project outputs
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other grants by Weizhong Li
A study of antibiotics usage on early gut microbiome colonization and establishment in young children
- Grant number: 10113538
- Fiscal year: 2020
- Funding amount: $367,400
- Project category:
Novel Methods for Effective Analysis Assembly and Comparison of HMP Sequences
- Grant number: 8294893
- Fiscal year: 2010
- Funding amount: $367,400
- Project category:
Novel Methods for Effective Analysis Assembly and Comparison of HMP Sequences
- Grant number: 8020878
- Fiscal year: 2010
- Funding amount: $367,400
- Project category:
CD-HIT: A Fast Program to Cluster and Compare Large Sets of Biological Sequences
- Grant number: 7892867
- Fiscal year: 2009
- Funding amount: $367,400
- Project category:
CD-HIT: A Fast Program to Cluster and Compare Large Sets of Biological Sequences
- Grant number: 7495498
- Fiscal year: 2008
- Funding amount: $367,400
- Project category:
CD-HIT: A Fast Program to Cluster and Compare Large Sets of Biological Sequences
- Grant number: 7682840
- Fiscal year: 2008
- Funding amount: $367,400
- Project category:
Similar overseas grants
CAREER: Blessing of Nonconvexity in Machine Learning - Landscape Analysis and Efficient Algorithms
- Grant number: 2337776
- Fiscal year: 2024
- Funding amount: $367,400
- Project category: Continuing Grant
CAREER: From Dynamic Algorithms to Fast Optimization and Back
- Grant number: 2338816
- Fiscal year: 2024
- Funding amount: $367,400
- Project category: Continuing Grant
CAREER: Structured Minimax Optimization: Theory, Algorithms, and Applications in Robust Learning
- Grant number: 2338846
- Fiscal year: 2024
- Funding amount: $367,400
- Project category: Continuing Grant
CRII: SaTC: Reliable Hardware Architectures Against Side-Channel Attacks for Post-Quantum Cryptographic Algorithms
- Grant number: 2348261
- Fiscal year: 2024
- Funding amount: $367,400
- Project category: Standard Grant
CRII: AF: The Impact of Knowledge on the Performance of Distributed Algorithms
- Grant number: 2348346
- Fiscal year: 2024
- Funding amount: $367,400
- Project category: Standard Grant
CRII: CSR: From Bloom Filters to Noise Reduction Streaming Algorithms
- Grant number: 2348457
- Fiscal year: 2024
- Funding amount: $367,400
- Project category: Standard Grant
EAGER: Search-Accelerated Markov Chain Monte Carlo Algorithms for Bayesian Neural Networks and Trillion-Dimensional Problems
- Grant number: 2404989
- Fiscal year: 2024
- Funding amount: $367,400
- Project category: Standard Grant
CAREER: Efficient Algorithms for Modern Computer Architecture
- Grant number: 2339310
- Fiscal year: 2024
- Funding amount: $367,400
- Project category: Continuing Grant
CAREER: Improving Real-world Performance of AI Biosignal Algorithms
- Grant number: 2339669
- Fiscal year: 2024
- Funding amount: $367,400
- Project category: Continuing Grant
DMS-EPSRC: Asymptotic Analysis of Online Training Algorithms in Machine Learning: Recurrent, Graphical, and Deep Neural Networks
- Grant number: EP/Y029089/1
- Fiscal year: 2024
- Funding amount: $367,400
- Project category: Research Grant