Phylogenetic and computational methods for accurate and efficient analyses of large-scale metagenomics datasets
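Basic Information
- Grant number: 10350895
- Principal investigator: Lenore Pipes
- Amount: $90,500
- Host institution country: United States
- Fiscal year: 2022
- Funding country: United States
- Project period: 2022-01-01 to 2023-12-31
- Project status: Completed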
- Keywords: 2019-nCoV, Algorithms, Area, Bayesian Analysis, Bayesian Method, Biodiversity, Bioinformatics, Botany, COVID-19, COVID-19 detection, COVID-19 monitoring, COVID-19 surveillance, Classification, Communities, Complex Mixtures, Computer software, Computing Methodologies, Coronavirus, Custom, DNA, Data, Data Set, Data Storage and Retrieval, Databases, Development, Disease, Disease Outbreaks, Early Diagnosis, Ecosystem, Environment, Environmental Monitoring, Eye, Forensic Medicine, Genetic Materials, Genome, Genomics, Genotype, Goals, Growth, Human Microbiome, Individual, Life, Location, Metagenomics, Methods, Microbe, Microscopic, Modeling, Monitor, Nucleotides, Organism, Pathogen detection, Performance, Phylogenetic Analysis, Phylogeny, Population Sizes, Postdoctoral Fellow, Procedures, Property, Public Health, Publishing, Reproducibility, Research, Research Design, Research Personnel, Research Proposals, Sampling, Soil, Speed, Systems Biology, Taxonomy, Terrorism, Time, Trees, Update, Validation, Variant, Viral, Virus, Water, Zoology, base, combat, computer science, expectation, experimental study, in silico, insight, microbiome analysis, next generation sequencing, novel, open source, pandemic disease, performance tests, public repository, simulation, statistics, surveillance data, tool, transmission process, wastewater monitoring, wastewater samples, wastewater surveillance, web portal, whole genome
Project Summary/Abstract
The overall goal of this project is to use approaches from statistics and computer science to solve significant challenges in the analysis of metabarcoding and metagenomics data. Metagenomics, the study of the combined genomes of the organisms present in a single community, is an emerging, highly interdisciplinary field that draws on genomics, bioinformatics, and systems biology, among other areas. It has many public health applications, especially in pathogen detection, human microbiome analysis, and biodiversity monitoring. The broader objective of this proposal is to leverage tronko, an open-source, fast, approximate-likelihood phylogenetic placement method that I developed for taxonomic classification and the first phylogenetic placement method that truly enables the use of large-scale reference databases with next-generation sequencing data as queries. Tronko will be used to solve fundamental problems in the analysis of metabarcoding and metagenomic data, and to develop an application for analyzing severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) sequences that will greatly enhance the utility of environmental monitoring of SARS-CoV-2.

The specific aims of this proposal are to (1) solve an important theoretical problem by applying rigorous species delineation to taxonomic assignment, (2) apply tronko to the important practical problem of estimating the composition of SARS-CoV-2 lineages in wastewater surveillance samples, and (3) develop a rapid custom reference database builder for analyzing metabarcoding and metagenomics data.

For Aim 1, because different phylogenetic groups show different levels of variability in different parts of the tree, I plan to use Bayesian methods to estimate effective population sizes locally and use them to set appropriate cut-off thresholds for species assignment in each part of the phylogeny. Current methods use arbitrary thresholds to delineate taxonomic groups, so this approach would provide an elegant solution to a long-standing limitation in species classification.

For Aim 2, monitoring wastewater for SARS-CoV-2 is an effective strategy for the early detection of outbreaks. I plan to build a pipeline, and subsequently a web portal for researchers, that first uses tronko to detect the virus in a wastewater sample and then uses an expectation-maximization algorithm to estimate the proportions of viral strains. Because no established methods are currently available for this type of analysis, this aim would greatly aid public health researchers in assessing and managing the pandemic.

For Aim 3, current custom reference database builders require weeks, if not months, of continuous computation in addition to access to large amounts of data storage. I propose to build a method that can be completed within a day. The method will perform in silico amplification with primers and then use the amplified fragments in a k-mer-based approach to identify relevant sequences in a nucleotide database, accessed either over a network connection or as a local copy.

Execution of these aims will solve important theoretical, practical, and computational problems in the field of metagenomics.
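The abstract does not spell out the Aim 1 model, but its intuition can be illustrated: clades with larger effective population sizes carry more standing diversity, so they should tolerate more divergence before a query is assigned to a different species. The sketch below swaps the planned Bayesian estimate for a much simpler stand-in, Watterson's point estimator of θ (θ = 4N_eμ under the neutral coalescent), computed per site from a clade's alignment, and then scales a per-clade divergence cut-off by it. The function names and the `multiplier` rule are hypothetical and are not the proposal's method.

```python
from typing import List


def segregating_sites(alignment: List[str]) -> int:
    """Count columns that contain more than one nucleotide (gaps are ignored)."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("sequences must be aligned to the same length")
    count = 0
    for j in range(length):
        states = {seq[j] for seq in alignment if seq[j] != "-"}
        if len(states) > 1:
            count += 1
    return count


def watterson_theta_per_site(alignment: List[str]) -> float:
    """Watterson's estimator S / a_n, divided by the alignment length."""
    n = len(alignment)
    if n < 2:
        raise ValueError("need at least two sequences")
    a_n = sum(1.0 / i for i in range(1, n))  # harmonic number H_{n-1}
    return segregating_sites(alignment) / a_n / len(alignment[0])


def clade_threshold(alignment: List[str], multiplier: float = 2.0) -> float:
    """Hypothetical rule: per-site divergence cut-off = multiplier * theta."""
    return multiplier * watterson_theta_per_site(alignment)


# Example with a small, made-up clade alignment (S = 2, a_3 = 1.5, L = 4).
clade = ["ACGT", "ACGA", "TCGA"]
print(watterson_theta_per_site(clade))
print(clade_threshold(clade))
```

A clade whose alignment has more segregating sites receives a proportionally looser cut-off, which is the qualitative behavior Aim 1 describes; a Bayesian treatment would replace the point estimate with a posterior over N_e.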
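The Aim 2 pipeline estimates viral strain proportions with an expectation-maximization algorithm. A minimal sketch of that step, assuming an upstream stage (for example, tronko placements) has already produced, for every read, a likelihood of the read under each candidate lineage. The function name and input layout are hypothetical; only the standard EM update for mixture proportions is shown.

```python
from typing import List, Sequence


def estimate_lineage_proportions(
    likelihoods: Sequence[Sequence[float]],
    max_iter: int = 1000,
    tol: float = 1e-10,
) -> List[float]:
    """Estimate lineage proportions by EM.

    likelihoods[r][k] is an assumed P(read r | lineage k), computed upstream.
    """
    n_lin = len(likelihoods[0])
    pi = [1.0 / n_lin] * n_lin  # start from uniform proportions
    for _ in range(max_iter):
        totals = [0.0] * n_lin
        for row in likelihoods:
            # E-step: responsibility of each lineage for this read
            weighted = [p * lik for p, lik in zip(pi, row)]
            z = sum(weighted)
            if z == 0.0:
                continue  # read fits no lineage under current pi; ignore it
            for k in range(n_lin):
                totals[k] += weighted[k] / z
        # M-step: new proportions are the normalized summed responsibilities
        s = sum(totals)
        if s == 0.0:
            raise ValueError("no read is compatible with any lineage")
        new_pi = [t / s for t in totals]
        converged = max(abs(a - b) for a, b in zip(new_pi, pi)) < tol
        pi = new_pi
        if converged:
            break
    return pi


# Three reads carry lineage-A-only mutations, one carries lineage-B-only ones.
reads = [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
print(estimate_lineage_proportions(reads))
```

Reads that are equally compatible with several lineages (for example, reads covering only shared mutations) are apportioned according to the current estimates rather than counted outright, which is the usual reason to prefer EM over simply tallying best hits.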
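Aim 3 names two steps: in silico amplification with primers, then k-mer matching of the amplified fragments against a nucleotide database. The sketch below illustrates both under simplifying assumptions: exact primer matches on the forward strand, uppercase A/C/G/T sequences, and an in-memory dictionary standing in for the database. Real primer searches tolerate mismatches and degenerate bases, and access to a remote database over a network is not modeled. All function names and the `k`/`min_shared` defaults are hypothetical.

```python
from typing import Dict, List, Optional, Set

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase A/C/G/T sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def in_silico_amplify(template: str, fwd: str, rev: str) -> Optional[str]:
    """Return the amplicon (primers included), or None if either primer misses.

    The reverse primer is given 5'->3' on the opposite strand, so its binding
    site on the template is its reverse complement. Exact matches only.
    """
    start = template.find(fwd)
    if start < 0:
        return None
    rev_site = reverse_complement(rev)
    end = template.find(rev_site, start + len(fwd))
    if end < 0:
        return None
    return template[start:end + len(rev_site)]


def kmers(seq: str, k: int) -> Set[str]:
    """All distinct substrings of length k."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def screen_database(
    seed_amplicons: List[str],
    database: Dict[str, str],
    k: int = 8,
    min_shared: float = 0.5,
) -> List[str]:
    """IDs of entries sharing at least min_shared of some seed amplicon's k-mers."""
    seed_sets = [kmers(a, k) for a in seed_amplicons]
    hits = []
    for seq_id, seq in database.items():
        seq_kmers = kmers(seq, k)
        for seed in seed_sets:
            if seed and len(seed & seq_kmers) / len(seed) >= min_shared:
                hits.append(seq_id)
                break
    return hits


template = "AAAA" + "ACGTAC" + "GGGTTTCCC" + "GTTAGC" + "TTTT"
amplicon = in_silico_amplify(template, fwd="ACGTAC", rev="GCTAAC")
print(amplicon)
print(screen_database([amplicon], {"hit": template, "miss": "A" * 30}))
```

Comparing k-mer sets instead of aligning every database entry is one common way to make such a screen fast; the abstract does not state which matching criterion the proposed builder will use.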
Project Outputs
- Journal articles: 0
- Monographs: 0
- Research awards: 0
- Conference papers: 0
- Patents: 0
Other Grants by Lenore Pipes
Phylogenetic and computational methods for accurate and efficient analyses of large-scale metagenomics datasets
- Grant number: 10542443
- Fiscal year: 2022
- Amount: $90,500
Similar Overseas Grants
Approximate algorithms and architectures for area efficient system design
- Grant number: LP170100311
- Fiscal year: 2018
- Amount: $90,500
- Project type: Linkage Projects
AMPS: Rank Minimization Algorithms for Wide-Area Phasor Measurement Data Processing
- Grant number: 1736326
- Fiscal year: 2017
- Amount: $90,500
- Project type: Standard Grant
Low Power, Area Efficient, High Speed Algorithms and Architectures for Computer Arithmetic, Pattern Recognition and Cryptosystems
- Grant number: 1686-2013
- Fiscal year: 2017
- Amount: $90,500
- Project type: Discovery Grants Program - Individual
Rigorous simulation of speckle fields caused by large area rough surfaces using fast algorithms based on higher order boundary element methods
- Grant number: 375876714
- Fiscal year: 2017
- Amount: $90,500
- Project type: Research Grants
Low Power, Area Efficient, High Speed Algorithms and Architectures for Computer Arithmetic, Pattern Recognition and Cryptosystems
- Grant number: 1686-2013
- Fiscal year: 2016
- Amount: $90,500
- Project type: Discovery Grants Program - Individual
Low Power, Area Efficient, High Speed Algorithms and Architectures for Computer Arithmetic, Pattern Recognition and Cryptosystems
- Grant number: 1686-2013
- Fiscal year: 2015
- Amount: $90,500
- Project type: Discovery Grants Program - Individual
Low Power, Area Efficient, High Speed Algorithms and Architectures for Computer Arithmetic, Pattern Recognition and Cryptosystems
- Grant number: 1686-2013
- Fiscal year: 2014
- Amount: $90,500
- Project type: Discovery Grants Program - Individual
AREA: Optimizing gene expression with mRNA free energy modeling and algorithms
- Grant number: 8689532
- Fiscal year: 2014
- Amount: $90,500
CPS: Synergy: Collaborative Research: Distributed Asynchronous Algorithms and Software Systems for Wide-Area Monitoring of Power Systems
- Grant number: 1329780
- Fiscal year: 2013
- Amount: $90,500
- Project type: Standard Grant
CPS: Synergy: Collaborative Research: Distributed Asynchronous Algorithms and Software Systems for Wide-Area Mentoring of Power Systems
- Grant number: 1329745
- Fiscal year: 2013
- Amount: $90,500
- Project type: Standard Grant