Phylogenetic and computational methods for accurate and efficient analyses of large-scale metagenomics datasets
Basic Information
- Award number: 10542443
- Principal investigator:
- Amount: $108,000
- Host institution:
- Host institution country: United States
- Project category:
- Fiscal year: 2022
- Funding country: United States
- Project period: 2022-01-01 to 2024-12-31
- Project status: Completed
- Source:
- Keywords: 2019-nCoV, Algorithms, Area, Bayesian Analysis, Bayesian Method, Biodiversity, Bioinformatics, Botany, COVID-19, COVID-19 detection, COVID-19 monitoring, COVID-19 surveillance, Classification, Communities, Complex Mixtures, Computer software, Computing Methodologies, Coronavirus, Custom, DNA, Data, Data Set, Data Storage and Retrieval, Databases, Development, Disease, Disease Outbreaks, Early Diagnosis, Ecosystem, Environment, Environmental Monitoring, Eye, Forensic Medicine, Genetic Materials, Genome, Genomics, Genotype, Goals, Growth, Human Microbiome, Individual, Life, Location, Metagenomics, Methods, Microbe, Microscopic, Modeling, Monitor, Nucleotides, Organism, Pathogen detection, Performance, Phylogenetic Analysis, Phylogeny, Population Sizes, Postdoctoral Fellow, Procedures, Property, Public Health, Publishing, Reproducibility, Research, Research Design, Research Personnel, Research Proposals, Sampling, Soil, Speed, Systems Biology, Taxonomy, Terrorism, Time, Trees, Update, Validation, Variant, Viral, Virus, Water, Zoology, combat, computer science, expectation, experimental study, in silico, insight, microbiome analysis, next generation sequencing, novel, open source, pandemic disease, performance tests, public repository, simulation, statistics, surveillance data, tool, transmission process, viral detection, wastewater monitoring, wastewater samples, wastewater surveillance, web portal, whole genome
Project Summary/Abstract
The overall goal of this project is to use approaches from statistics and computer science to solve significant challenges in the analysis of metabarcode and metagenomics data. Metagenomics, the study of the combined genomes of organisms present in a single community, is an emerging, highly interdisciplinary field that combines genomics, bioinformatics, and systems biology, among other areas. Metagenomics has many applications to public health, especially in pathogen detection, human microbiome analysis, and biodiversity monitoring. The broader objective of this proposal is to leverage tronko, an open-source, fast approximate-likelihood phylogenetic placement method that I developed for taxonomic classification. Tronko is the first phylogenetic placement method that truly enables the use of large-scale reference databases with next-generation sequencing data as queries. It will be used to solve fundamental problems in the analysis of metabarcode and metagenomic data, and to develop an application for analyzing severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) sequences that will greatly enhance the utility of environmental monitoring of SARS-CoV-2.

The specific aims of this proposal are (1) to solve an important theoretical problem by applying rigorous species delineation to taxonomic assignment, (2) to apply tronko to the important practical problem of estimating the composition of SARS-CoV-2 lineages in wastewater surveillance samples, and (3) to develop a rapid custom reference database builder for analyzing metabarcode and metagenomics data.

For Aim 1, because different phylogenetic groups show different levels of variability in different parts of the tree, I plan to use Bayesian methods to estimate effective population sizes locally and thereby establish appropriate cut-off thresholds for species assignment in different parts of the phylogeny. Current methods use arbitrary thresholds to delineate taxonomic groups, so this approach would provide an elegant solution to a long-standing limitation in species classification.

For Aim 2: monitoring wastewater for SARS-CoV-2 is an effective strategy for the early detection of outbreaks. I plan to build a pipeline, and subsequently a web portal for researchers, that first uses tronko to detect the virus within a wastewater sample and then uses an expectation-maximization (EM) algorithm to estimate the proportions of viral strains. Because no established methods are currently available for this type of analysis, this aim would greatly aid public health researchers in assessing and managing the pandemic.

For Aim 3: current custom reference database builders require weeks, if not months, of continuous computation time, in addition to access to a large amount of data storage. I propose to build a method that can be completed within a day. The method will perform in silico amplification with the primers and then use the amplified fragments in a k-mer-based approach to identify relevant sequences within a nucleotide database, accessed either over a network connection or as a local database.

Executing these aims will solve important theoretical, practical, and computational problems in the field of metagenomics.
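The abstract names an expectation-maximization (EM) algorithm for Aim 2 but does not give its model. As a minimal sketch, assuming each read has already been scored against each candidate lineage (for example, 1.0 if the read's placement or mutations are compatible with a lineage and 0.0 otherwise), a standard EM for mixture proportions alternates between splitting each read fractionally across lineages (E-step) and re-estimating each proportion as the average of those fractions (M-step). The function name `estimate_lineage_proportions`, the score-matrix input, and the convergence settings are hypothetical illustrations, not the project's pipeline.

```python
from typing import List, Sequence


def estimate_lineage_proportions(
    scores: Sequence[Sequence[float]],
    max_iter: int = 1000,
    tol: float = 1e-12,
) -> List[float]:
    """EM estimate of lineage mixture proportions (hypothetical helper).

    scores[i][k] is an assumed per-read likelihood P(read i | lineage k),
    e.g. 1.0 if read i is compatible with lineage k and 0.0 otherwise.
    """
    n_lin = len(scores[0])
    props = [1.0 / n_lin] * n_lin  # start from a uniform mixture
    for _ in range(max_iter):
        totals = [0.0] * n_lin
        for row in scores:
            # E-step: posterior probability that this read came from each lineage
            weights = [p * s for p, s in zip(props, row)]
            z = sum(weights)
            if z == 0.0:
                continue  # read fits no lineage with nonzero weight; skip it
            for k in range(n_lin):
                totals[k] += weights[k] / z
        n_used = sum(totals)  # equals the number of informative reads
        if n_used == 0.0:
            break
        # M-step: new proportion of each lineage = mean responsibility
        new_props = [t / n_used for t in totals]
        delta = max(abs(a - b) for a, b in zip(new_props, props))
        props = new_props
        if delta < tol:
            break
    return props


# Toy example with lineages A, B, C: 6 reads fit only A, 2 fit only B,
# and 2 carry mutations shared by A and B, so they are ambiguous.
reads = [[1.0, 0.0, 0.0]] * 6 + [[0.0, 1.0, 0.0]] * 2 + [[1.0, 1.0, 0.0]] * 2
proportions = estimate_lineage_proportions(reads)
print([round(p, 3) for p in proportions])
```

In this toy case the ambiguous reads end up split 0.75/0.25 in proportion to the current estimates, so EM converges to A = 0.75, B = 0.25, C = 0 instead of discarding the shared reads. How real per-read scores would account for sequencing error is left open here.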
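Aim 3 is described only at a high level: in silico amplification with primers, then a k-mer-based search of a nucleotide database. The sketch below illustrates those two steps under simplifying assumptions: primers are matched to a template by Hamming distance (no IUPAC degenerate bases, no indels, only the template's given strand), and a database sequence is kept when it shares at least `min_shared` k-mers of length `k` with the amplicons on either strand. All function names, `k`, and the thresholds are hypothetical choices for illustration, not the project's method; a realistic builder would also stream a large database (locally or over a network) rather than hold it in a dict.

```python
from typing import Dict, List, Set

# Complement table for uppercase DNA (A<->T, C<->G).
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase ACGT sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def hamming(a: str, b: str) -> int:
    """Number of mismatched positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def in_silico_amplify(template: str, fwd: str, rev: str,
                      max_mm: int = 0, max_len: int = 2000) -> List[str]:
    """Return primer-to-primer amplicons found on the template's given strand.

    The reverse primer is given 5'->3' as synthesized, so the site it marks
    on the template strand is its reverse complement.
    """
    rev_site = revcomp(rev)
    amplicons: List[str] = []
    for i in range(len(template) - len(fwd) + 1):
        if hamming(template[i:i + len(fwd)], fwd) > max_mm:
            continue
        # Search downstream for the nearest reverse-primer site within max_len.
        last_j = min(len(template), i + max_len) - len(rev_site)
        for j in range(i + len(fwd), last_j + 1):
            if hamming(template[j:j + len(rev_site)], rev_site) <= max_mm:
                amplicons.append(template[i:j + len(rev_site)])
                break
    return amplicons


def kmers(seq: str, k: int) -> Set[str]:
    """Set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def screen_database(db: Dict[str, str], amplicons: List[str],
                    k: int, min_shared: int) -> List[str]:
    """IDs of database sequences sharing >= min_shared k-mers with the amplicons."""
    query: Set[str] = set()
    for amp in amplicons:
        query |= kmers(amp, k)
    hits = []
    for seq_id, seq in db.items():
        # Check both strands, since database entries may be reverse-oriented.
        target = kmers(seq, k) | kmers(revcomp(seq), k)
        if len(query & target) >= min_shared:
            hits.append(seq_id)
    return hits


# Toy example: one seed template, one primer pair, and a three-entry database.
template = "TTTTACGTACGGCATGCATGCAGGATCCAATTTT"
amps = in_silico_amplify(template, fwd="ACGTACGG", rev="TTGGATCC")
db = {
    "target": "GGG" + amps[0] + "CCC",
    "target_rc": revcomp(amps[0]),
    "unrelated": "ATATATATATATATATAT",
}
print(amps)
print(screen_database(db, amps, k=8, min_shared=10))
```

Checking both strands in `screen_database` matters because entries in public nucleotide databases are not guaranteed to share the amplicon's orientation; the amplification step, by contrast, scans only the template strand it is given, which is one of the simplifications noted above.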
Project Outputs
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other publications by Lenore Pipes
Other grants by Lenore Pipes
Phylogenetic and computational methods for accurate and efficient analyses of large-scale metagenomics datasets
- Award number: 10350895
- Fiscal year: 2022
- Award amount: $108,000
- Project category:
Similar International Grants
Approximate algorithms and architectures for area efficient system design
- Award number: LP170100311
- Fiscal year: 2018
- Award amount: $108,000
- Project category: Linkage Projects
AMPS: Rank Minimization Algorithms for Wide-Area Phasor Measurement Data Processing
- Award number: 1736326
- Fiscal year: 2017
- Award amount: $108,000
- Project category: Standard Grant
Low Power, Area Efficient, High Speed Algorithms and Architectures for Computer Arithmetic, Pattern Recognition and Cryptosystems
- Award number: 1686-2013
- Fiscal year: 2017
- Award amount: $108,000
- Project category: Discovery Grants Program - Individual
Rigorous simulation of speckle fields caused by large area rough surfaces using fast algorithms based on higher order boundary element methods
- Award number: 375876714
- Fiscal year: 2017
- Award amount: $108,000
- Project category: Research Grants
Low Power, Area Efficient, High Speed Algorithms and Architectures for Computer Arithmetic, Pattern Recognition and Cryptosystems
- Award number: 1686-2013
- Fiscal year: 2016
- Award amount: $108,000
- Project category: Discovery Grants Program - Individual
Low Power, Area Efficient, High Speed Algorithms and Architectures for Computer Arithmetic, Pattern Recognition and Cryptosystems
- Award number: 1686-2013
- Fiscal year: 2015
- Award amount: $108,000
- Project category: Discovery Grants Program - Individual
Low Power, Area Efficient, High Speed Algorithms and Architectures for Computer Arithmetic, Pattern Recognition and Cryptosystems
- Award number: 1686-2013
- Fiscal year: 2014
- Award amount: $108,000
- Project category: Discovery Grants Program - Individual
AREA: Optimizing gene expression with mRNA free energy modeling and algorithms
- Award number: 8689532
- Fiscal year: 2014
- Award amount: $108,000
- Project category:
CPS: Synergy: Collaborative Research: Distributed Asynchronous Algorithms and Software Systems for Wide-Area Monitoring of Power Systems
- Award number: 1329780
- Fiscal year: 2013
- Award amount: $108,000
- Project category: Standard Grant
CPS: Synergy: Collaborative Research: Distributed Asynchronous Algorithms and Software Systems for Wide-Area Mentoring of Power Systems
- Award number: 1329745
- Fiscal year: 2013
- Award amount: $108,000
- Project category: Standard Grant