Phylogenetic and computational methods for accurate and efficient analyses of large-scale metagenomics datasets
Basic information
- Award number: 10542443
- Principal investigator:
- Amount: approx. $108,000
- Host institution:
- Host institution country: United States
- Project type:
- Fiscal year: 2022
- Funding country: United States
- Project period: 2022-01-01 to 2024-12-31
- Project status: Completed
- Source:
- Keywords: 2019-nCoV; Algorithms; Area; Bayesian Analysis; Bayesian Method; Biodiversity; Bioinformatics; Botany; COVID-19; COVID-19 detection; COVID-19 monitoring; COVID-19 surveillance; Classification; Communities; Complex Mixtures; Computer software; Computing Methodologies; Coronavirus; Custom; DNA; Data; Data Set; Data Storage and Retrieval; Databases; Development; Disease; Disease Outbreaks; Early Diagnosis; Ecosystem; Environment; Environmental Monitoring; Eye; Forensic Medicine; Genetic Materials; Genome; Genomics; Genotype; Goals; Growth; Human Microbiome; Individual; Life; Location; Metagenomics; Methods; Microbe; Microscopic; Modeling; Monitor; Nucleotides; Organism; Pathogen detection; Performance; Phylogenetic Analysis; Phylogeny; Population Sizes; Postdoctoral Fellow; Procedures; Property; Public Health; Publishing; Reproducibility; Research; Research Design; Research Personnel; Research Proposals; Sampling; Soil; Speed; Systems Biology; Taxonomy; Terrorism; Time; Trees; Update; Validation; Variant; Viral; Virus; Water; Zoology; combat; computer science; expectation; experimental study; in silico; insight; microbiome analysis; next generation sequencing; novel; open source; pandemic disease; performance tests; public repository; simulation; statistics; surveillance data; tool; transmission process; viral detection; wastewater monitoring; wastewater samples; wastewater surveillance; web portal; whole genome
Project Summary/Abstract
The overall goal of this project is to use approaches from statistics and computer science to solve significant challenges in the analysis of metabarcode and metagenomics data. Metagenomics, the study of the combined genomes of organisms present in a single community, is an emerging, highly interdisciplinary field that combines genomics, bioinformatics, and systems biology, among other areas. Metagenomics has many applications to public health, especially in pathogen detection, human microbiome analysis, and biodiversity monitoring. The larger objective of this proposal is to leverage the open source software tronko, a fast approximate-likelihood phylogenetic placement method that I developed for taxonomic classification. It is the first phylogenetic placement method that truly enables the use of large-scale reference databases with next generation sequencing data as queries. Tronko will be used to solve fundamental problems in analyses of metabarcode and metagenomic data, and to develop an application for analyzing severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) sequences that will greatly enhance the utility of environmental monitoring of SARS-CoV-2.

The specific aims of this proposal are to (1) solve an important theoretical problem by applying a rigorous species delineation to assignment, (2) apply tronko to solve an important practical problem of estimating the composition of SARS-CoV-2 lineages in wastewater surveillance samples, and (3) develop a rapid custom reference database builder for analyzing metabarcode and metagenomics data.

For Aim 1, different phylogenetic groups have different variability in different parts of the tree. I therefore plan to use Bayesian methods to estimate effective population sizes locally to establish appropriate cut-off thresholds for species assignments in different parts of the phylogeny. Current methods use arbitrary thresholds for delineation of taxonomic groups, and this method would provide an elegant solution to a long-standing limitation in species classification.

For Aim 2, SARS-CoV-2 monitoring of wastewater is an effective strategy for early detection of outbreaks. I plan to build a pipeline, and subsequently a web portal for researchers, that uses tronko to first detect the virus within a wastewater sample and then uses an expectation-maximization algorithm to estimate the proportions of viral strains. This aim would greatly aid public health researchers in assessing and managing the pandemic, since no established methods are currently available for this type of analysis.

For Aim 3, current custom reference database builders require weeks if not months of consecutive computational time in addition to access to a large amount of data storage. I propose to build a method which can be completed within a day. The method will perform in silico amplification using primers and subsequently use the amplified fragments in a k-mer-based approach for identifying relevant sequences within a nucleotide database, usable both across a network connection and with a local database.

Execution of these aims will solve important theoretical, practical, and computational problems in the field of metagenomics.
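The abstract does not specify the model behind the expectation-maximization step in Aim 2, so the following is only a minimal sketch of how such an estimate is commonly set up. It assumes each read already has a likelihood under each candidate lineage (for example, derived from a placement score) and treats the sample as a simple mixture. The function name `em_proportions` and the input matrix `lik` are hypothetical and are not part of tronko.

```python
from typing import List


def em_proportions(lik: List[List[float]], n_iter: int = 200,
                   tol: float = 1e-9) -> List[float]:
    """Estimate lineage proportions in a mixed sample with EM.

    lik[r][k] is an assumed, precomputed likelihood of read r under
    lineage k (hypothetical input, not produced by this sketch).
    """
    if not lik:
        raise ValueError("at least one read is required")
    n_lineages = len(lik[0])
    pi = [1.0 / n_lineages] * n_lineages  # start from uniform proportions
    for _ in range(n_iter):
        counts = [0.0] * n_lineages
        for row in lik:
            # E-step: posterior probability that this read came from each lineage
            weights = [p * l for p, l in zip(pi, row)]
            total = sum(weights)
            if total == 0.0:
                continue  # read has zero likelihood under every lineage; skip it
            for k, w in enumerate(weights):
                counts[k] += w / total
        assigned = sum(counts)
        if assigned == 0.0:
            raise ValueError("no read is compatible with any lineage")
        # M-step: new proportions are the normalized expected read counts
        new_pi = [c / assigned for c in counts]
        converged = max(abs(a - b) for a, b in zip(pi, new_pi)) < tol
        pi = new_pi
        if converged:
            break
    return pi


# Three reads match only lineage 0 and one read matches only lineage 1.
print(em_proportions([[1.0, 0.0], [1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]))
# prints [0.75, 0.25]
```

With unambiguous reads the estimate reduces to simple read counting. The EM iterations matter when reads are compatible with several lineages, because ambiguous reads are then shared out in proportion to the current estimates.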
Project outputs
Journal articles: 0
Monographs: 0
Research awards: 0
Conference papers: 0
Patents: 0
Other publications by Lenore Pipes
Other grants by Lenore Pipes
Phylogenetic and computational methods for accurate and efficient analyses of large-scale metagenomics datasets
- Award number: 10350895
- Fiscal year: 2022
- Amount: approx. $108,000
- Project type:
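Similar grants from the National Natural Science Foundation of China
Structure-preserving algorithms for nonlocal Klein-Gordon-Schrödinger equations on unbounded domains
- Award number: 12301508
- Approval year: 2023
- Amount: 300,000 CNY
- Project type: Young Scientists Fund
Region-of-interest-driven active-sampling CT imaging algorithms
- Award number: 62301532
- Approval year: 2023
- Amount: 300,000 CNY
- Project type: Young Scientists Fund
Automated algorithm design for collaborative scheduling of multi-area cellular production lines
- Award number: 62303204
- Approval year: 2023
- Amount: 300,000 CNY
- Project type: Young Scientists Fund
Constrained multi-objective swarm intelligence algorithms based on deep reinforcement learning, with application to multi-area heat and power dispatch
- Award number: 62303197
- Approval year: 2023
- Amount: 300,000 CNY
- Project type: Young Scientists Fund
Highly scalable space-time parallel domain decomposition algorithms for carbon dioxide sequestration and their large-scale applications
- Award number: 12371366
- Approval year: 2023
- Amount: 435,000 CNY
- Project type: General Program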
Similar overseas grants
De novo design of a generalizable protein biosensor platform for point-of-care testing
- Award number: 10836196
- Fiscal year: 2023
- Amount: approx. $108,000
- Project type:
Viral Diversity an Innovative Biomarker for Refining Estimates of HIV Incidence
- Award number: 10676203
- Fiscal year: 2022
- Amount: approx. $108,000
- Project type:
Evolution, transmission, and clinical impacts of SARS-CoV-2 variants among urban and rural populations
- Award number: 10535916
- Fiscal year: 2022
- Amount: approx. $108,000
- Project type:
Rapidly Adaptable and Mass-Producible Microscopic Chiplets for Minimally-Instrumented Respiratory Viral Screening
- Award number: 10348469
- Fiscal year: 2022
- Amount: approx. $108,000
- Project type:
Identification of Risk Factors for predicting outcomes of COVID-19-Related Multisystem Inflammatory Syndrome in Children (MISC) using Real World Clinical Data
- Award number: 10527735
- Fiscal year: 2022
- Amount: approx. $108,000
- Project type: