Leveraging k-mer sketching statistics to enhance metagenomic methods and alignment algorithms
Basic information
- Grant number: 10675449
- Principal investigator:
- Amount: $443,500
- Host institution:
- Host institution country: United States
- Project category:
- Fiscal year: 2022
- Funding country: United States
- Project period: 2022-08-02 to 2027-05-31
- Project status: Ongoing
- Source:
- Keywords: Address; Affect; Algorithms; Antibiotics; Area; Bioinformatics; Biological; Biomedical Research; Clustered Regularly Interspaced Short Palindromic Repeats; Communication; Communities; Computational Biology; Computational Technique; Confidence Intervals; Data; Data Set; Databases; Development; Dimensions; Endowment; Ensure; Foundations; Frequencies; Future; Generations; Genetic Materials; Goals; Mathematics; Measures; Metagenomics; Methods; Modernization; Molecular Evolution; Mutate; Mutation; Organism; Outcome; Output; Performance; Probability Theory; Process; Research Personnel; Sampling; Sequence Alignment; Sequence Read Archive; System; Taxonomy; Techniques; Testing; Time; Uncertainty; Variant; Work; cost; driving force; improved; innovation; insight; metagenome; microorganism; novel; statistics; theories; tool
Project Summary
In the face of increasing data sizes, sketching techniques such as MinHash sketching and its winnowed
version have been among the most effective in facilitating scalable analysis. Frequently, though, bioinformatics
algorithms using these techniques do not account for the randomness inherent in both the sketching process
and in the mutation processes that generate the data (e.g. sequencing errors or evolutionary mutations). This
project directly addresses this limitation by laying the statistical foundations for how these sketching
approaches interact with mutation processes and k-mer based techniques, resulting in new algorithms for
important biomedical problems. Aim 1 derives, for the first time, confidence and prediction intervals for
frequently utilized sketching-based bioinformatics quantities that until now existed only as point estimates. To
do so, it relies on sophisticated techniques from probability theory. The mathematical foundations laid by Aim 1
will not only help us achieve the biological aims of this proposal, but will also serve as a basis for quantifying
the performance of future sketching-based bioinformatics algorithms. Aim 2 will then use these results to
develop the first metagenomic taxonomic profiling algorithm that accounts for the uncertainty present when
predicting the presence and relative abundance of microorganisms in a sample. This will resolve a
long-standing issue in this field by giving researchers an informed way to filter their noisy data without
sacrificing sensitivity, thereby facilitating biomedical discoveries (e.g. novel CRISPR systems). In addition, this
aim will result in the first scalable method to quickly estimate the fraction of a metagenomic sample that is not
described by current reference databases, thus illuminating which datasets contain the highest quantity of
novel genetic material and hence the greatest potential for biological discovery (e.g. novel antibiotics). Aim 2 will be
achieved using techniques from compressive sensing as well as probability theory. Aim 3 will both use and
extend the results of Aim 1 to quantifiably improve one of the most fundamental tools in a computational
biologist’s toolkit: sequence alignment. This will equip modern sequence aligners with much-needed
significance scores and confidence intervals, as well as allow for the automatic selection of parameter settings
to achieve a desired precision or recall. Because aligners are ubiquitous in biomedical research, even a small improvement
in the accuracy and features of an aligner will have tremendous impact. Aim 3 will be achieved using
techniques from probabilistic algorithms. Finally, the long-term objective of this proposal is to provide
researchers with a toolkit that enables the development of scalable k-mer-based sketching algorithms without
sacrificing their ability to quantify statistical significance.
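
To illustrate the kind of point estimate the summary refers to, here is a minimal Python sketch (not the project's code) of a bottom-s MinHash estimate of the Jaccard similarity between the k-mer sets of two sequences. The function names, the SHA-1-based 64-bit hash, and the parameters k and s are assumptions chosen for illustration only. The function returns a single number with no confidence interval attached, which is the gap Aim 1 describes.

```python
import hashlib


def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def hash64(kmer):
    """Deterministic 64-bit hash of a k-mer (first 8 bytes of SHA-1).

    The choice of hash is an assumption made for this illustration.
    """
    return int.from_bytes(hashlib.sha1(kmer.encode()).digest()[:8], "big")


def bottom_sketch(seq, k, s):
    """Bottom-s MinHash sketch: the s smallest distinct k-mer hash values."""
    return sorted({hash64(x) for x in kmers(seq, k)})[:s]


def jaccard_estimate(sketch_a, sketch_b, s):
    """Point estimate of Jaccard similarity from two bottom-s sketches.

    Takes the s smallest hashes of the merged sketches (which form the
    bottom-s sketch of the union) and counts how many occur in both inputs.
    """
    merged = sorted(set(sketch_a) | set(sketch_b))[:s]
    if not merged:
        return 0.0  # no k-mers in either sequence
    shared = set(sketch_a) & set(sketch_b)
    return sum(1 for h in merged if h in shared) / len(merged)


if __name__ == "__main__":
    a = "ACGTACGTTGCAAGCTTAGCCGATCGATTACG"
    b = "ACGTACGTTGCAAGCTTAGCCGTTCGATTACG"  # one substitution relative to a
    k, s = 5, 16
    print(jaccard_estimate(bottom_sketch(a, k, s), bottom_sketch(b, k, s), s))
```

Two identical sequences give an estimate of 1.0, and when s is at least the number of distinct k-mers in the union the estimate matches the exact Jaccard similarity (barring hash collisions). For smaller s the value varies with the hash function, and that sampling variability is the kind of sketching randomness the summary says existing algorithms often ignore.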
Project outcomes
Journal articles (9)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Deriving confidence intervals for mutation rates across a wide range of evolutionary distances using FracMinHash.
- DOI: 10.1101/gr.277651.123
- Published: 2023-07
- Journal:
- Impact factor: 7
- Authors: Rahman Hera, Mahmudur; Pierce-Ward, N Tessa; Koslicki, David
- Corresponding author: Koslicki, David
Finding phylogeny-aware and biologically meaningful averages of metagenomic samples: L2UniFrac.
- DOI: 10.1101/2023.02.02.526854
- Published: 2023
- Journal:
- Impact factor: 0
- Authors: Wei, Wei; Millward, Andrew; Koslicki, David
- Corresponding author: Koslicki, David
Finding phylogeny-aware and biologically meaningful averages of metagenomic samples: L2UniFrac.
- DOI: 10.1093/bioinformatics/btad238
- Published: 2023-06-30
- Journal:
- Impact factor: 0
- Authors:
- Corresponding author:
Other publications by Antonio Blanca Pimentel
Similar overseas grants
How Does Particle Material Properties Insoluble and Partially Soluble Affect Sensory Perception Of Fat based Products
- Grant number: BB/Z514391/1
- Fiscal year: 2024
- Funding amount: $443,500
- Project category: Training Grant
BRC-BIO: Establishing Astrangia poculata as a study system to understand how multi-partner symbiotic interactions affect pathogen response in cnidarians
- Grant number: 2312555
- Fiscal year: 2024
- Funding amount: $443,500
- Project category: Standard Grant
RII Track-4:NSF: From the Ground Up to the Air Above Coastal Dunes: How Groundwater and Evaporation Affect the Mechanism of Wind Erosion
- Grant number: 2327346
- Fiscal year: 2024
- Funding amount: $443,500
- Project category: Standard Grant
Graduating in Austerity: Do Welfare Cuts Affect the Career Path of University Students?
- Grant number: ES/Z502595/1
- Fiscal year: 2024
- Funding amount: $443,500
- Project category: Fellowship
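Construction of Affect-X, an index of individual differences in sensibility, and establishment of a foundation for bespoke AI services
- Grant number: 23K24936
- Fiscal year: 2024
- Funding amount: $443,500
- Project category: Grant-in-Aid for Scientific Research (B)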
Insecure lives and the policy disconnect: How multiple insecurities affect Levelling Up and what joined-up policy can do to help
- Grant number: ES/Z000149/1
- Fiscal year: 2024
- Funding amount: $443,500
- Project category: Research Grant
How does metal binding affect the function of proteins targeted by a devastating pathogen of cereal crops?
- Grant number: 2901648
- Fiscal year: 2024
- Funding amount: $443,500
- Project category: Studentship
Investigating how double-negative T cells affect anti-leukemic and GvHD-inducing activities of conventional T cells
- Grant number: 488039
- Fiscal year: 2023
- Funding amount: $443,500
- Project category: Operating Grants
New Tendencies of French Film Theory: Representation, Body, Affect
- Grant number: 23K00129
- Fiscal year: 2023
- Funding amount: $443,500
- Project category: Grant-in-Aid for Scientific Research (C)
The Protruding Void: Mystical Affect in Samuel Beckett's Prose
- Grant number: 2883985
- Fiscal year: 2023
- Funding amount: $443,500
- Project category: Studentship