RII Track-4: NSF: Extracting Pan Genomic Information from Metagenomic Data: Distributed Algorithms and Scalable Software
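Basic Information
- Award number: 2327456
- Principal investigator:
- Amount: $292,100
- Host institution:
- Host institution country: United States
- Award type: Standard Grant
- Fiscal year: 2024
- Funding country: United States
- Project period: 2024-01-01 to 2025-12-31
- Project status: Active (not yet closed)
- Source:
- Keywords:
Project Abstract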
The analysis of metagenomes, i.e., genetic data collected directly from environmental samples, has become integral to various research areas, including climate studies, human health, the discovery of rare earth elements, social and environmental resilience planning, and more. Genetic data collected in this manner typically contains a mixed population of multiple microbial communities. Scientists often aim to extract and represent the genomic diversity information of a particular microbial species from such a mixed population, a process known as pan-genomic information representation. However, there is a shortage of theoretically sound and biologically valid algorithms capable of performing metagenomic or pan-genomic analysis. Furthermore, the vast amount of genetic data generated by high-throughput genome sequencing machines necessitates that these algorithms be scalable and distributed in nature. This project will investigate both the distributed algorithmic aspect and its practical implementation to extract pan-genomic information from large-scale metagenomic datasets. This research aligns with at least six different research areas prioritized by Alaska EPSCoR in their latest Science and Technology Plan, including Community Resilience, Resource Extraction, Food-Energy-Water Nexus, Renewable Resources, Environmental Monitoring, and One Health.
This RII Track-4: NSF fellowship will enable an Assistant Professor and a graduate student at the University of Alaska Fairbanks (UAF) to collaborate with scientists at North Carolina State University (NCSU) and utilize their resources. The Principal Investigator (PI) will work alongside experts in the field of bioinformatics and algorithms to develop a set of provably correct, scalable, and distributed algorithms with low time complexity for extracting pan-genomic information from large-scale metagenomic datasets. Additionally, utilizing cutting-edge high-performance computing (HPC) resources at NCSU, the PI aims to create a preliminary version of an HPC-compliant software framework implementing these algorithms. The analytic pipeline comprises four distinct stages: 1) metagenomic error correction, 2) metagenomic assembly, 3) binning and annotation of the assembled genome, and 4) creating the pan-genomic profile of the available microbes. Each of these stages presents algorithmic challenges. The diverse coverages of microbiomes in the metagenomic dataset, coupled with instrumental errors, render the process of identifying the actual species and their genetic diversity exceedingly challenging, necessitating extensive research in string matching and graph analysis. The distributed software implementation must address numerous HPC challenges. The research outcomes, including publications and open-source codebases, will support multiple research activities at UAF, focusing on arctic climate change, arctic marine biology, and Alaska Native health, among others. The collaboration facilitated by this fellowship will also lay the foundation for an interdisciplinary Ph.D. program at UAF, encompassing computer science, bioinformatics, and indigenous science concentrations.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
Project Outcomes
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other publications by Arghya Das
An advanced pore-scale model for simulating water retention characteristics in granular soils
- DOI: 10.1016/j.jhydrol.2022.128561
- Year: 2022
- Journal:
- Impact factor: 6.4
- Authors: Suaiba Mufti; Arghya Das
- Corresponding author: Arghya Das
Asymptotically flat vacuum solution for a rotating black hole in a modified gravity theory
- DOI: 10.1140/epjc/s10052-022-10899-5
- Year: 2022
- Journal:
- Impact factor: 0
- Authors: Arghya Das; B. Mukhopadhyay
- Corresponding author: B. Mukhopadhyay
Evaluation of a simple method for testing aztreonam and ceftazidime-avibactam synergy in New Delhi metallo-beta-lactamase producing Enterobacterales
- DOI:
- Year: 2024
- Journal:
- Impact factor: 3.7
- Authors: Salman Khan; Arghya Das; Deepali Vashisth; Anwita Mishra; A. Vidyarthi; Raghav Gupta; N. Begam; Babita Kataria; Sushma Bhatnagar
- Corresponding author: Sushma Bhatnagar
Transport and fluctuations in mass aggregation processes: Mobility-driven clustering.
- DOI:
- Year: 2020
- Journal:
- Impact factor: 2.4
- Authors: Subhadip Chakraborti; Tanmoy Chakraborty; Arghya Das; Rahul Dandekar; P. Pradhan
- Corresponding author: P. Pradhan
Mucormycosis and black fungus: Breaking the myth.
- DOI:
- Year: 2021
- Journal:
- Impact factor: 4.9
- Authors: A. Vidyarthi; Arghya Das; R. Chaudhry
- Corresponding author: R. Chaudhry
Other grants held by Arghya Das
Equipment: MRI Track-I: Acquisition of CyBR: Cyber Infrastructure for Big Data Research Critical for Alaska
- Award number: 2320196
- Fiscal year: 2023
- Amount: $292,100
- Award type: Standard Grant
Similar overseas grants
RII Track-4:NSF: Integrated Electrochemical-Optical Microscopy for High Throughput Screening of Electrocatalysts
- Award number: 2327025
- Fiscal year: 2024
- Amount: $292,100
- Award type: Standard Grant
RII Track-4:NSF: Resistively-Detected Electron Spin Resonance in Multilayer Graphene
- Award number: 2327206
- Fiscal year: 2024
- Amount: $292,100
- Award type: Standard Grant
RII Track-4:NSF: Improving subseasonal-to-seasonal forecasts of Central Pacific extreme hydrometeorological events and their impacts in Hawaii
- Award number: 2327232
- Fiscal year: 2024
- Amount: $292,100
- Award type: Standard Grant
RII Track-4:NSF: Design of zeolite-encapsulated metal phthalocyanines catalysts enabled by insights from synchrotron-based X-ray techniques
- Award number: 2327267
- Fiscal year: 2024
- Amount: $292,100
- Award type: Standard Grant
RII Track-4:NSF: From the Ground Up to the Air Above Coastal Dunes: How Groundwater and Evaporation Affect the Mechanism of Wind Erosion
- Award number: 2327346
- Fiscal year: 2024
- Amount: $292,100
- Award type: Standard Grant
RII Track-4:NSF: In-Situ/Operando Characterizations of Single Atom Catalysts for Clean Fuel Generation
- Award number: 2327349
- Fiscal year: 2024
- Amount: $292,100
- Award type: Standard Grant
RII Track-4: NSF: Fundamental study on hydrogen flow in porous media during repetitive drainage-imbibition processes and upscaling for underground energy storage
- Award number: 2327317
- Fiscal year: 2024
- Amount: $292,100
- Award type: Standard Grant
RII Track-4:NSF: An Integrated Urban Meteorological and Building Stock Modeling Framework to Enhance City-level Building Energy Use Predictions
- Award number: 2327435
- Fiscal year: 2024
- Amount: $292,100
- Award type: Standard Grant
RII Track-4: NSF: Developing 3D Models of Live-Endothelial Cell Dynamics with Application Appropriate Validation
- Award number: 2327466
- Fiscal year: 2024
- Amount: $292,100
- Award type: Standard Grant
RII Track-4:NSF: HEAL: Heterogeneity-aware Efficient and Adaptive Learning at Clusters and Edges
- Award number: 2327452
- Fiscal year: 2024
- Amount: $292,100
- Award type: Standard Grant