CAREER: Practical algorithms and high dimensional statistical methods for multimodal haplotype modelling
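Basic Information
- Award number: 2239870
- Principal investigator:
- Amount: $548,300
- Host institution:
- Host institution country: United States
- Project category: Standard Grant
- Fiscal year: 2023
- Funding country: United States
- Duration: 2023-07-15 to 2028-06-30
- Project status: Ongoing
- Source:
- Keywords:
Project Abstract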
Massive and diverse datasets have been generated from human cells with the goal of explaining the many ways in which cellular differences affect observed differences in traits between people. Mathematical models of the genetic differences between people can explain, for example, why some individuals are predisposed to developing a particular disease. However, most such models make overly simplistic assumptions about how genetic differences interact to influence an observed trait. This project addresses major challenges in computational biology and applied machine learning by developing robust mathematical models that make few assumptions, together with efficient training algorithms that can leverage massive and complex cellular data. Specifically, the project considers: (a) methods for computing sequences of genetic differences by integrating different types of data with machine learning and algorithmic techniques; (b) mathematical models for characterizing the genetic similarity between people; and (c) efficient algorithms that scale to large datasets. The results of this project include new methods that are broadly applicable to clustering massive and diverse sequential data. These methods are specifically helpful to researchers trying to understand how genetic differences affect disease and other traits. Furthermore, the research supports high school and university math and science communities by developing interactive learning modules and networking resources.

This project develops the statistical and algorithmic foundations for sequences of multimodal variation (i.e., multiomic haplotypes) along two research directions. The first direction introduces the multiomic haplotype data structure and develops new Bayesian nonparametric models and fast inference algorithms for clustering multiomic haplotypes from heterogeneous, high-dimensional biomolecular data. Computational tractability is achieved through novel, efficient inference algorithms that operate in data space (Bayesian coresets), model space (deep approximations), and algorithm space (variational approximations). The second direction develops the first model that unifies the combinatorial domain of haplotype assembly with the probabilistic domain of haplotype phasing to infer latent haplotypes. The investigator will accomplish this unification by combining directed and undirected graphical modeling techniques with efficient particle-based inference algorithms. Completing these research tasks will yield three outputs: new methods for constructing deep approximations of high-dimensional Bayesian nonparametric models, models for multimodal sequential clustering, and methods to accelerate the training of high-dimensional statistical models. Additionally, the research addresses (a) the longstanding open problem of unifying haplotype assembly and haplotype phasing; and (b) potential sources of missing heritability in association studies: phase-dependent genetic interactions and haplotype-epigenetic interactions. Partnerships with the university and regional high school communities will translate the research findings into educational modules and resources to motivate, engage, and retain computer science students and teachers.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
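A small example can make "Bayesian nonparametric" clustering concrete. The sketch below samples cluster assignments from a Chinese restaurant process (CRP), which is the partition distribution induced by a Dirichlet process prior. This is a textbook construction, not the project's multiomic haplotype model, and the function name `sample_crp` and concentration parameter `alpha` are illustrative assumptions. The key property is that the number of clusters is not fixed in advance. Each item joins an existing cluster with probability proportional to that cluster's size, or opens a new cluster with probability proportional to `alpha`.

```python
import random


def sample_crp(n: int, alpha: float, seed: int = 0) -> list[int]:
    """Draw cluster labels for n items from a CRP(alpha) prior (illustrative sketch)."""
    if n < 0 or alpha <= 0:
        raise ValueError("need n >= 0 and alpha > 0")
    rng = random.Random(seed)
    assignments: list[int] = []
    counts: list[int] = []  # counts[k] = number of items already in cluster k
    for _ in range(n):
        # Existing cluster k has weight counts[k]; opening a new cluster has weight alpha.
        weights = counts + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(counts):
            counts.append(1)  # new cluster gets the next unused label
        else:
            counts[k] += 1
        assignments.append(k)
    return assignments
```

For example, `len(set(sample_crp(1000, alpha=2.0)))` gives the number of clusters in one draw. In expectation this grows only roughly logarithmically with the number of items, so the data, not a fixed choice made beforehand, determines how many groups are used.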
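The "combinatorial domain of haplotype assembly" can be made concrete with one standard formulation, minimum error correction (MEC), which is NP-hard in general. The sketch makes four assumptions:

- The genome is diploid.
- Sites are biallelic heterozygous sites coded 0/1.
- Sequencing fragments are hypothetical `{site_index: allele}` dictionaries.
- The names `mec_cost` and `brute_force_mec` are illustrative only.

A candidate haplotype h implies that the other haplotype is its complement. Each fragment is charged the smaller number of mismatches against h or against its complement. The exhaustive search over all 2^n haplotypes is only usable for tiny inputs. The sketch does not model the probabilistic, population-based phasing side that the project aims to unify with assembly.

```python
from itertools import product

# Hypothetical fragment representation: SNP index -> observed allele (0 or 1).
Fragment = dict[int, int]


def mec_cost(haplotype: list[int], fragments: list[Fragment]) -> int:
    """MEC cost of a diploid phasing (haplotype h and its complement 1 - h)."""
    total = 0
    for frag in fragments:
        # Mismatches if the fragment came from h.
        to_h = sum(allele != haplotype[j] for j, allele in frag.items())
        # With binary alleles, mismatches against the complement are the remaining sites.
        to_comp = len(frag) - to_h
        total += min(to_h, to_comp)
    return total


def brute_force_mec(n_snps: int, fragments: list[Fragment]) -> tuple[list[int], int]:
    """Exhaustively search all 2**n_snps haplotypes; feasible only for tiny inputs."""
    best_h: list[int] = []
    best_cost = -1
    for bits in product((0, 1), repeat=n_snps):
        h = list(bits)
        cost = mec_cost(h, fragments)
        if best_cost < 0 or cost < best_cost:  # keep the first minimizer found
            best_h, best_cost = h, cost
    return best_h, best_cost


# Hypothetical fragments over three SNPs.
frags = [{0: 0, 1: 0}, {1: 0, 2: 1}, {0: 1, 1: 1}, {1: 1, 2: 0}, {0: 0, 2: 0}]
```

The example fragments imply three constraints: sites 0 and 1 lie on the same haplotype, sites 1 and 2 differ, and sites 0 and 2 agree. These cannot all hold, so at least one allele must be "corrected". `brute_force_mec(3, frags)` therefore reports a minimum cost of 1.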
Project Outcomes
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other Publications by Derek Aguiar
Smooth and Stepwise Self-Distillation for Object Detection
- DOI:
- Published: 2023
- Journal:
- Impact factor: 0
- Authors: Jieren Deng; Xiaoxia Zhou; Hao Tian; Zhihong Pan; Derek Aguiar
- Corresponding author: Derek Aguiar
Evaluating molecular fingerprint-based models of drug side effects against a statistical control
- DOI: 10.1016/j.drudis.2022.103364
- Published: 2022-11-01
- Journal:
- Impact factor: 7.500
- Authors: Berk A. Alpay; Mark Gosink; Derek Aguiar
- Corresponding author: Derek Aguiar
Trends and Barriers of Medication Treatment for Opioid Use Disorders: A Systematic Review and Meta-Analysis
- DOI: 10.1177/00220426231204841
- Published: 2023
- Journal:
- Impact factor: 1.7
- Authors: M. Hutchison; Beth S. Russell; Abigail Leander; Nathaniel Rickles; Derek Aguiar; Xiaomei S. Cong; O. Harel; Adrian V. Hernandez
- Corresponding author: Adrian V. Hernandez
Similar Overseas Grants
Collaborative Research: CIF: Small: Versatile Data Synchronization: Novel Codes and Algorithms for Practical Applications
- Award number: 2312872
- Fiscal year: 2023
- Funding amount: $548,300
- Project category: Standard Grant
Development of practical screening tools to support targeted prevention of early, high-risk drinking substance use
- Award number: 10802793
- Fiscal year: 2023
- Funding amount: $548,300
- Project category:
Collaborative Research: SaTC: CORE: Small: Differentially Private Data Synthesis: Practical Algorithms and Statistical Foundations
- Award number: 2247795
- Fiscal year: 2023
- Funding amount: $548,300
- Project category: Continuing Grant
Collaborative Research: SaTC: CORE: Small: Differentially Private Data Synthesis: Practical Algorithms and Statistical Foundations
- Award number: 2247794
- Fiscal year: 2023
- Funding amount: $548,300
- Project category: Continuing Grant
Collaborative Research: CIF: Small: Versatile Data Synchronization: Novel Codes and Algorithms for Practical Applications
- Award number: 2312871
- Fiscal year: 2023
- Funding amount: $548,300
- Project category: Standard Grant
PRIMES: Practical Inference Algorithms to Detect Hybridization
- Award number: 2331660
- Fiscal year: 2023
- Funding amount: $548,300
- Project category: Standard Grant
Collaborative Research: CIF: Small: Versatile Data Synchronization: Novel Codes and Algorithms for Practical Applications
- Award number: 2312873
- Fiscal year: 2023
- Funding amount: $548,300
- Project category: Standard Grant
Multi-functional millimeter-wave radios for joint communication and sensing: signal processing algorithms and practical design
- Award number: RGPIN-2020-06754
- Fiscal year: 2022
- Funding amount: $548,300
- Project category: Discovery Grants Program - Individual
A study on practical algorithms for solving DM optimization problems
- Award number: 22K11917
- Fiscal year: 2022
- Funding amount: $548,300
- Project category: Grant-in-Aid for Scientific Research (C)
SHARING QUALITATIVE RESEARCH DATA: IDENTIFYING AND ADDRESSING ETHICAL AND PRACTICAL BARRIERS
- Award number: 10614306
- Fiscal year: 2022
- Funding amount: $548,300
- Project category: