CAREER: Scalable algorithms for regularized and non-linear genetic models of gene expression
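Basic information
- Award number: 2336469
- Principal investigator:
- Amount: $600,000
- Host institution:
- Host institution country: United States
- Award type: Continuing Grant
- Fiscal year: 2024
- Funding country: United States
- Period: 2024-03-01 to 2029-02-28
- Project status: Ongoing
- Source:
- Keywords:
Project abstract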
DNA mutations have a profound effect on how genes work, but it is still not well understood which mutations affect which genes. Currently, our knowledge is limited by challenges in analyzing genomics data, such as bias arising from an overrepresentation of European study participants and simplistic statistical models that do not sufficiently capture the data. This project overcomes these challenges across three main scientific goals, in which innovative statistical models map DNA mutations to their target genes, and two educational goals, in which scientific training and diversity are simultaneously cultivated. First, the investigators will improve the fidelity of mapping mutations to target genes for groups of individuals that are not well studied, such as minority populations. Second, the investigators will develop a new method to connect mutations to genes by considering how genes interact with each other in genome-wide networks, suggesting functional effects for many uncharacterized mutations. Third, the investigators will characterize the specific cells in which mutations exert their effects using scalable models that reflect the natural distribution of data from single cell genomic assays. This research advances the fields of bioinformatics and human genetics by introducing new, robust statistical models that link mutations to their target genes. This project also enhances equity and diversity in biomedical discoveries, while simultaneously enhancing diversity within research environments. Toward the latter, the investigators initiate a multi-week on-campus research program for high school students from under-resourced communities, as well as genetics training courses for undergraduate and graduate students, supplying quantitative interdisciplinary skills coveted by industry and academia alike. This award will generate extensive datasets, open-source statistical models and genomics tools, high-impact publications, and course materials, thereby engaging the scientific community and fueling related research.
This project focuses on developing new genetic models to understand how specific genetic variations influence gene expression. These models overcome current limitations in characterizing the function of genetic variation, which often has the subsequent goal of implicating target genes in the regulation of human phenotypes such as height and cancer risk. Challenges of existing algorithms include statistical issues due to finite sample sizes (especially for understudied minority populations), multiple hypothesis burdens restricting the knowledge gained from genome-wide analysis, and model misspecification, especially for new data types of growing popularity such as single cell genomics. The investigators address these challenges across three main objectives. First, the investigators link genetic variation to changes in gene expression in understudied minority populations by jointly modeling genetic associations across globally diverse datasets. Second, the investigators develop a comprehensive approach to map genome-wide genetic variants to changes in gene expression, using a priori knowledge of gene regulatory networks and advanced machine learning algorithms to reduce the burden of multiple testing. Third, the investigators design a new statistical model to characterize the cell-type-specificity of gene expression regulation at high resolution; this model leverages the natural distribution of single cell data, which resolves the model misspecification of state-of-the-art methods, and reduces measurement noise by modeling millions of single cell measurements across donors. This award supports the generation of open-source genomics software and data repositories characterizing the function of genetic variants, while also creating educational and training opportunities for under-resourced high school students and motivated undergraduate and graduate students. The research and educational goals intertwine in a symbiotic relationship that is expected to enhance diversity in both research environments and research cohorts.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
Project outputs
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other publications by Tiffany Amariuta-Bartell
Similar NSFC (National Natural Science Foundation of China) grants
Scalable Learning and Optimization: High-dimensional Models and Online Decision-Making Strategies for Big Data Analysis
- Award number:
- Award year: 2024
- Funding amount (10,000 CNY):
- Award type: Collaborative Innovation Research Team
Similar overseas grants
CAREER: Fast Scalable Graph Algorithms
- Award number: 2340048
- Fiscal year: 2024
- Funding amount: $600,000
- Award type: Continuing Grant
CAREER: Scalable and Robust Uncertainty Quantification using Subsampling Markov Chain Monte Carlo Algorithms
- Award number: 2340586
- Fiscal year: 2024
- Funding amount: $600,000
- Award type: Continuing Grant
CAREER: Learning Kernels in Operators from Data: Learning Theory, Scalable Algorithms and Applications
- Award number: 2238486
- Fiscal year: 2023
- Funding amount: $600,000
- Award type: Continuing Grant
CAREER: Scalable Algorithms for Nonlinear, Large-Scale Inverse Problems Governed by Dynamical Systems
- Award number: 2145845
- Fiscal year: 2022
- Funding amount: $600,000
- Award type: Continuing Grant
CAREER: Scalable binning algorithms for genome-resolved metagenomics
- Award number: 1845890
- Fiscal year: 2019
- Funding amount: $600,000
- Award type: Continuing Grant
CAREER: Pushing the Theoretical Limits of Scalable Distributed Algorithms
- Award number: 1845146
- Fiscal year: 2019
- Funding amount: $600,000
- Award type: Continuing Grant
CAREER: Towards Fast and Scalable Algorithms for Big Proteogenomics Data Analytics
- Award number: 1925960
- Fiscal year: 2018
- Funding amount: $600,000
- Award type: Standard Grant
CAREER: Towards Fast and Scalable Algorithms for Big Proteogenomics Data Analytics
- Award number: 1651724
- Fiscal year: 2017
- Funding amount: $600,000
- Award type: Standard Grant
CAREER: Scalable Algorithms for Spectral Analysis of Massive Networked Systems
- Award number: 1651433
- Fiscal year: 2017
- Funding amount: $600,000
- Award type: Standard Grant
CAREER: Fast and Scalable Combinatorial Algorithms for Data Analytics
- Award number: 1553528
- Fiscal year: 2016
- Funding amount: $600,000
- Award type: Continuing Grant