Geometric structures guided learning model and algorithms for bulk RNAseq data analysis
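Basic information
- Grant number: 10592460
- Principal investigator:
- Amount: $214,700
- Host institution:
- Host institution country: United States
- Project type:
- Fiscal year: 2022
- Funding country: United States
- Project period: 2022-09-28 to 2025-07-31
- Project status: Ongoing
- Source: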
- Keywords: Address, Algorithms, Alzheimer's Disease, Benchmarking, Biological, Cell Extracts, Cells, Characteristics, Complex, Computational algorithm, Computer Hardware, Control Groups, Data, Data Analyses, Data Compression, Data Set, Disease, Exhibits, Gene Expression, Genes, Graph, Human, Individual, Laplacian, Learning, Mathematics, Methods, Modeling, Modernization, Noise, Outcome, Pharmacotherapy, Problem Solving, Process, Randomized, Research, Structure, Techniques, Tissue Sample, Tissues, Transcription Alteration, Validation, Work, base, cell type, computerized tools, cost, data space, differential expression, fundamental research, geometric structure, innovation, large datasets, mathematical model, transcriptome sequencing
Project abstract
Discovering potential drugs and treatments for many diseases depends heavily on identifying differentially
expressed (DE) genes in disease conditions within individual cell types. While it is possible to
experimentally sort cells of individual cell types for DE analysis, computationally leveraging bulk tissue
data has the advantages of greater availability, lower cost, and less human handling. A critical step
toward this goal is to (completely) deconvolute gene expression in specific cell types from the
heterogeneous bulk tissues. Complete deconvolution can be viewed as a nonnegative matrix factorization
(NMF) problem. However, NMF is strongly ill-posed, and its non-separable solutions pose great challenges
for data interpretability. These challenges vary across applications, so without special treatment,
results from complete deconvolution of gene expression data will make accurate DE analysis almost
impossible. In this proposal, a mathematical model and associated computational algorithms will be
established for the fundamental research of bulk tissue RNAseq analysis, aiming at better data
interpretability, reliability, and efficiency. To tackle this challenge, the geometric structure of the
given bulk tissue data set will first be explored to identify marker genes for the constituent cell types.
The model is then established by (1) enforcing the weak solvability condition (because of noise) of NMF
and (2) imposing geometric constraints on the data space of knowns. This work is motivated by a common
characteristic of many biological data sets, in which expression levels across sample tissues exhibit
strong correlations among certain genes. For massive amounts of biological data, fast stochastic
computational algorithms will be developed. After validation and benchmarking, the proposed model will be
applied to DE analysis of various datasets. The proposed model is important for deciphering cellular
transcriptional alterations in many diseases. In terms of modeling strategy, this research provides a new
perspective: observing topological/geometric structures of data and enforcing the corresponding
constraints to enhance problem solvability and data interpretability. In terms of computation, this
research develops nonlinear graph Laplacian regularized optimization combined with stochastic compression
algorithms, which can process massive data with low storage requirements and low complexity, and which
adapt to the structure of modern computer hardware.
As
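
The abstract frames complete deconvolution as an NMF problem with graph Laplacian regularization. The sketch below only illustrates that general idea and is not the proposal's model, which is described as nonlinear and paired with stochastic compression. It uses the standard graph-regularized NMF objective, ||X - WH||_F^2 + lam * tr(W^T L W) with L = D - A, solved by multiplicative updates on synthetic data. The function names (`knn_affinity`, `graph_nmf`, `objective`), the correlation-based k-NN gene graph, and every parameter value are assumptions made for illustration.

```python
import numpy as np


def knn_affinity(X, n_neighbors=5):
    """Symmetric 0/1 affinity linking each gene (row of X) to its most correlated genes."""
    C = np.corrcoef(X)                    # gene-by-gene correlation across samples
    np.fill_diagonal(C, -np.inf)          # a gene is never its own neighbor
    nearest = np.argsort(-C, axis=1)[:, :n_neighbors]
    A = np.zeros(C.shape)
    rows = np.repeat(np.arange(C.shape[0]), n_neighbors)
    A[rows, nearest.ravel()] = 1.0
    return np.maximum(A, A.T)             # symmetrize the k-NN graph


def objective(X, W, H, A, lam):
    """||X - WH||_F^2 + lam * tr(W^T L W), with graph Laplacian L = D - A."""
    L = np.diag(A.sum(axis=1)) - A
    return np.linalg.norm(X - W @ H) ** 2 + lam * np.trace(W.T @ L @ W)


def graph_nmf(X, k, A, lam=0.1, n_iter=200, seed=0, eps=1e-10):
    """Multiplicative updates for graph-regularized NMF; returns W, H, objective history."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    W = rng.random((m, k)) + eps          # genes x cell types (signatures)
    H = rng.random((k, n)) + eps          # cell types x samples (mixing weights)
    deg = A.sum(axis=1, keepdims=True)    # node degrees, i.e. the diagonal of D
    history = [objective(X, W, H, A, lam)]
    for _ in range(n_iter):
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T + lam * (A @ W)) / (W @ (H @ H.T) + lam * deg * W + eps)
        history.append(objective(X, W, H, A, lam))
    return W, H, history


# Synthetic bulk data: 60 genes, 20 samples, 3 hypothetical cell types.
rng = np.random.default_rng(1)
W_true = rng.random((60, 3))
H_true = rng.dirichlet(np.ones(3), size=20).T      # each column sums to 1
X = np.clip(W_true @ H_true + 0.01 * rng.standard_normal((60, 20)), 0, None)

W, H, history = graph_nmf(X, k=3, A=knn_affinity(X))
proportions = H / H.sum(axis=0, keepdims=True)     # rescale columns to read as fractions
print(f"objective: {history[0]:.3f} -> {history[-1]:.3f}")
```

Here the graph is built over genes, so the regularizer pulls the signature rows of strongly correlated genes toward each other. Because plain NMF is ill-posed, the recovered factors are determined only up to scaling and permutation of the cell types, so `proportions` should be read as a relative estimate rather than a calibrated one.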
Project outcomes
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other grants by Duan Chen
Geometric structures guided learning model and algorithms for bulk RNAseq data analysis
- Grant number: 10710214
- Fiscal year: 2022
- Award amount: $214,700
- Project type:
Similar overseas grants
CAREER: Blessing of Nonconvexity in Machine Learning - Landscape Analysis and Efficient Algorithms
- Grant number: 2337776
- Fiscal year: 2024
- Award amount: $214,700
- Project type: Continuing Grant
CAREER: From Dynamic Algorithms to Fast Optimization and Back
- Grant number: 2338816
- Fiscal year: 2024
- Award amount: $214,700
- Project type: Continuing Grant
CAREER: Structured Minimax Optimization: Theory, Algorithms, and Applications in Robust Learning
- Grant number: 2338846
- Fiscal year: 2024
- Award amount: $214,700
- Project type: Continuing Grant
CRII: SaTC: Reliable Hardware Architectures Against Side-Channel Attacks for Post-Quantum Cryptographic Algorithms
- Grant number: 2348261
- Fiscal year: 2024
- Award amount: $214,700
- Project type: Standard Grant
CRII: AF: The Impact of Knowledge on the Performance of Distributed Algorithms
- Grant number: 2348346
- Fiscal year: 2024
- Award amount: $214,700
- Project type: Standard Grant
CRII: CSR: From Bloom Filters to Noise Reduction Streaming Algorithms
- Grant number: 2348457
- Fiscal year: 2024
- Award amount: $214,700
- Project type: Standard Grant
EAGER: Search-Accelerated Markov Chain Monte Carlo Algorithms for Bayesian Neural Networks and Trillion-Dimensional Problems
- Grant number: 2404989
- Fiscal year: 2024
- Award amount: $214,700
- Project type: Standard Grant
CAREER: Efficient Algorithms for Modern Computer Architecture
- Grant number: 2339310
- Fiscal year: 2024
- Award amount: $214,700
- Project type: Continuing Grant
CAREER: Improving Real-world Performance of AI Biosignal Algorithms
- Grant number: 2339669
- Fiscal year: 2024
- Award amount: $214,700
- Project type: Continuing Grant
DMS-EPSRC: Asymptotic Analysis of Online Training Algorithms in Machine Learning: Recurrent, Graphical, and Deep Neural Networks
- Grant number: EP/Y029089/1
- Fiscal year: 2024
- Award amount: $214,700
- Project type: Research Grant


















