EFFICIENT METHODS FOR CALIBRATION, CLUSTERING, VISUALIZATION AND IMPUTATION OF LARGE scRNA-seq DATA

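Basic information

Grant number: 10335252
Principal investigator: Yuval Kluger
Amount: approx. $400,400 (listed as $ 40.04万)
Host institution: YALE UNIVERSITY
Host institution country: United States
Project category:
Fiscal year: 2019
Funding country: United States
Project period: 2019-05-01 to 2025-01-31
Project status: Not yet concluded

Source: https://reporter.nih.gov/project-details/10335252
Keywords: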
3-Dimensional Address Adopted Algorithms Attenuated Benchmarking Big Data Methods Big Data to Knowledge Biological Biology Calibration Cells Computational Biology DNA Methylation Data Data Analyses Data Analytics Data Set Detection Diffusion Dimensions Dropout Emerging Technologies Excision Funding Generations Genes Genomic approach Graph Immune Laplacian Learning Maps Measurement Methods Modality Neurons Noise Pathogenesis Phenotype Population Probability Procedures Recovery Research Research Personnel Sampling Science Series Signal Transduction Speed Structure System Techniques United States National Institutes of Health Validation Variant Visualization analytical method artificial neural network base biomarker discovery cell type computerized tools deep learning deep neural network density experimental study hematopoietic differentiation human disease improved insight kernel methods large datasets learning network multidimensional data neural network novel prototype response single cell analysis single-cell RNA sequencing theories tool transcriptome sequencing

Project abstract

Single-cell RNA-seq (scRNA-seq) profiling provides an unprecedented opportunity to conduct detailed cellular analysis of cell subpopulations. Fulfilling the promise of scRNA-seq for biomedical studies and biomarker discovery requires robust computational approaches to support detection of rare phenotypes and unanticipated cellular responses. Current approaches for imputation, calibration, clustering and visualization of scRNA-seq data suffer from challenges such as erroneous imputation of non-expressed genes, the limitations of linear assumptions in removal of multivariate batch effects, and inefficiencies of clustering and dimensionality reduction methods on very large datasets. We have developed spectral, neural network, and Fast Multipole Method (FMM) prototypes suitable for addressing these issues in the context of scRNA-seq and other high-throughput data, and we propose to further develop and adapt these methods to scRNA-seq data analysis. Our team of experts in data analytics and computational biology is currently funded through the NIH BD2K initiative to develop novel big data tools and methods that have broad applicability to biomedical science. This effort proved the feasibility of extremely efficient, scalable prototypes of neural network, spectral, and harmonic analysis techniques suitable for calibrating, reducing the dimensionality of, and visualizing high-dimensional data, finding intrinsic state-probability densities, and co-organizing cells, markers and samples. We propose substantial advances over existing analytical procedures used in single-cell RNA-seq studies, including matrix recovery approaches for sparse and noisy scRNA-seq data that combine matrix completion and statistical techniques (Aim 1A), and calibration based on our unsupervised MMD-ResNet neural network prototype and optimal transport theory (Aim 1B).
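As an illustration of the calibration idea in Aim 1B, the sketch below computes a maximum mean discrepancy (MMD) between two toy "batches" of points. It assumes that "MMD" in MMD-ResNet stands for maximum mean discrepancy, and it shows only the discrepancy statistic that a batch-correction network could try to minimize, not the project's network or its training procedure. The function names, the Gaussian kernel, the bandwidth, and the toy data are all hypothetical choices made for this example.

```python
import math
import random


def gaussian_kernel(x, y, bandwidth):
    """Gaussian (RBF) kernel between two points given as lists of floats."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def mmd_squared(xs, ys, bandwidth=1.0):
    """Biased (V-statistic) estimate of the squared MMD between two samples."""
    def mean_kernel(a, b):
        total = sum(gaussian_kernel(p, q, bandwidth) for p in a for q in b)
        return total / (len(a) * len(b))

    return mean_kernel(xs, xs) + mean_kernel(ys, ys) - 2.0 * mean_kernel(xs, ys)


def toy_batch(n, shift, rng):
    """Hypothetical 2-D 'batch': Gaussian cloud whose first coordinate is shifted."""
    return [[rng.gauss(shift, 1.0), rng.gauss(0.0, 1.0)] for _ in range(n)]


rng = random.Random(0)
batch_a = toy_batch(100, 0.0, rng)
batch_same = toy_batch(100, 0.0, rng)      # same distribution as batch_a
batch_shifted = toy_batch(100, 2.0, rng)   # simulated batch effect

mmd_same = mmd_squared(batch_a, batch_same)
mmd_shifted = mmd_squared(batch_a, batch_shifted)
print("MMD^2, same distribution:", mmd_same)
print("MMD^2, shifted batch:    ", mmd_shifted)
```

A shifted batch should give a clearly larger value than a second draw from the same distribution; a calibration method in this spirit would transform one batch so that the statistic shrinks.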
We will develop a variant of the FMM approach to speed up the calculation of the repulsion term of the t-distributed stochastic neighbor embedding (t-SNE) visualization technique, which will improve on FIt-SNE, our FFT-based prototype and currently our fastest t-SNE implementation, and develop new, reliable approximate nearest neighbor approaches to speed up the computation of the attraction term of t-SNE and other clustering algorithms (Aim 2A). Our additional variants of t-SNE will be further developed to allow better separation between clusters of cell subpopulations (late exaggeration) and better visualization using 1D t-SNE for heatmap gene-cell representation (Aim 2A). We will adapt SpectralNet, our efficient neural network approach, for computing graph Laplacian eigenvectors for large datasets. This will enable computation of spectral clustering, diffusion maps and manifold learning, which are used in many scRNA-seq studies but are currently limited to a moderate number of single cells (Aim 2B). Finally, we will develop a kernel-based differential abundance algorithm to characterize differences between biological conditions (Aim 2C). We will adopt appropriate sampling approaches to significantly improve current methods.
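To make the "attraction term" and "repulsion term" of Aim 2A concrete, the sketch below evaluates both terms of the standard t-SNE gradient exactly, at O(N^2) cost. This is the textbook formulation, assumed here for illustration; it is not the FFT-based or FMM-based acceleration described in the abstract, which would approximate the repulsion sum instead of looping over all pairs. All function names and the toy embedding are hypothetical.

```python
import random


def student_t_affinities(Y):
    """Return unnormalized weights w_ij = 1 / (1 + ||y_i - y_j||^2) and Q = W / sum(W)."""
    n, dim = len(Y), len(Y[0])
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                d2 = sum((Y[i][k] - Y[j][k]) ** 2 for k in range(dim))
                W[i][j] = 1.0 / (1.0 + d2)
    Z = sum(map(sum, W))
    Q = [[w / Z for w in row] for row in W]
    return W, Q


def tsne_forces(P, Y):
    """Exact attraction and repulsion terms for every embedded point."""
    n, dim = len(Y), len(Y[0])
    W, Q = student_t_affinities(Y)
    attract = [[0.0] * dim for _ in range(n)]
    repulse = [[0.0] * dim for _ in range(n)]
    for i in range(n):
        for j in range(n):
            for k in range(dim):
                diff = Y[i][k] - Y[j][k]
                attract[i][k] += P[i][j] * W[i][j] * diff  # driven by input affinities
                repulse[i][k] += Q[i][j] * W[i][j] * diff  # all-pairs sum, the costly part
    return attract, repulse


def tsne_gradient(P, Y):
    """Gradient of the t-SNE objective: 4 * (attraction - repulsion)."""
    attract, repulse = tsne_forces(P, Y)
    return [[4.0 * (a - r) for a, r in zip(arow, rrow)]
            for arow, rrow in zip(attract, repulse)]


rng = random.Random(0)
n_points = 30
Y = [[rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)] for _ in range(n_points)]

# Sanity check 1: if the input affinities P equal the embedding affinities Q,
# attraction and repulsion cancel and the gradient vanishes.
_, Q = student_t_affinities(Y)
grad_matched = tsne_gradient(Q, Y)

# Sanity check 2: for any symmetric P, pairwise forces cancel in the total.
raw = [[0.0] * n_points for _ in range(n_points)]
for i in range(n_points):
    for j in range(i + 1, n_points):
        raw[i][j] = raw[j][i] = rng.random()
total = sum(map(sum, raw))
P = [[v / total for v in row] for row in raw]
grad_random = tsne_gradient(P, Y)
net_force = [sum(g[k] for g in grad_random) for k in range(2)]
print("max |grad| when P == Q:", max(abs(v) for g in grad_matched for v in g))
print("net force for random symmetric P:", net_force)
```

In this formulation the attraction sum only needs the nonzero entries of P (in practice, a sparse nearest-neighbor graph), while the repulsion sum runs over all pairs, which is why the two terms call for different acceleration strategies.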

Project outputs

Journal articles (3)

Monographs (0)

Research awards (0)

Conference papers (0)

Patents (0)

Two-sample Statistics Based on Anisotropic Kernels

DOI: 10.1093/imaiai/iaz018
Publication date: 2017-09
Journal: Information and Inference: A Journal of the IMA
Impact factor: 0
Authors: Xiuyuan Cheng; A. Cloninger; R. Coifman
Corresponding authors: Xiuyuan Cheng; A. Cloninger; R. Coifman

Classification Logit Two-Sample Testing by Neural Networks for Differentiating Near Manifold Densities

DOI: 10.1109/tit.2022.3175691
Publication date: 2019-09
Journal: IEEE Transactions on Information Theory
Impact factor: 2.5
Authors: Xiuyuan Cheng; A. Cloninger
Corresponding authors: Xiuyuan Cheng; A. Cloninger

Multi-Omics Investigation of Innate Navitoclax Resistance in Triple-Negative Breast Cancer Cells.

DOI: 10.3390/cancers12092551
Publication date: 2020-09-08
Journal: Cancers
Impact factor: 5.2
Authors: Marczyk M; Patwardhan GA; Zhao J; Qu R; Li X; Wali VB; Gupta AK; Pillai MM; Kluger Y; Yan Q; Hatzis C; Pusztai L; Gunasekharan V
Corresponding author: Gunasekharan V