EFFICIENT METHODS FOR CALIBRATION, CLUSTERING, VISUALIZATION AND IMPUTATION OF LARGE scRNA-seq DATA
Basic information
- Grant number: 10335252
- Principal investigator:
- Amount: $400,400
- Host institution:
- Host institution country: United States
- Project category:
- Fiscal year: 2019
- Funding country: United States
- Project period: 2019-05-01 to 2025-01-31
- Project status: Ongoing
- Source:
- Keywords: 3-Dimensional; Address; Adopted; Algorithms; Attenuated; Benchmarking; Big Data Methods; Big Data to Knowledge; Biological; Biology; Calibration; Cells; Computational Biology; DNA Methylation; Data; Data Analyses; Data Analytics; Data Set; Detection; Diffusion; Dimensions; Dropout; Emerging Technologies; Excision; Funding; Generations; Genes; Genomic approach; Graph; Immune; Laplacian; Learning; Maps; Measurement; Methods; Modality; Neurons; Noise; Pathogenesis; Phenotype; Population; Probability; Procedures; Recovery; Research; Research Personnel; Sampling; Science; Series; Signal Transduction; Speed; Structure; System; Techniques; United States National Institutes of Health; Validation; Variant; Visualization; analytical method; artificial neural network; base; biomarker discovery; cell type; computerized tools; deep learning; deep neural network; density; experimental study; hematopoietic differentiation; human disease; improved; insight; kernel methods; large datasets; learning network; multidimensional data; neural network; novel; prototype; response; single cell analysis; single-cell RNA sequencing; theories; tool; transcriptome sequencing
Project abstract
Single cell RNA-seq (scRNA-seq) profiling provides an unprecedented opportunity to conduct detailed cellular
analysis of cell subpopulations. Fulfilling the promise of scRNA-seq for biomedical studies and biomarker
discovery requires robust computational approaches to support detection of rare phenotypes and unanticipated
cellular responses. Current approaches for imputation, calibration, clustering and visualization of scRNA-seq
data suffer from challenges such as erroneous imputation of non-expressed genes, the limitations of linear
assumptions in removing multivariate batch effects, and the inefficiency of clustering and dimensionality reduction
methods on very large datasets. We have developed spectral, neural network, and Fast Multipole Method
(FMM) prototypes suitable for addressing these issues in scRNA-seq and other high-throughput
data contexts and propose to further develop and adapt these methods to scRNA-seq data analysis. Our team
of experts on data analytics and computational biology is currently funded through the NIH BD2K initiative to
develop novel big data tools and methods that have broad applicability to biomedical science. This effort
proved the feasibility of extremely efficient scalable prototypes of neural network, spectral, and harmonic
analysis techniques suitable for calibrating, reducing the dimensionality of, and visualizing high-dimensional data,
finding intrinsic state-probability densities, and co-organizing cells, markers and samples. We propose
substantial advances over existing analytical procedures used in single cell RNA-seq studies including matrix
recovery approaches for the sparse and noisy scRNA-seq data by combining matrix completion and statistical
techniques (Aim 1A), and calibration based on our unsupervised MMD-ResNet neural network prototype and
optimal transport theory (Aim 1B). We will develop a variant of the FMM approach to speed up the calculation
of the repulsion term of the t-distributed stochastic neighbor embedding (t-SNE) visualization technique, which
will improve on FIt-SNE, our current fastest FFT-based t-SNE prototype, and develop new reliable approximate
nearest-neighbor approaches to speed up the computation of the attraction term of t-SNE and other clustering
algorithms (Aim 2A). Our additional variants of t-SNE will be further developed to allow better separation
between clusters of cell subpopulations (late exaggeration) and better visualization using 1D t-SNE for
heatmap gene-cell representation (Aim 2A). We will adapt SpectralNet, our efficient neural network approach,
for computing graph Laplacian eigenvectors for large datasets. This will enable computation of spectral
clustering, diffusion maps and manifold learning that are utilized in many scRNA-seq studies but are currently
limited to a moderate number of single cells (Aim 2B). Finally, we will develop a kernel-based differential
abundance algorithm to characterize differences between biological conditions (Aim 2C). We will adopt
appropriate sampling approaches to significantly improve current methods.
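
The calibration aim above (Aim 1B) names maximum mean discrepancy (MMD) as the basis of the MMD-ResNet prototype, but the abstract does not describe the network itself. As a rough illustration of the discrepancy measure only, the following is a minimal sketch of a biased Gaussian-kernel MMD estimate between two simulated batches. The function names, the bandwidth value, the simulated data, and the naive mean-shift "alignment" are all assumptions made for illustration and are not taken from the project.

```python
import numpy as np


def gaussian_kernel(x, y, bandwidth):
    """Gaussian (RBF) kernel matrix between the rows of x and the rows of y."""
    sq_dists = (
        np.sum(x**2, axis=1)[:, None]
        + np.sum(y**2, axis=1)[None, :]
        - 2.0 * x @ y.T
    )
    sq_dists = np.maximum(sq_dists, 0.0)  # guard against tiny negative round-off
    return np.exp(-sq_dists / (2.0 * bandwidth**2))


def mmd_squared(x, y, bandwidth=1.0):
    """Biased (V-statistic) estimate of the squared MMD between samples x and y."""
    k_xx = gaussian_kernel(x, x, bandwidth)
    k_yy = gaussian_kernel(y, y, bandwidth)
    k_xy = gaussian_kernel(x, y, bandwidth)
    return k_xx.mean() + k_yy.mean() - 2.0 * k_xy.mean()


# Hypothetical data: two "batches" of 200 cells with 5 features each.
rng = np.random.default_rng(0)
batch_a = rng.normal(0.0, 1.0, size=(200, 5))
batch_b_shifted = rng.normal(0.0, 1.0, size=(200, 5)) + 1.0  # simulated batch effect

# Naive stand-in for calibration: match the mean of batch B to batch A.
# A learned map (such as a residual network) would replace this step.
batch_b_aligned = batch_b_shifted - batch_b_shifted.mean(axis=0) + batch_a.mean(axis=0)

mmd_before = mmd_squared(batch_a, batch_b_shifted, bandwidth=2.0)
mmd_after = mmd_squared(batch_a, batch_b_aligned, bandwidth=2.0)
print(f"MMD^2 before alignment: {mmd_before:.4f}")
print(f"MMD^2 after alignment:  {mmd_after:.4f}")
```

In this toy setting the batch effect is a pure shift, so matching the means is enough to lower the discrepancy. The nonlinear batch effects the abstract refers to would not be removed by such a linear correction, which is the stated motivation for a learned calibration map.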
Project outcomes
Journal articles (3)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Two-sample Statistics Based on Anisotropic Kernels
- DOI: 10.1093/imaiai/iaz018
- Publication date: 2017-09
- Journal:
- Impact factor: 0
- Authors: Xiuyuan Cheng; A. Cloninger; R. Coifman
- Corresponding authors: Xiuyuan Cheng; A. Cloninger; R. Coifman
Classification Logit Two-Sample Testing by Neural Networks for Differentiating Near Manifold Densities
- DOI: 10.1109/tit.2022.3175691
- Publication date: 2019-09
- Journal:
- Impact factor: 2.5
- Authors: Xiuyuan Cheng; A. Cloninger
- Corresponding authors: Xiuyuan Cheng; A. Cloninger
Multi-Omics Investigation of Innate Navitoclax Resistance in Triple-Negative Breast Cancer Cells.
- DOI: 10.3390/cancers12092551
- Publication date: 2020-09-08
- Journal:
- Impact factor: 5.2
- Authors: Marczyk M; Patwardhan GA; Zhao J; Qu R; Li X; Wali VB; Gupta AK; Pillai MM; Kluger Y; Yan Q; Hatzis C; Pusztai L; Gunasekharan V
- Corresponding author: Gunasekharan V
Other grants by Yuval Kluger
EFFICIENT METHODS FOR CALIBRATION, CLUSTERING, VISUALIZATION AND IMPUTATION OF LARGE scRNA-seq DATA
- Grant number: 9920743
- Fiscal year: 2019
- Funding amount: $400,400
- Project category:
EFFICIENT METHODS FOR CALIBRATION, CLUSTERING, VISUALIZATION AND IMPUTATION OF LARGE scRNA-seq DATA
- Grant number: 9764594
- Fiscal year: 2019
- Funding amount: $400,400
- Project category:
EFFICIENT SPECTRAL APPROACHES FOR FINDING UNDERLYING STRUCTURES IN BIG DATA
- Grant number: 9278252
- Fiscal year: 2016
- Funding amount: $400,400
- Project category:
Co-ordination of recombination and allelic exclusion at IgH and Igk loci
- Grant number: 8740626
- Fiscal year: 2014
- Funding amount: $400,400
- Project category:
Role of ATM and RAG in maintaining genome stability during Tcra/d rearrangement.
- Grant number: 8707743
- Fiscal year: 2013
- Funding amount: $400,400
- Project category:
Role of ATM and RAG in maintaining genome stability during Tcra/d rearrangement.
- Grant number: 8513573
- Fiscal year: 2012
- Funding amount: $400,400
- Project category:
Similar overseas grants
Rational design of rapidly translatable, highly antigenic and novel recombinant immunogens to address deficiencies of current snakebite treatments
- Grant number: MR/S03398X/2
- Fiscal year: 2024
- Funding amount: $400,400
- Project category: Fellowship
Re-thinking drug nanocrystals as highly loaded vectors to address key unmet therapeutic challenges
- Grant number: EP/Y001486/1
- Fiscal year: 2024
- Funding amount: $400,400
- Project category: Research Grant
CAREER: FEAST (Food Ecosystems And circularity for Sustainable Transformation) framework to address Hidden Hunger
- Grant number: 2338423
- Fiscal year: 2024
- Funding amount: $400,400
- Project category: Continuing Grant
Metrology to address ion suppression in multimodal mass spectrometry imaging with application in oncology
- Grant number: MR/X03657X/1
- Fiscal year: 2024
- Funding amount: $400,400
- Project category: Fellowship
CRII: SHF: A Novel Address Translation Architecture for Virtualized Clouds
- Grant number: 2348066
- Fiscal year: 2024
- Funding amount: $400,400
- Project category: Standard Grant
BIORETS: Convergence Research Experiences for Teachers in Synthetic and Systems Biology to Address Challenges in Food, Health, Energy, and Environment
- Grant number: 2341402
- Fiscal year: 2024
- Funding amount: $400,400
- Project category: Standard Grant
The Abundance Project: Enhancing Cultural & Green Inclusion in Social Prescribing in Southwest London to Address Ethnic Inequalities in Mental Health
- Grant number: AH/Z505481/1
- Fiscal year: 2024
- Funding amount: $400,400
- Project category: Research Grant
ERAMET - Ecosystem for rapid adoption of modelling and simulation METhods to address regulatory needs in the development of orphan and paediatric medicines
- Grant number: 10107647
- Fiscal year: 2024
- Funding amount: $400,400
- Project category: EU-Funded
Ecosystem for rapid adoption of modelling and simulation METhods to address regulatory needs in the development of orphan and paediatric medicines
- Grant number: 10106221
- Fiscal year: 2024
- Funding amount: $400,400
- Project category: EU-Funded
Recite: Building Research by Communities to Address Inequities through Expression
- Grant number: AH/Z505341/1
- Fiscal year: 2024
- Funding amount: $400,400
- Project category: Research Grant