Theory and Methods for Large-Scale Multi-Modal Matrix Data

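Basic Information

  • Award number:
    2015492
  • Principal investigator:
  • Amount:
    $200,000
  • Host institution:
  • Host institution country:
    United States
  • Project type:
    Standard Grant
  • Fiscal year:
    2020
  • Funding country:
    United States
  • Project period:
    2020-07-01 to 2024-06-30
  • Project status:
    Completed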

Project Abstract

Modern data acquisition technology produces new types of data that carry rich information but also poses new challenges for analysis. In many modern datasets, the basic unit of measurement can be a matrix or even a higher-order array recording the interactions among one or multiple groups of individuals. For example, a gene co-expression network measures the average strength of correlation between each pair of genes in a particular organ tissue. With gene co-expression networks collected at different developmental stages, it is possible to understand how groups of genes change their behavior in a coherent way. As another example, next-generation sequencing techniques are able to produce gene expression data at different scales: tissue sample data consists of gene expressions in bulk tissue samples, whereas single-cell RNA sequencing data contains expressions of the same genes for individual cells. Motivated by these examples, this research aims to develop novel probability tools and statistical inference methods for complex matrix-valued datasets, which will enable scientists to uncover salient structures in such datasets in a coherent and efficient way. The project also provides research training opportunities for graduate students.
This project consists of two parts. In the first part, the PI studies multi-layer networks with a shared latent structure across layers and develops methods to efficiently combine the information across different layers to recover the latent structure, which would be impossible if only a single layer were available. The expected results will provide new probability theorems describing the behavior of random noises in matrix forms, as well as their linear combinations and higher-order functions.
In the second part, the PI studies a series of inference problems related to tissue and single-cell RNA-seq data, starting from dimensionality reduction and variable selection in a computationally efficient manner, followed by downstream inference problems such as cell type deconvolution in tissue RNA-seq data. The expected results will provide an important addition to the sparse principal components analysis literature by developing a projection-free, gradient-based algorithm with provable global convergence properties. The cell type deconvolution problem will be an interesting application combining techniques from variable selection, nonnegative matrix factorization, and optimization.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Project Outputs

Journal articles (6)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Model selection properties of forward selection and sequential cross‐validation for high‐dimensional regression
  • DOI:
    10.1002/cjs.11635
  • Publication date:
    2021-07
  • Journal:
  • Impact factor:
    0
  • Authors:
    J. Wieczorek; Jing Lei
  • Corresponding authors:
    J. Wieczorek; Jing Lei
Consistent estimation of the number of communities in stochastic block models using cross‐validation
  • DOI:
    10.1002/sta4.426
  • Publication date:
    2021-09
  • Journal:
  • Impact factor:
    1.7
  • Authors:
    Jining Qin; Jing Lei
  • Corresponding authors:
    Jining Qin; Jing Lei
Gradient-based Sparse Principal Component Analysis with Extensions to Online Learning
  • DOI:
    10.1093/biomet/asac041
  • Publication date:
    2019-11
  • Journal:
  • Impact factor:
    2.7
  • Authors:
    Yixuan Qiu; Jing Lei; K. Roeder
  • Corresponding authors:
    Yixuan Qiu; Jing Lei; K. Roeder
Network representation using graph root distributions
  • DOI:
    10.1214/20-aos1976
  • Publication date:
    2018-02
  • Journal:
  • Impact factor:
    0
  • Authors:
    Jing Lei
  • Corresponding author:
    Jing Lei
Bias-Adjusted Spectral Clustering in Multi-Layer Stochastic Block Models

Other publications by Jing Lei

Performances investigations of a new combined cooling, heating and power system integrated with a chemical recuperation process
  • DOI:
  • Publication date:
  • Journal:
  • Impact factor:
    11.2
  • Authors:
    Zhang Bai; Taixiu Liu; Qibin Liu; Jing Lei; L Gong; Hongguang Jin
  • Corresponding author:
    Hongguang Jin
Tail Bounds for Matrix Quadratic Forms and Bias Adjusted Spectral Clustering in Multi-layer Stochastic Block Models
  • DOI:
  • Publication date:
    2020
  • Journal:
  • Impact factor:
    0
  • Authors:
    Jing Lei
  • Corresponding author:
    Jing Lei
The Design and Aerodynamic Investigation on a Wide-Speed Range inParallel Vehicle
Marking Key Segment of Program Input via Attention Mechanism
  • DOI:
    10.1109/access.2019.2960522
  • Publication date:
    2019
  • Journal:
  • Impact factor:
    3.9
  • Authors:
    Xing Zhang; Chao Feng; Runhao Li; Jing Lei; Chaojing Tang
  • Corresponding author:
    Chaojing Tang
Convergence and concentration of empirical measures under Wasserstein distance in unbounded functional spaces
  • DOI:
    10.3150/19-bej1151
  • Publication date:
    2018-04
  • Journal:
  • Impact factor:
    1.5
  • Authors:
    Jing Lei
  • Corresponding author:
    Jing Lei

Other grants by Jing Lei

Theory and Methods for Modern Predictive Inference
  • Award number:
    2310764
  • Fiscal year:
    2023
  • Funding amount:
    $200,000
  • Project type:
    Standard Grant
Research on the Handwriting Trajectory Reconstruction and Recognition with Wearable Sensing Method
  • Award number:
    18K11400
  • Fiscal year:
    2018
  • Funding amount:
    $200,000
  • Project type:
    Grant-in-Aid for Scientific Research (C)
CAREER: Modernizing Classical Nonparametric and Multivariate Theory for Large-scale, High-dimensional Data Analysis
  • Award number:
    1553884
  • Fiscal year:
    2016
  • Funding amount:
    $200,000
  • Project type:
    Continuing Grant
Unconstrained energy harvesting and online behavior recognition based on ring-shape wearable device
  • Award number:
    26730094
  • Fiscal year:
    2014
  • Funding amount:
    $200,000
  • Project type:
    Grant-in-Aid for Young Scientists (B)
Spectral and principal components analysis in sparse, high-dimensional data
  • Award number:
    1407771
  • Fiscal year:
    2014
  • Funding amount:
    $200,000
  • Project type:
    Continuing Grant

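Similar NSFC (National Natural Science Foundation of China) grants

Computational Methods for Analyzing Toponome Data
  • Award number:
    60601030
  • Approval year:
    2006
  • Funding amount:
    CNY 170,000
  • Project type:
    Young Scientists Fund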

Similar overseas grants

CAREER: Fast and Accurate Statistical Learning and Inference from Large-Scale Data: Theory, Methods, and Algorithms
  • Award number:
    2046874
  • Fiscal year:
    2021
  • Funding amount:
    $200,000
  • Project type:
    Continuing Grant
Collaborative Research: Statistical Methods, Algorithms, and Theory for Large Tensors
  • Award number:
    1721495
  • Fiscal year:
    2017
  • Funding amount:
    $200,000
  • Project type:
    Continuing Grant
Collaborative Research: Statistical Methods, Algorithms, and Theory for Large Tensors
  • Award number:
    1721584
  • Fiscal year:
    2017
  • Funding amount:
    $200,000
  • Project type:
    Continuing Grant
Collaborative Research: Statistical Methods, Algorithms, and Theory for Large Tensors
  • Award number:
    1803450
  • Fiscal year:
    2017
  • Funding amount:
    $200,000
  • Project type:
    Continuing Grant
Statistical theory and methods for large-scale inference
  • Award number:
    501896-2016
  • Fiscal year:
    2016
  • Funding amount:
    $200,000
  • Project type:
    University Undergraduate Student Research Awards
Development of self-adaptive moving mesh methods for numerical computations of phenomena with large deformation based on the theory of integrable systems
  • Award number:
    15K04909
  • Fiscal year:
    2015
  • Funding amount:
    $200,000
  • Project type:
    Grant-in-Aid for Scientific Research (C)
Statistical Theory and Methods for D&R Analysis of Large Complex Data
  • Award number:
    1228348
  • Fiscal year:
    2012
  • Funding amount:
    $200,000
  • Project type:
    Standard Grant
Ultra-large-scale electronic state calculations based on ab initio theory and mathematical optimization methods
  • Award number:
    23540370
  • Fiscal year:
    2011
  • Funding amount:
    $200,000
  • Project type:
    Grant-in-Aid for Scientific Research (C)
Theory and Applications of Stochastic First-order Methods for Large-scale Stochastic Convex Optimization
  • Award number:
    1000347
  • Fiscal year:
    2010
  • Funding amount:
    $200,000
  • Project type:
    Standard Grant
Methods and Applications of Electronic Structure Theory for Large Molecules
  • Award number:
    9981997
  • Fiscal year:
    2000
  • Funding amount:
    $200,000
  • Project type:
    Continuing Grant