CIF:Small: Theory and Methods for Simultaneous Feature Auto-grouping and Dimension Reduction in Supervised Multivariate Learning

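Basic Information

  • Award number:
    2105818
  • Principal investigator:
  • Amount:
    $339,700
  • Host institution:
  • Host institution country:
    United States
  • Award type:
    Standard Grant
  • Fiscal year:
    2021
  • Funding country:
    United States
  • Project period:
    2021-06-01 to 2025-05-31
  • Project status:
    Ongoing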

Project Abstract

Modern real-world applications have created an urgent need for analyzing and interpreting high-dimensional data with low-dimensional structures. In situations where a large number of response variables is present, very few features may be completely irrelevant to the entire set of responses; this leads to ineffective sparsity-based variable selection and to non-interpretable vanilla low-rank modeling. To address these issues, this project proposes grouping the features based on their contributions to the response variables, in a possibly low-dimensional subspace, in order to build a more parsimonious and interpretable model. In the context of multivariate learning, the intrinsic cost of searching for clusters and the potential adverse effect of high-dimensionality on signal recovery are not yet fully understood. Another critical challenge in the big-data era is to develop efficient optimization algorithms with rigorous convergence guarantees. The fact that the obtained algorithmic solutions may not be globally optimal, due to the non-convexity of the problem, makes the statistical error analysis nontrivial. The associated model-selection problem is another unsolved problem in the context of clustering, most notably when the number of features and/or the number of responses go beyond the sample size. To answer these questions, innovative and transformative statistical methods are being introduced, and the proposed algorithms are being analyzed to demonstrate their efficiency. The project covers potential applications in a wide range of areas such as machine learning, genomics, and macro-econometrics, and will help cross-fertilize ideas from statistics, operations research, economics, and bio-engineering. Education activities are tightly coupled with research, and include course development, student mentoring, outreach, and recruiting underrepresented students. 
The project proposes a novel clustered reduced-rank learning framework that uses joint matrix regularizations to relax the stringent assumption of sparsity-based learning and to gain interpretability compared with vanilla low-rank modeling. Universal information-theoretic limits are revealing the intrinsic cost of searching for clusters, regardless of the estimator in use, as well as the benefit of accumulating a large number of response variables in multivariate learning. Efficient optimization algorithms that perform simultaneous subspace learning and clustering are being developed; the resulting fixed-point estimators, while not necessarily globally optimal, still enjoy the desired statistical accuracy beyond the standard likelihood setup. Finally, a new kind of information criterion for joint cluster and rank selection is being proposed, without assuming either infinite sample size or large signal-to-noise ratio. The research is creating a fusion between statistics, information theory, nonconvex optimization, and model selection, with real-world applications. This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
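To make the idea of combining rank reduction with feature grouping concrete, the following is a minimal illustrative sketch, not the project's estimator. It assumes a simple two-stage heuristic: classical reduced-rank regression (projecting the least-squares fit onto its leading singular directions), followed by plain k-means on the rows of the coefficient matrix so that features with similar contributions share one coefficient row. The function names (`reduced_rank_fit`, `group_rows`, `clustered_reduced_rank`) and the parameters `rank` and `n_groups` are hypothetical; the abstract describes a joint, regularized formulation that this sketch does not reproduce.

```python
import numpy as np


def reduced_rank_fit(X, Y, rank):
    """Classical reduced-rank regression with identity weighting."""
    # Ordinary least-squares coefficients, shape (p, m).
    B_ols, *_ = np.linalg.lstsq(X, Y, rcond=None)
    # Leading right singular vectors of the fitted values X @ B_ols.
    _, _, Vt = np.linalg.svd(X @ B_ols, full_matrices=False)
    V = Vt[:rank].T  # shape (m, rank)
    # Project the OLS coefficients onto the rank-dimensional response subspace.
    return B_ols @ V @ V.T


def group_rows(B, n_groups, n_iter=50):
    """Lloyd's k-means on the rows of B with farthest-point initialization."""
    centers = [B[0]]
    for _ in range(1, n_groups):
        # Squared distance from each row to its nearest chosen center.
        d = np.min([np.sum((B - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(B[np.argmax(d)])
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        dist = ((B[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for k in range(n_groups):
            if np.any(labels == k):
                centers[k] = B[labels == k].mean(axis=0)
    return labels, centers


def clustered_reduced_rank(X, Y, rank, n_groups):
    """Two-stage heuristic: rank reduction first, then row grouping."""
    B = reduced_rank_fit(X, Y, rank)
    labels, centers = group_rows(B, n_groups)
    # Every feature in a group shares that group's coefficient row.
    return centers[labels], labels
```

Because each group center is an average of rows of the rank-reduced matrix, the grouped coefficient matrix stays inside the same low-dimensional subspace, so its rank does not exceed `rank`. A joint formulation would instead estimate the subspace and the groups together rather than in sequence.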

Project Outcomes

Journal articles (3)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Supervised multivariate learning with simultaneous feature auto‐grouping and dimension reduction
Network Pruning via Annealing and Direct Sparsity Control
Analysis of Generalized Bregman Surrogate Algorithms for Nonsmooth Nonconvex Statistical Learning
  • DOI:
    10.1214/21-aos2090
  • Publication date:
    2021
  • Journal:
  • Impact factor:
    0
  • Authors:
    She, Yiyuan; Wang, Zhifeng; Jin, Jiuwu
  • Corresponding author:
    Jin, Jiuwu

Other publications by Yiyuan She

Indirect Gaussian Graph Learning Beyond Gaussianity
Reduced Rank Vector Generalized Linear Models for Feature Extraction
  • DOI:
    10.4310/sii.2013.v6.n2.a4
  • Publication date:
    2010-07
  • Journal:
  • Impact factor:
    0
  • Authors:
    Yiyuan She
  • Corresponding author:
    Yiyuan She
Supplementary Material for ‘Robust Orthogonal Complement Principal Component Analysis’
  • DOI:
  • Publication date:
    2015
  • Journal:
  • Impact factor:
    0
  • Authors:
    Yiyuan She; Shijie Li; D. Wu
  • Corresponding author:
    D. Wu
Selective Factor Extraction in High Dimensions
  • DOI:
    10.1093/biomet/asw059
  • Publication date:
    2014-03
  • Journal:
  • Impact factor:
    0
  • Authors:
    Yiyuan She
  • Corresponding author:
    Yiyuan She
Joint Association Graph Screening and Decomposition for Large-Scale Linear Dynamical Systems

Other grants held by Yiyuan She

Slow Kill for Big Data Learning
  • Award number:
    2113599
  • Fiscal year:
    2021
  • Award amount:
    $339,700
  • Award type:
    Standard Grant
CIF: Small: Collaborative Research: Scalable Nonconvex Optimization with Statistical Guarantees for Information Computing in High Dimensions
  • Award number:
    1617801
  • Fiscal year:
    2016
  • Award amount:
    $339,700
  • Award type:
    Standard Grant
CAREER: Theory and Methods for Simultaneous Variable Selection and Rank Reduction
  • Award number:
    1352259
  • Fiscal year:
    2014
  • Award amount:
    $339,700
  • Award type:
    Continuing Grant
CIF: Small: Collaborative Research: Compressed Sensing for Coherent Designs under Gaussian/Non-Gaussian Noise
  • Award number:
    1116447
  • Fiscal year:
    2011
  • Award amount:
    $339,700
  • Award type:
    Standard Grant

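Similar NSFC (National Natural Science Foundation of China) grants

Forensic application of circadian-rhythm small RNAs in estimating bloodstain formation time
  • Grant number:
  • Approval year:
    2024
  • Award amount:
    CNY 0
  • Project type:
    Provincial/municipal project
Mechanism by which tRNA-derived small RNA upregulates the YBX1/CCL5 pathway in bortezomib-induced chronic pain
  • Grant number:
    n/a
  • Approval year:
    2022
  • Award amount:
    CNY 100,000
  • Project type:
    Provincial/municipal project
Small RNA regulation of the type I-F CRISPR-Cas adaptive immune response and its molecular mechanism
  • Grant number:
    32000033
  • Approval year:
    2020
  • Award amount:
    CNY 240,000
  • Project type:
    Young Scientists Fund
Mechanism of small RNA regulation of the biocontrol function of Bacillus amyloliquefaciens FZB42
  • Grant number:
    31972324
  • Approval year:
    2019
  • Award amount:
    CNY 580,000
  • Project type:
    General Program
Mechanism by which Streptococcus mutans small RNAs link LuxS quorum sensing to biofilm formation
  • Grant number:
    81900988
  • Approval year:
    2019
  • Award amount:
    CNY 210,000
  • Project type:
    Young Scientists Fund
Function and mechanism of key gut-bacterial small RNAs in the onset and progression of Crohn's disease
  • Grant number:
    31870821
  • Approval year:
    2018
  • Award amount:
    CNY 560,000
  • Project type:
    General Program
Molecular mechanism of pigeon milk secretion analyzed by small RNA sequencing
  • Grant number:
    31802058
  • Approval year:
    2018
  • Award amount:
    CNY 260,000
  • Project type:
    Young Scientists Fund
Pathogenic mechanism of rice grassy stunt virus regulated by small RNA-mediated DNA methylation
  • Grant number:
    31772128
  • Approval year:
    2017
  • Award amount:
    CNY 600,000
  • Project type:
    General Program
Immune regulatory mechanism of acupuncture treatment for Hashimoto's thyroiditis based on small RNA-seq
  • Grant number:
    81704176
  • Approval year:
    2017
  • Award amount:
    CNY 200,000
  • Project type:
    Young Scientists Fund
Regulation of small RNA biogenesis by rice OsSGS3 and OsHEN1 and its effect on disease resistance
  • Grant number:
    91640114
  • Approval year:
    2016
  • Award amount:
    CNY 850,000
  • Project type:
    Major Research Plan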

Similar overseas grants

Collaborative Research: CIF: Small: Theory for Learning Lossless and Lossy Coding
  • Award number:
    2324396
  • Fiscal year:
    2023
  • Award amount:
    $339,700
  • Award type:
    Standard Grant
CIF: Small: Theory and Algorithms for Efficient and Large-Scale Monte Carlo Tree Search
  • Award number:
    2327013
  • Fiscal year:
    2023
  • Award amount:
    $339,700
  • Award type:
    Standard Grant
Collaborative Research: CIF: Small: New Theory, Algorithms and Applications for Large-Scale Bilevel Optimization
  • Award number:
    2311274
  • Fiscal year:
    2023
  • Award amount:
    $339,700
  • Award type:
    Standard Grant
Collaborative Research: CIF: Small: New Theory, Algorithms and Applications for Large-Scale Bilevel Optimization
  • Award number:
    2311275
  • Fiscal year:
    2023
  • Award amount:
    $339,700
  • Award type:
    Standard Grant
Collaborative Research: CIF: Small: Theory for Learning Lossless and Lossy Coding
  • Award number:
    2324397
  • Fiscal year:
    2023
  • Award amount:
    $339,700
  • Award type:
    Standard Grant
CIF: Small: Shared Information: Theory and Applications
  • Award number:
    2310203
  • Fiscal year:
    2023
  • Award amount:
    $339,700
  • Award type:
    Standard Grant
CIF: Small: Multidimensional Remaindering Theory and Applications
  • Award number:
    2246917
  • Fiscal year:
    2023
  • Award amount:
    $339,700
  • Award type:
    Standard Grant
Collaborative Research: CIF: Small: New Theory and Applications of Non-smooth and Non-Lipschitz Riemannian Optimization
  • Award number:
    2308597
  • Fiscal year:
    2022
  • Award amount:
    $339,700
  • Award type:
    Standard Grant
Collaborative Research: CIF: Small: New Theory and Applications of Non-smooth and Non-Lipschitz Riemannian Optimization
  • Award number:
    2007797
  • Fiscal year:
    2020
  • Award amount:
    $339,700
  • Award type:
    Standard Grant
CIF: Small: Poisson matching: A new tool for information theory
  • Award number:
    2007965
  • Fiscal year:
    2020
  • Award amount:
    $339,700
  • Award type:
    Standard Grant