CAREER: Principled Unsupervised Learning via Minimum Volume Polytopic Embedding


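Basic Information

  • Award number:
    2237640
  • Principal investigator:
  • Amount:
    $540,000
  • Host institution:
  • Host institution country:
    United States
  • Project type:
    Continuing Grant
  • Fiscal year:
    2023
  • Funding country:
    United States
  • Period:
    2023-03-01 to 2028-02-29
  • Project status:
    Ongoing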

Project Abstract

Unsupervised learning problems are, in general, significantly more difficult than their supervised counterparts in machine learning. This poses considerable challenges not only for machine learning research but also for education: nearly all unsupervised models are NP-hard (with the possible sole exception of PCA), and for several decades the community has relied on algorithms without optimality guarantees. This project aims to develop a principled framework of minimum volume polytopic embedding that unifies various unsupervised learning problems, such as independent component analysis, dictionary learning, and nonnegative matrix factorization, by treating each problem as embedding the set of data points into a regular polytope (such as a simplex, a box, or an orthoplex), guided by a novel matrix volume criterion. The benefit is two-fold: 1) the framework provides identifiability guarantees with finite samples, and 2) it hinges on developing algorithms that can solve these NP-hard problems optimally under mild assumptions. The PI's prior work has established strong identifiability guarantees for the first benefit; this project will focus on the second, starting from a Frank-Wolfe algorithmic framework that has shown great empirical success. Furthermore, this project will greatly expand the framework's application domains, including POMDP identification in reinforcement learning, aggregate flexibility in power systems, and deep polytopic word embedding in natural language processing. On the mathematical side, the project will also develop extensions that handle nonlinearity and deep representation learning; such extensions have so far been elusive and are expected to have wide impact beyond the project's main focus on theory and algorithm development.
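
To make the matrix volume criterion concrete, here is a minimal toy sketch for the box case (bounded component analysis). The assumptions are ours, not the project's: the basis `A` is square and invertible, candidate bases are simply enumerated rather than optimized, and the helper names `log_volume` and `embeds_in_box` are hypothetical. A candidate is feasible if `A^{-1}` maps every data point into the box [-1, 1]^k, and among feasible candidates the criterion prefers the one whose parallelotope has the smallest volume, measured here as log |det A|.

```python
import numpy as np


def log_volume(A):
    """Return log|det A|, the log-volume (up to a constant) of the parallelotope {A s : s in [-1, 1]^k}."""
    sign, logabsdet = np.linalg.slogdet(A)
    return -np.inf if sign == 0 else logabsdet


def embeds_in_box(A, X, tol=1e-6):
    """Return True if every column of A^{-1} X lies in the box [-1, 1]^k."""
    S = np.linalg.solve(A, X)  # latent coordinates of each data point
    return bool(np.max(np.abs(S)) <= 1.0 + tol)


# Toy data: X = A_true @ S_true with latent sources drawn uniformly from the box.
rng = np.random.default_rng(0)
k, m = 3, 500
A_true = np.eye(k) + 0.3 * rng.standard_normal((k, k))
S_true = rng.uniform(-1.0, 1.0, size=(k, m))
X = A_true @ S_true

# Hypothetical candidate bases; an actual method would optimize over A instead.
candidates = {"true": A_true, "inflated": 2.0 * A_true, "shrunk": 0.5 * A_true}
feasible = {name: A for name, A in candidates.items() if embeds_in_box(A, X)}
best = min(feasible, key=lambda name: log_volume(feasible[name]))
```

Inflating the true basis keeps every point inside the box but enlarges the volume, while shrinking it pushes points outside the box, so in this toy setup the minimum-volume feasible candidate is the true basis.
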

Extensive education and outreach plans are laid out to reinforce the research impact and encourage students from all backgrounds to engage in computer science and machine learning research.

In this project we propose a novel framework that transforms all data points into points in a regular polytope (such as a simplex, a box, or an orthoplex), hence the name polytopic embedding, guided by a novel matrix volume optimization criterion. The PI's prior work not only showed strong identifiability guarantees for the latent representation, but also found wide practical success in applications. This prior success motivates the PI to investigate this direction further, resolve unsettled theoretical challenges, broaden the learning framework, and seek even more application domains. The project will evolve along the following synergistic thrusts. In Thrust 1, a Frank-Wolfe algorithm is designed to solve the NP-hard polytopic embedding problem; inspired by recent developments in the analysis of non-convex learning with guarantees, a promising pathway is laid out toward provable global optimality guarantees. In Thrust 2, the proposed learning framework will be used to identify an unknown POMDP from observations alone, with computational guarantees; research along this thrust will be applied to healthcare recommendations from medical data. In Thrust 3, the problem of aggregate flexibility in power systems is introduced, which provides an interesting dual interpretation of polytopic embedding; experiments on real data will validate performance, and the framework will be expanded to handle nonlinear constraints. In Thrust 4, we propose a novel word embedding scheme with both computational guarantees and semantic interpretation. An extension to a deep polytopic embedding framework is also introduced.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
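
Thrust 1 builds on a Frank-Wolfe framework. As a hedged illustration of the basic Frank-Wolfe step only, and not the PI's algorithm for the full NP-hard embedding problem, the sketch below solves a convex subproblem: fitting a point `x` as a convex combination of the columns of a fixed matrix `W`, i.e., minimizing 0.5 ||W h - x||^2 over the probability simplex. The function name `frank_wolfe_simplex` and the 2/(t+2) step-size rule are illustrative choices. The appeal of Frank-Wolfe on a polytope is that the linear minimization oracle is trivial: over the simplex it returns the vertex with the smallest gradient entry, so every iterate stays feasible without any projection.

```python
import numpy as np


def frank_wolfe_simplex(W, x, num_iters=1000):
    """Minimize 0.5 * ||W h - x||^2 over {h : h >= 0, sum(h) = 1} with classic Frank-Wolfe."""
    k = W.shape[1]
    h = np.zeros(k)
    h[0] = 1.0  # start at a vertex of the simplex
    for t in range(num_iters):
        grad = W.T @ (W @ h - x)
        # Linear minimization oracle: over the simplex, <grad, s> is
        # minimized at the vertex e_i with the smallest gradient entry.
        s = np.zeros(k)
        s[np.argmin(grad)] = 1.0
        gamma = 2.0 / (t + 2.0)  # standard open-loop step size
        h = (1.0 - gamma) * h + gamma * s  # convex combination stays in the simplex
    return h


# Toy usage: x already lies in the simplex, so the optimum is h = x.
W = np.eye(3)
x = np.array([0.2, 0.3, 0.5])
h = frank_wolfe_simplex(W, x, num_iters=5000)
```

With this step size the objective gap shrinks at the textbook O(1/t) rate, so after 5000 iterations `h` should be close to `x`; faster variants (line search, away steps) exist but are omitted here for brevity.
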

Project Outcomes

Journal articles (3)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Volume-Regularized Nonnegative Tucker Decomposition with Identifiability Guarantees
Identifiable Bounded Component Analysis Via Minimum Volume Enclosing Parallelotope
Global Identifiability of L1-based Dictionary Learning via Matrix Volume Optimization

Other publications by Kejun Huang

JULIA: Joint Multi-linear and Nonlinear Identification for Tensor Completion
  • DOI:
    10.48550/arxiv.2205.03749
  • Published:
    2022
  • Journal:
  • Impact factor:
    0
  • Authors:
    Cheng Qian;Kejun Huang;Lucas Glass;R. S. Srinivasa;Jimeng Sun
  • Corresponding author:
    Jimeng Sun
HOQRI: Higher-Order QR Iteration for Scalable Tucker Decomposition
Scalable and flexible Max-Var generalized canonical correlation analysis via alternating optimization
Efficient Implementation of Stochastic Proximal Point Algorithm for Matrix and Tensor Completion
Identifying Potential Investors with Data Driven Approaches
  • DOI:
    10.1137/1.9781611976236.27
  • Published:
    2020
  • Journal:
  • Impact factor:
    5.4
  • Authors:
    Bo Yang;Kejun Huang;N. Sidiropoulos
  • Corresponding author:
    N. Sidiropoulos

Similar International Grants

A Principled Framework for Explaining, Choosing and Negotiating Privacy Parameters of Differential Privacy
  • Award number:
    23K24851
  • Fiscal year:
    2024
  • Amount:
    $540,000
  • Project type:
    Grant-in-Aid for Scientific Research (B)
CAREER: Principled yet practical observability for a microservices-based cloud
  • Award number:
    2340128
  • Fiscal year:
    2024
  • Amount:
    $540,000
  • Project type:
    Continuing Grant
Principled phylogenomic analysis without gene tree estimation
  • Award number:
    2308495
  • Fiscal year:
    2023
  • Amount:
    $540,000
  • Project type:
    Standard Grant
A principled generalization of the maximum entropy principle for non-Shannon systems
  • Award number:
    23K16855
  • Fiscal year:
    2023
  • Amount:
    $540,000
  • Project type:
    Grant-in-Aid for Early-Career Scientists
A Principled Framework for Explaining, Choosing and Negotiating Privacy Parameters of Differential Privacy
  • Award number:
    22H03595
  • Fiscal year:
    2022
  • Amount:
    $540,000
  • Project type:
    Grant-in-Aid for Scientific Research (B)
CAREER: Principled Approaches to Securing Next-Generation Cellular Networks
  • Award number:
    2145631
  • Fiscal year:
    2022
  • Amount:
    $540,000
  • Project type:
    Continuing Grant
Principled approaches to deep learning: generalization under distribution shift and predictive uncertainty
  • Award number:
    RGPIN-2022-03609
  • Fiscal year:
    2022
  • Amount:
    $540,000
  • Project type:
    Discovery Grants Program - Individual
NeTS: Small: Hybrid Switching in Data Center Networks: Systems-driven Modeling and Principled Algorithms
  • Award number:
    2309187
  • Fiscal year:
    2022
  • Amount:
    $540,000
  • Project type:
    Standard Grant
Collaborative: FMitF: Track I: A Principled Approach to Modeling and Analysis of Hardware Fault Attacks on Embedded Software
  • Award number:
    2219810
  • Fiscal year:
    2022
  • Amount:
    $540,000
  • Project type:
    Standard Grant
Collaborative Research: FMitF: Track I: A Principled Approach to Modeling and Analysis of Hardware Fault Attacks on Embedded Software
  • Award number:
    2220345
  • Fiscal year:
    2022
  • Amount:
    $540,000
  • Project type:
    Standard Grant