CAREER: Principled Unsupervised Learning via Minimum Volume Polytopic Embedding

Basic Information

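Award Number: 2237640
Principal Investigator: Kejun Huang
Amount: $0.54 million
Host Institution: University of Florida
Institution Country: United States
Project Type: Continuing Grant
Fiscal Year: 2023
Funding Country: United States
Duration: 2023-03-01 to 2028-02-29
Project Status: Not yet concluded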

Source: https://www.nsf.gov/awardsearch/showAward?AWD_ID=2237640&HistoricalAwards=false
Keywords: CAREER Principled Unsupervised Learning via

Project Abstract

Unsupervised learning problems are, in general, significantly more difficult than their supervised counterparts in machine learning. This poses considerable challenges not only for machine learning research but also for education: nearly all models are NP-hard (with PCA as possibly the sole exception), and for several decades the community has relied on algorithms without optimality guarantees. This project aims to develop a principled framework of minimum volume polytopic embedding that unifies various unsupervised learning problems, such as independent component analysis, dictionary learning, and nonnegative matrix factorization, by treating each problem as embedding the set of data points into a regular polytope, such as a simplex, a box, or an orthoplex, guided by a novel matrix volume criterion. The benefit is twofold: 1) it provides identifiability guarantees with finite samples, and 2) it opens a path to algorithms that could optimally solve these NP-hard problems under mild assumptions. The PI's prior work has shown strong identifiability guarantees, the first benefit, while this project will focus on realizing the second, starting from a Frank-Wolfe algorithmic framework that has shown great empirical success. Furthermore, this project will greatly expand the framework's application domains, including POMDP identification in reinforcement learning, aggregate flexibility in power systems, and deep polytopic word embedding in natural language processing. On the mathematical side, extensions to handle nonlinearity and deep representation learning, which have so far been elusive, will also be developed and are expected to have wide impact beyond the project's main focus on theory and algorithm development.
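The framework described in the paragraph above can be illustrated with one generic minimum-volume template. This is an assumption modeled on the minimum-volume matrix factorization literature, not the project's stated formulation: X ∈ R^{M×N} holds N data points as columns, W ∈ R^{M×K} has full column rank and defines the polytope, and h_j, the j-th column of H, is the embedded point, constrained to a simplex, a box, or an orthoplex.

```latex
% Illustrative template (assumed, not the project's exact criterion)
\begin{aligned}
\min_{\mathbf{W},\,\mathbf{H}} \quad & \log\det\!\big(\mathbf{W}^{\top}\mathbf{W}\big) \\
\text{s.t.} \quad & \mathbf{X} = \mathbf{W}\mathbf{H}, \qquad \mathbf{h}_j \in \mathcal{P}, \quad j = 1,\dots,N, \\
& \mathcal{P} \in \Big\{ \{\mathbf{h} \ge \mathbf{0},\ \mathbf{1}^{\top}\mathbf{h} = 1\},\ \{\|\mathbf{h}\|_{\infty} \le 1\},\ \{\|\mathbf{h}\|_{1} \le 1\} \Big\}
% the three candidate sets are the simplex, the box, and the orthoplex
\end{aligned}
```

Read geometrically, every data point x_j = W h_j must lie in the image of the chosen polytope under W, and minimizing log det(W^T W), a common measure of the volume spanned by the columns of W, looks for the smallest such polytope that still contains all the data. The simplex, box, and orthoplex choices loosely mirror nonnegative matrix factorization, bounded component analysis, and L1-based dictionary learning, respectively.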
Extensive education and outreach plans are laid out to corroborate the research impact and to encourage students from all backgrounds to engage in computer science and machine learning research.

In this project, we propose a novel framework that transforms all data points into points in a regular polytope (such as a simplex, a box, or an orthoplex), hence the name polytopic embedding, guided by a novel matrix volume optimization criterion. The PI's prior work not only showed strong identifiability guarantees for the latent representation but also found wide practical success in applications. This prior success inspires the PI to investigate this direction further, resolve unsettled theoretical challenges, broaden the learning framework, and seek even more application domains. The project will evolve along the following synergistic thrusts. In Thrust 1, a Frank-Wolfe algorithm is designed to solve the NP-hard polytopic embedding problem; inspired by recent developments in the analysis of non-convex learning with guarantees, a promising pathway is laid out toward provable global optimality guarantees. In Thrust 2, the proposed learning framework will be used to identify an unknown POMDP from observations alone, with computational guarantees; research along this thrust will be applied to healthcare recommendations from medical data. In Thrust 3, the problem of aggregate flexibility in power systems is introduced, which provides an interesting dual interpretation of polytopic embedding; experiments on real data will validate the performance, and the framework will be extended to handle nonlinear constraints. In Thrust 4, we propose a novel word embedding scheme with not only computational guarantees but also semantic interpretability. An extension to a deep polytopic embedding framework is also introduced.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
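Thrust 1 builds on a Frank-Wolfe framework. The abstract does not describe the project's algorithm for the non-convex volume problem, so the sketch below only illustrates the basic Frank-Wolfe template on a convex subproblem: with the vertex matrix W held fixed, find the simplex weights h of a single data point x by minimizing ½‖x − W h‖² over the probability simplex. The function name `fw_simplex_embed` and its parameters are hypothetical. Over the simplex, the linear minimization oracle simply picks the vertex e_i with the smallest gradient entry, so every iterate stays a convex combination without any projection step.

```python
import numpy as np


def fw_simplex_embed(W, x, max_iter=5000, tol=1e-10):
    """Hypothetical helper: Frank-Wolfe for min_h 0.5*||x - W h||^2 over the
    probability simplex {h >= 0, sum(h) = 1}. Columns of W are the vertices."""
    W = np.asarray(W, dtype=float)
    x = np.asarray(x, dtype=float)
    k = W.shape[1]
    h = np.full(k, 1.0 / k)  # start at the barycenter of the simplex
    for _ in range(max_iter):
        g = W.T @ (W @ h - x)  # gradient of the objective w.r.t. h
        i = int(np.argmin(g))  # linear minimization oracle: best vertex e_i
        d = -h                 # new array; becomes the direction e_i - h
        d[i] += 1.0
        gap = -(g @ d)         # Frank-Wolfe gap, an upper bound on f(h) - f(h*)
        if gap <= tol:
            break
        Wd = W @ d
        denom = Wd @ Wd
        if denom <= 0.0:       # objective is flat along d; nothing to gain
            break
        gamma = min(1.0, gap / denom)  # exact line search on [0, 1]
        h = h + gamma * d      # convex combination, so h stays in the simplex
    return h


# Usage: a triangle with vertices (0, 0), (1, 0), (0, 1) stored as columns of W.
W = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0]])
x = np.array([0.2, 0.3])  # its exact convex weights are (0.5, 0.2, 0.3)
h = fw_simplex_embed(W, x)
print(h, np.linalg.norm(W @ h - x))
```

Because the objective is quadratic, the step size comes from exact line search rather than the common 2/(t+2) schedule, and the Frank-Wolfe gap doubles as a stopping criterion.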

Project Outputs

Journal articles (3)

Monographs (0)

Research awards (0)

Conference papers (0)

Patents (0)

Volume-Regularized Nonnegative Tucker Decomposition with Identifiability Guarantees

DOI: 10.1109/icassp49357.2023.10096076
Published: 2023
Journal: Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing
Impact Factor: 0
Authors: Sun, Yuchen; Huang, Kejun
Corresponding Author: Huang, Kejun

Identifiable Bounded Component Analysis Via Minimum Volume Enclosing Parallelotope

DOI: 10.1109/icassp49357.2023.10095905
Published: 2023
Journal: Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing
Impact Factor: 0
Authors: Hu, Jingzhou; Huang, Kejun
Corresponding Author: Huang, Kejun

Global Identifiability of L1-based Dictionary Learning via Matrix Volume Optimization
