CAREER: Principled Unsupervised Learning via Minimum Volume Polytopic Embedding
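Basic Information
- Award number: 2237640
- Principal investigator:
- Amount: $540,000
- Host institution:
- Host institution country: United States
- Award type: Continuing Grant
- Fiscal year: 2023
- Funding country: United States
- Period: 2023-03-01 to 2028-02-29
- Status: Ongoing
- Source:
- Keywords:
Project Abstract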
Unsupervised learning problems are in general significantly more difficult than their supervised counterparts in machine learning. This poses considerable challenges not only in machine learning research but also in education, as nearly all models are NP-hard (with possibly the sole exception of PCA), and the community has been dwelling on algorithms without optimality guarantees for several decades. This project aims to develop a principled framework of minimum volume polytopic embedding that unifies various unsupervised learning problems, such as independent component analysis, dictionary learning, and nonnegative matrix factorization, by treating each problem as embedding the set of data points into a regular polytope such as a simplex, a box, or an orthoplex, guided by a novel matrix volume criterion. The benefit is two-fold: 1) it provides identifiability guarantees with finite samples, and 2) it hinges on the development of algorithms that could optimally solve these NP-hard problems under mild assumptions. The PI's prior work has shown strong identifiability guarantees for the former benefit, while this project will focus on resolving the latter, starting from a Frank-Wolfe algorithmic framework that has shown great empirical success. Furthermore, this project will greatly expand the framework's application domains, such as POMDP identification in reinforcement learning, aggregate flexibility in power systems, and deep polytopic word embedding in natural language processing. In terms of the mathematical framework, extensions to handle nonlinearity and deep representation learning will also be developed; these have been elusive and are expected to be widely impactful beyond the main focus of theory and algorithm development in this project. Extensive education and outreach plans are laid out to corroborate the research impact and encourage students from all backgrounds to engage in computer science and machine learning research.
In this project we propose a novel framework that seeks to transform all data points into points in a regular polytope (such as a simplex, a box, or an orthoplex), hence the name polytopic embedding, guided by a novel matrix volume optimization criterion. The PI's prior work not only showed strong identifiability guarantees for the latent representation, but also found a wide variety of practical success in applications. This prior success inspires the PI to further investigate this direction, resolve unsettled theoretical challenges, broaden the learning framework, and seek even more application domains. This project will evolve along the following synergistic thrusts. In Thrust 1, a Frank-Wolfe algorithm is designed to solve the NP-hard polytopic embedding problem. Inspired by recent developments in analyzing guaranteed non-convex learning, a promising pathway is laid out to provide provable global optimality guarantees. In Thrust 2, the proposed learning framework will be used to identify an unknown POMDP from observations alone, with computational guarantees. Research along this thrust will be applied to healthcare recommendations from medical data. In Thrust 3, the problem of aggregate flexibility in power systems is introduced, which provides an interesting dual interpretation of polytopic embedding. Experiments on real data will validate the performance and expand the framework to handle nonlinear constraints. In Thrust 4, we propose a novel word embedding scheme with not only computational guarantees but also semantic interpretation. An extension to a deep polytopic embedding framework is also introduced.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
Project Outcomes
Journal papers (3)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Volume-Regularized Nonnegative Tucker Decomposition with Identifiability Guarantees
- DOI: 10.1109/icassp49357.2023.10096076
- Year: 2023
- Journal:
- Impact factor: 0
- Authors: Sun, Yuchen; Huang, Kejun
- Corresponding author: Huang, Kejun
Identifiable Bounded Component Analysis Via Minimum Volume Enclosing Parallelotope
- DOI: 10.1109/icassp49357.2023.10095905
- Year: 2023
- Journal:
- Impact factor: 0
- Authors: Hu, Jingzhou; Huang, Kejun
- Corresponding author: Huang, Kejun
Global Identifiability of L1-based Dictionary Learning via Matrix Volume Optimization
- DOI:
- Year: 2023
- Journal:
- Impact factor: 0
- Authors: Hu, Jingzhou; Huang, Kejun
- Corresponding author: Huang, Kejun
Other publications by Kejun Huang
JULIA: Joint Multi-linear and Nonlinear Identification for Tensor Completion
- DOI: 10.48550/arxiv.2205.03749
- Year: 2022
- Journal:
- Impact factor: 0
- Authors: Cheng Qian; Kejun Huang; Lucas Glass; R. S. Srinivasa; Jimeng Sun
- Corresponding author: Jimeng Sun
HOQRI: Higher-Order QR Iteration for Scalable Tucker Decomposition
- DOI: 10.1109/icassp43922.2022.9746726
- Year: 2022
- Journal:
- Impact factor: 0
- Authors: Yuchen Sun; Kejun Huang
- Corresponding author: Kejun Huang
Scalable and flexible Max-Var generalized canonical correlation analysis via alternating optimization
- DOI:
- Year: 2017
- Journal:
- Impact factor: 0
- Authors: Xiao Fu; Kejun Huang; Mingyi Hong; N. Sidiropoulos; A. M. So
- Corresponding author: A. M. So
Efficient Implementation of Stochastic Proximal Point Algorithm for Matrix and Tensor Completion
- DOI:
- Year: 2021
- Journal:
- Impact factor: 0
- Authors: Aysegül Bumin; Kejun Huang
- Corresponding author: Kejun Huang
Identifying Potential Investors with Data Driven Approaches
- DOI: 10.1137/1.9781611976236.27
- Year: 2020
- Journal:
- Impact factor: 5.4
- Authors: Bo Yang; Kejun Huang; N. Sidiropoulos
- Corresponding author: N. Sidiropoulos
Similar overseas grants
A Principled Framework for Explaining, Choosing and Negotiating Privacy Parameters of Differential Privacy
- Award number: 23K24851
- Fiscal year: 2024
- Funding amount: $540,000
- Award type: Grant-in-Aid for Scientific Research (B)
CAREER: Principled yet practical observability for a microservices-based cloud
- Award number: 2340128
- Fiscal year: 2024
- Funding amount: $540,000
- Award type: Continuing Grant
Principled phylogenomic analysis without gene tree estimation
- Award number: 2308495
- Fiscal year: 2023
- Funding amount: $540,000
- Award type: Standard Grant
A principled generalization of the maximum entropy principle for non-Shannon systems
- Award number: 23K16855
- Fiscal year: 2023
- Funding amount: $540,000
- Award type: Grant-in-Aid for Early-Career Scientists
A Principled Framework for Explaining, Choosing and Negotiating Privacy Parameters of Differential Privacy
- Award number: 22H03595
- Fiscal year: 2022
- Funding amount: $540,000
- Award type: Grant-in-Aid for Scientific Research (B)
CAREER: Principled Approaches to Securing Next-Generation Cellular Networks
- Award number: 2145631
- Fiscal year: 2022
- Funding amount: $540,000
- Award type: Continuing Grant
Principled approaches to deep learning: generalization under distribution shift and predictive uncertainty
- Award number: RGPIN-2022-03609
- Fiscal year: 2022
- Funding amount: $540,000
- Award type: Discovery Grants Program - Individual
NeTS: Small: Hybrid Switching in Data Center Networks: Systems-driven Modeling and Principled Algorithms
- Award number: 2309187
- Fiscal year: 2022
- Funding amount: $540,000
- Award type: Standard Grant
Collaborative: FMitF: Track I: A Principled Approach to Modeling and Analysis of Hardware Fault Attacks on Embedded Software
- Award number: 2219810
- Fiscal year: 2022
- Funding amount: $540,000
- Award type: Standard Grant
Collaborative Research: FMitF: Track I: A Principled Approach to Modeling and Analysis of Hardware Fault Attacks on Embedded Software
- Award number: 2220345
- Fiscal year: 2022
- Funding amount: $540,000
- Award type: Standard Grant
