CIF: CAREER: Robust, Interpretable, and Efficient Unsupervised Learning with K-set Clustering

Basic information

Project abstract

Modern machine learning techniques aim to design models and algorithms that allow computers to learn efficiently from vast amounts of previously unexplored data. These problems are called 'unsupervised' because no human-provided information about the data is available to guide the machine learning process. Arguably the two most important unsupervised machine learning tools are dimensionality reduction and clustering. In dimensionality reduction, the algorithm seeks a simple low-dimensional structure that captures the interesting behavior in the data. In clustering, the algorithm seeks to group data points together into meaningful clusters. As increasingly higher-dimensional data are collected about progressively more elaborate physical, biological, and social phenomena, algorithms that aim at both dimensionality reduction and clustering are often highly applicable. However, joint formulations in the literature are often ad hoc and fundamentally unable to operate on real data that have missing elements, corruptions, and heterogeneity, which are critical machine learning challenges for modern data problems. This research project is expected to have broad applicability in data science, and will be demonstrated in two applications: genetics and computer vision.

The joint clustering and dimensionality reduction formulation used in this project, called K-set clustering, seeks K "central sets" constrained to have some low-dimensional representation, each of which represents one of K clusters in the data. The formulation is a generalization of K-means, K-subspaces, and principal component analysis, and it naturally leads to several novel problem instances. Given a defined set geometry, the corresponding problem instance is approached from two perspectives: understanding the geometry of that instance of the problem formulation, and learning those geometric models from data. Three specific examples of the problem formulation will be studied: subspace clustering, variety clustering, and polyhedral set clustering. While each example presents intrinsic and unique challenges, these are just examples of a larger paradigm that is limited only by one's ability to define sets amenable to modeling the geometric structure in data. The formulation allows for interpretable data analysis, with a framework that can readily incorporate missing data and heterogeneous data.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
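
As an illustration only, the K-subspaces special case named in the abstract can be sketched as an alternating procedure: assign each point to its nearest "central set" (here a linear subspace through the origin), then refit each subspace to its assigned points. This is a minimal sketch, not the project's method. The function name `k_subspaces`, its parameters, and the random initialization are assumptions introduced here, and missing data, corruptions, and heterogeneity are not handled.

```python
import numpy as np

def k_subspaces(X, K, dim, n_iter=50, seed=0):
    """Illustrative K-subspaces: alternate nearest-subspace assignment and refitting.

    X: (n, D) data matrix; K: number of clusters; dim: subspace dimension.
    Returns labels of shape (n,) and a list of K orthonormal bases, each (D, dim).
    All names here are hypothetical and chosen for this sketch.
    """
    rng = np.random.default_rng(seed)
    n, D = X.shape
    # Random orthonormal bases as an initial guess (an assumption of this sketch).
    bases = [np.linalg.qr(rng.standard_normal((D, dim)))[0] for _ in range(K)]
    labels = np.full(n, -1)
    sq_norm = np.sum(X**2, axis=1, keepdims=True)
    for _ in range(n_iter):
        # Squared distance to subspace k is ||x||^2 - ||U_k^T x||^2.
        proj = np.stack([np.sum((X @ U) ** 2, axis=1) for U in bases], axis=1)
        new_labels = np.argmin(sq_norm - proj, axis=1)
        if np.array_equal(new_labels, labels):
            break  # assignments stopped changing
        labels = new_labels
        for k in range(K):
            Xk = X[labels == k]
            if len(Xk) >= dim:
                # Top right-singular vectors span the best-fit subspace
                # through the origin (PCA without centering).
                _, _, Vt = np.linalg.svd(Xk, full_matrices=False)
                bases[k] = Vt[:dim].T
    return labels, bases
```

With K = 1 the loop reduces to a single uncentered PCA fit, which is one way to see the abstract's claim that the formulation generalizes principal component analysis. Like K-means, this alternating scheme can stop at a local minimum, so results depend on the initialization.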

Project outcomes

Journal articles (25)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Neural Collapse with Normalized Features: A Geometric Analysis over the Riemannian Manifold
  • DOI:
    10.48550/arxiv.2209.09211
  • Published:
    2022-09
  • Journal:
  • Impact factor:
    0
  • Authors:
    Can Yaras;Peng Wang;Zhihui Zhu;L. Balzano;Qing Qu
  • Corresponding authors:
    Can Yaras;Peng Wang;Zhihui Zhu;L. Balzano;Qing Qu
Scaling-Up Distributed Processing of Data Streams for Machine Learning
  • DOI:
    10.1109/jproc.2020.3021381
  • Published:
    2020-05
  • Journal:
  • Impact factor:
    20.6
  • Authors:
    M. Nokleby;Haroon Raja;W. Bajwa
  • Corresponding authors:
    M. Nokleby;Haroon Raja;W. Bajwa
FEDNEST: Federated Bilevel, Minimax, and Compositional Optimization
  • DOI:
    10.48550/arxiv.2205.02215
  • Published:
    2022-05
  • Journal:
  • Impact factor:
    0
  • Authors:
    Davoud Ataee Tarzanagh;Mingchen Li;Christos Thrampoulidis;Samet Oymak
  • Corresponding authors:
    Davoud Ataee Tarzanagh;Mingchen Li;Christos Thrampoulidis;Samet Oymak
Preference Modeling with Context-Dependent Salient Features
  • DOI:
  • Published:
    2020-02
  • Journal:
  • Impact factor:
    0
  • Authors:
    Amanda Bower;L. Balzano
  • Corresponding authors:
    Amanda Bower;L. Balzano
HePPCAT: Probabilistic PCA for Data With Heteroscedastic Noise
  • DOI:
    10.1109/tsp.2021.3104979
  • Published:
    2021-01
  • Journal:
  • Impact factor:
    5.4
  • Authors:
    David Hong;Kyle Gilman;L. Balzano;J. Fessler
  • Corresponding authors:
    David Hong;Kyle Gilman;L. Balzano;J. Fessler

Other grants by Laura Balzano

CIF: Small: Learning Low-Dimensional Representations with Heteroscedastic Data Sources
  • Award number:
    2331590
  • Fiscal year:
    2024
  • Award amount:
    $596,800
  • Award type:
    Standard Grant
BRIGE: Simultaneous Modeling and Calibration for Environmental Sensor Data
  • Award number:
    1342121
  • Fiscal year:
    2013
  • Award amount:
    $596,800
  • Award type:
    Standard Grant

Similar overseas grants

CAREER: Game Theoretic Models for Robust Cyber-Physical Interactions: Inference and Design under Uncertainty
  • Award number:
    2336840
  • Fiscal year:
    2024
  • Award amount:
    $596,800
  • Award type:
    Continuing Grant
CAREER: Structured Minimax Optimization: Theory, Algorithms, and Applications in Robust Learning
  • Award number:
    2338846
  • Fiscal year:
    2024
  • Award amount:
    $596,800
  • Award type:
    Continuing Grant
CAREER: Robust, Fair, and Culturally Aware Commonsense Reasoning in Natural Language
  • Award number:
    2339746
  • Fiscal year:
    2024
  • Award amount:
    $596,800
  • Award type:
    Continuing Grant
CAREER: Optimal Transport Beyond Probability Measures for Robust Geometric Representation Learning
  • Award number:
    2339898
  • Fiscal year:
    2024
  • Award amount:
    $596,800
  • Award type:
    Continuing Grant
CAREER: Robust Reinforcement Learning Under Model Uncertainty: Algorithms and Fundamental Limits
  • Award number:
    2337375
  • Fiscal year:
    2024
  • Award amount:
    $596,800
  • Award type:
    Continuing Grant
CAREER: Unary Computing in Memory for Fast, Robust and Energy-Efficient Processing
  • Award number:
    2339701
  • Fiscal year:
    2024
  • Award amount:
    $596,800
  • Award type:
    Continuing Grant
CAREER: Theoretical and Computational Advances for Enabling Robust Numerical Guarantees in Linear and Mixed Integer Programming Solvers
  • Award number:
    2340527
  • Fiscal year:
    2024
  • Award amount:
    $596,800
  • Award type:
    Continuing Grant
CAREER: Robust, Reversible, and Stimuli-responsive Thermodynamic Adhesion in Hydrogels
  • Award number:
    2337592
  • Fiscal year:
    2024
  • Award amount:
    $596,800
  • Award type:
    Standard Grant
CAREER: Unraveling Oxygen Electrode Delamination Mechanisms in Reversible Solid Oxide Cells for Robust Hydrogen Production
  • Award number:
    2336465
  • Fiscal year:
    2024
  • Award amount:
    $596,800
  • Award type:
    Standard Grant
CAREER: Robust and Lightweight Formal Methods for Mobile Robot System Development
  • Award number:
    2338706
  • Fiscal year:
    2024
  • Award amount:
    $596,800
  • Award type:
    Continuing Grant