权益分类	功能权益	普通用户	{{item.name}}会员
{{category.name}}	{{benefitItem.name}}

Soft-Clustering - From Heuristics To Approximation Algorithms

软聚类 - 从启发式到近似算法

基本信息

批准号：
324506751
负责人：
Professor Dr. Johannes Blömer
金额：
--
依托单位：
Fachgruppe Codes und Kryptographie
依托单位国家：
德国
项目类别：
Research Grants
财政年份：
2017
资助国家：
德国
起止时间：
2016-12-31 至 2020-12-31
项目状态：
已结题

来源：
https://gepris.dfg.de/gepris/projekt/324506751?language=en
关键词：
Soft Clustering Heuristics Approximation Algorithms

项目摘要

A clustering is a partition of a set of objects into groups, so-called clusters. The computation of a clustering is a typical problem from machine learning and data mining with applications in several fields, including data compression, image and pattern recognition and signal processing. One of the most well-known clustering problems is the so-called K-means problem. However, in practice it does not always make sense to assign each data point to a unique cluster. That is why so-called soft clustering problems assign each data point to each cluster with some degree of membership. One of the most well-known soft clustering problems is the maximum likelihood estimation (MLE) problem with respect to mixture models. The Expectation Maximization (EM) algorithm is a popular approach to solve this problem in practical applications. It is a generalization of Lloyd's algorithm, which is readily applied when solving the K-means problem. Even though it is regularly used, the EM algorithm is a heuristic with two significant downsides. First, there is no guarantee on the solutions that the EM algorithm computes. Second, there is no known upper bound on its runtime. The main goal of this project is the algorithmic and complexity theoretical analysis of the MLE problem for mixture models. Comparing the MLE problem to the well-analyzed K-means problem, one can identify two substantial structural differences. The distribution of the memberships of points in the sense of a soft clustering, and the cost function in the form of a mixture model. That is why we want to approach the MLE problem from two different directions. To this end, we want to examine several variants "inbetween" the K-means and the MLE problem. The core idea is that these variants do not exhibit both of the previously addressed differences to the K-means problem, but in each case only one of them. In particular, we analyze the fuzzy K-means and the CMLE problem. These problems are, aside from the role they assume in this project as "intermediates" between the K-means and MLE problem, of great practical interest. Analogously to Lloyd's algorithm and the EM algorithm, there are heuristics for the fuzzy K-means and the CMLE problem, which are usually applied in practice. Our goal is to develop algorithms that solve these problems with provable guarantees on the quality of solutions and on the runtime. Additionally, we want to work out structural differences between these problems and the K-means problem, respectively between their optimal solutions. We want to use the insights gained from this analysis for the algorithmic and complexity theoretical examination of the MLE problem.

聚类是将一组对象划分为组，即所谓的集群。聚类的计算是机器学习和数据挖掘中的一个典型问题，在多个领域中有应用，包括数据压缩，图像和模式识别以及信号处理。最著名的聚类问题之一是所谓的K-means问题。然而，在实践中，将每个数据点分配到一个唯一的集群并不总是有意义的。这就是为什么所谓的软聚类问题将每个数据点分配给具有一定隶属度的每个聚类。最著名的软聚类问题之一是混合模型的最大似然估计（MLE）问题。期望最大化（EM）算法是解决这一问题的一种常用方法.它是Lloyd算法的推广，在解决K-means问题时很容易应用。尽管EM算法经常被使用，但它是一种启发式算法，有两个显著的缺点。首先，EM算法计算的解决方案没有保证。其次，它的运行时间没有已知的上限。这个项目的主要目标是混合模型的极大似然估计问题的算法和复杂性理论分析。将MLE问题与分析良好的K-means问题进行比较，可以识别两个实质性的结构差异。软聚类意义下的点的隶属度分布，以及混合模型形式的代价函数。这就是为什么我们要从两个不同的方向来处理MLE问题。为此，我们想研究K均值和MLE问题之间的几个变体。其核心思想是，这些变体不会表现出先前针对K-means问题的两个差异，而是在每种情况下只表现出其中一个。特别是，我们分析了模糊K-均值和CMLE问题。这些问题，除了他们在这个项目中承担的角色，作为“中间人”之间的K-means和MLE问题，具有很大的实际意义。与Lloyd算法和EM算法类似，模糊K均值和CMLE问题也有算法，它们通常在实际中得到应用。我们的目标是开发算法，解决这些问题的解决方案的质量和运行时的可证明的保证。此外，我们还想找出这些问题和K-means问题之间的结构差异，以及它们的最优解之间的差异。我们希望使用从这种分析中获得的见解，用于MLE问题的算法和复杂性理论研究。