Soft-Clustering - From Heuristics To Approximation Algorithms
软聚类 - 从启发式到近似算法
基本信息
- 批准号:324506751
- 负责人:
- 金额:--
- 依托单位:
- 依托单位国家:德国
- 项目类别:Research Grants
- 财政年份:2017
- 资助国家:德国
- 起止时间:2016-12-31 至 2020-12-31
- 项目状态:已结题
- 来源:
- 关键词:
项目摘要
A clustering is a partition of a set of objects into groups, so-called clusters. The computation of a clustering is a typical problem from machine learning and data mining with applications in several fields, including data compression, image and pattern recognition and signal processing. One of the most well-known clustering problems is the so-called K-means problem. However, in practice it does not always make sense to assign each data point to a unique cluster. That is why so-called soft clustering problems assign each data point to each cluster with some degree of membership. One of the most well-known soft clustering problems is the maximum likelihood estimation (MLE) problem with respect to mixture models. The Expectation Maximization (EM) algorithm is a popular approach to solve this problem in practical applications. It is a generalization of Lloyd's algorithm, which is readily applied when solving the K-means problem. Even though it is regularly used, the EM algorithm is a heuristic with two significant downsides. First, there is no guarantee on the solutions that the EM algorithm computes. Second, there is no known upper bound on its runtime. The main goal of this project is the algorithmic and complexity theoretical analysis of the MLE problem for mixture models. Comparing the MLE problem to the well-analyzed K-means problem, one can identify two substantial structural differences. The distribution of the memberships of points in the sense of a soft clustering, and the cost function in the form of a mixture model. That is why we want to approach the MLE problem from two different directions. To this end, we want to examine several variants "inbetween" the K-means and the MLE problem. The core idea is that these variants do not exhibit both of the previously addressed differences to the K-means problem, but in each case only one of them. In particular, we analyze the fuzzy K-means and the CMLE problem. These problems are, aside from the role they assume in this project as "intermediates" between the K-means and MLE problem, of great practical interest. Analogously to Lloyd's algorithm and the EM algorithm, there are heuristics for the fuzzy K-means and the CMLE problem, which are usually applied in practice. Our goal is to develop algorithms that solve these problems with provable guarantees on the quality of solutions and on the runtime. Additionally, we want to work out structural differences between these problems and the K-means problem, respectively between their optimal solutions. We want to use the insights gained from this analysis for the algorithmic and complexity theoretical examination of the MLE problem.
聚类是将一组对象划分为组,即所谓的集群。聚类的计算是机器学习和数据挖掘中的一个典型问题,在多个领域中有应用,包括数据压缩,图像和模式识别以及信号处理。最著名的聚类问题之一是所谓的K-means问题。然而,在实践中,将每个数据点分配到一个唯一的集群并不总是有意义的。这就是为什么所谓的软聚类问题将每个数据点分配给具有一定隶属度的每个聚类。最著名的软聚类问题之一是混合模型的最大似然估计(MLE)问题。期望最大化(EM)算法是解决这一问题的一种常用方法.它是Lloyd算法的推广,在解决K-means问题时很容易应用。尽管EM算法经常被使用,但它是一种启发式算法,有两个显著的缺点。首先,EM算法计算的解决方案没有保证。其次,它的运行时间没有已知的上限。这个项目的主要目标是混合模型的极大似然估计问题的算法和复杂性理论分析。将MLE问题与分析良好的K-means问题进行比较,可以识别两个实质性的结构差异。软聚类意义下的点的隶属度分布,以及混合模型形式的代价函数。这就是为什么我们要从两个不同的方向来处理MLE问题。为此,我们想研究K均值和MLE问题之间的几个变体。其核心思想是,这些变体不会表现出先前针对K-means问题的两个差异,而是在每种情况下只表现出其中一个。特别是,我们分析了模糊K-均值和CMLE问题。这些问题,除了他们在这个项目中承担的角色,作为“中间人”之间的K-means和MLE问题,具有很大的实际意义。与Lloyd算法和EM算法类似,模糊K均值和CMLE问题也有算法,它们通常在实际中得到应用。我们的目标是开发算法,解决这些问题的解决方案的质量和运行时的可证明的保证。此外,我们还想找出这些问题和K-means问题之间的结构差异,以及它们的最优解之间的差异。我们希望使用从这种分析中获得的见解,用于MLE问题的算法和复杂性理论研究。
项目成果
期刊论文数量(3)
专著数量(0)
科研奖励数量(0)
会议论文数量(0)
专利数量(0)
A Complexity Theoretical Study of Fuzzy K-Means
模糊K均值复杂度理论研究
- DOI:10.1145/3409385
- 发表时间:2020
- 期刊:
- 影响因子:0
- 作者:Blömer;Brauer;und Bujna
- 通讯作者:und Bujna
Coresets for Fuzzy K-Means with Applications
模糊 K 均值核心集及其应用
- DOI:10.4230/lipics.isaac.2018.46
- 发表时间:2018
- 期刊:
- 影响因子:0
- 作者:Blömer;Brauer;und Bujna
- 通讯作者:und Bujna
How well do SEM algorithms imitate EM algorithms? A non-asymptotic analysis for mixture models
- DOI:10.1007/s11634-019-00366-7
- 发表时间:2019-07
- 期刊:
- 影响因子:1.6
- 作者:Johannes Blömer;Sascha Brauer;Kathrin Bujna;Daniel Kuntze
- 通讯作者:Johannes Blömer;Sascha Brauer;Kathrin Bujna;Daniel Kuntze
{{
item.title }}
{{ item.translation_title }}
- DOI:
{{ item.doi }} - 发表时间:
{{ item.publish_year }} - 期刊:
- 影响因子:{{ item.factor }}
- 作者:
{{ item.authors }} - 通讯作者:
{{ item.author }}
数据更新时间:{{ journalArticles.updateTime }}
{{ item.title }}
- 作者:
{{ item.author }}
数据更新时间:{{ monograph.updateTime }}
{{ item.title }}
- 作者:
{{ item.author }}
数据更新时间:{{ sciAawards.updateTime }}
{{ item.title }}
- 作者:
{{ item.author }}
数据更新时间:{{ conferencePapers.updateTime }}
{{ item.title }}
- 作者:
{{ item.author }}
数据更新时间:{{ patent.updateTime }}
Professor Dr. Johannes Blömer其他文献
Professor Dr. Johannes Blömer的其他文献
{{
item.title }}
{{ item.translation_title }}
- DOI:
{{ item.doi }} - 发表时间:
{{ item.publish_year }} - 期刊:
- 影响因子:{{ item.factor }}
- 作者:
{{ item.authors }} - 通讯作者:
{{ item.author }}
{{ truncateString('Professor Dr. Johannes Blömer', 18)}}的其他基金
Development of a practical theory for clustering algorithms through data-driven modeling and analysis
通过数据驱动的建模和分析开发聚类算法的实用理论
- 批准号:
47960847 - 财政年份:2007
- 资助金额:
-- - 项目类别:
Priority Programmes
Sicherheitsanalyse kryptographischer Systeme bezüglich Gitterangriffen
密码系统针对网格攻击的安全分析
- 批准号:
5431984 - 财政年份:2004
- 资助金额:
-- - 项目类别:
Priority Programmes
相似海外基金
Modern statistical methods for clustering community ecology data
群落生态数据聚类的现代统计方法
- 批准号:
DP240100143 - 财政年份:2024
- 资助金额:
-- - 项目类别:
Discovery Projects
Time series clustering to identify and translate time-varying multipollutant exposures for health studies
时间序列聚类可识别和转化随时间变化的多污染物暴露以进行健康研究
- 批准号:
10749341 - 财政年份:2024
- 资助金额:
-- - 项目类别:
CAREER: A Theoretical Exploration of Efficient and Accurate Clustering Algorithms
职业生涯:高效准确聚类算法的理论探索
- 批准号:
2337832 - 财政年份:2024
- 资助金额:
-- - 项目类别:
Continuing Grant
Data-driven phenotyping of central disorders of hypersomnolence with unsupervised clustering: toward more reliable diagnostic criteria
无监督聚类的数据驱动的中枢性嗜睡症表型分析:寻求更可靠的诊断标准
- 批准号:
481046 - 财政年份:2023
- 资助金额:
-- - 项目类别:
Collaborative Research: Magnetic Clustering using Novel Poly(amino acid) Corrals to Advance Magnetic Particle Imaging
合作研究:利用新型聚氨基酸畜栏进行磁聚类以推进磁粒子成像
- 批准号:
2305404 - 财政年份:2023
- 资助金额:
-- - 项目类别:
Standard Grant
Collaborative Research: Magnetic Clustering using Novel Poly(amino acid) Corrals to Advance Magnetic Particle Imaging
合作研究:利用新型聚氨基酸畜栏进行磁聚类以推进磁粒子成像
- 批准号:
2305402 - 财政年份:2023
- 资助金额:
-- - 项目类别:
Standard Grant
Clustering of startups and formation of an ecosystem for startups after the withdrawal of major regional companies
区域大企业退出后初创企业的集聚和初创企业生态系统的形成
- 批准号:
23K01535 - 财政年份:2023
- 资助金额:
-- - 项目类别:
Grant-in-Aid for Scientific Research (C)
Did the 2022 strong polar vortex make serial extratropical cyclone clustering more likely? (StratClust)
2022年的强极地涡旋是否使系列温带气旋聚集的可能性更大?
- 批准号:
NE/X011933/1 - 财政年份:2023
- 资助金额:
-- - 项目类别:
Research Grant
Understanding the structural mechanism of spontaneous ubiquitin cargo clustering on the cell plasma membrane
了解细胞质膜上自发泛素货物聚集的结构机制
- 批准号:
10730734 - 财政年份:2023
- 资助金额:
-- - 项目类别:
Computational methods for Bayesian clustering
贝叶斯聚类的计算方法
- 批准号:
2884420 - 财政年份:2023
- 资助金额:
-- - 项目类别:
Studentship