High-dimensional Clustering: Theory and Methods
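Basic information
- Award number: 1713003
- Principal investigator:
- Amount: $380,000
- Host institution:
- Host institution country: United States
- Project category: Continuing Grant
- Fiscal year: 2017
- Funding country: United States
- Period: 2017-07-01 to 2021-06-30
- Status: Completed
- Source:
- Keywords: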
Project abstract
The past two decades have witnessed an explosion in the scale and complexity of datasets that arise in science and engineering. Broadly, clustering methods, which discover latent structure in data, are our primary tool for navigating, exploring, and visualizing massive datasets. These methods have been widely and successfully applied in phylogeny, medicine, psychiatry, archaeology and anthropology, phytosociology, economics, and several other fields. Despite this ubiquity, the widespread scientific adoption of clustering methods has been hindered by the lack of flexible clustering methods for high-dimensional datasets and by the dearth of meaningful inferential guarantees in clustering problems. Accordingly, the goal of this research is to develop new and effective methods for clustering complex datasets, and to further develop an inferential grounding for these methods, which will in turn lead to actionable conclusions. This research will lead to the development of new clustering methods, as well as to a deeper understanding of the fundamental limitations of methods aimed at uncovering latent structure in data. The research component of this project consists of four aims designed to address related aspects of this high-level goal: (a) analyze and develop new clustering methods for high-dimensional datasets, with a particular focus on practically useful methods like mixture-model-based clustering and minimum-volume clustering; (b) develop novel methods for inference in the context of clustering, motivated by scientific applications where it is important not only to cluster the data but also to clearly characterize the sampling variability of the discovered clusters; (c) develop fundamental lower bounds for high-dimensional clustering; and (d) develop novel methods for clustering functional data with inferential guarantees.
These research components are closely coupled with concrete educational initiatives, including the development and broad dissemination of publicly available software for high-dimensional clustering; tutorials and workshops at machine learning conferences; and fostering further interactions between the Departments of Statistics and Machine Learning at Carnegie Mellon.
Project outcomes
Journal articles (24)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Hybrid Wasserstein distance and fast distribution clustering
- DOI: 10.1214/19-ejs1639
- Published: 2018-12
- Journal:
- Impact factor: 1.1
- Authors: I. Verdinelli; L. Wasserman
- Corresponding authors: I. Verdinelli; L. Wasserman
Analysis of a mode clustering diagram
- DOI: 10.1214/18-ejs1510
- Published: 2018-05
- Journal:
- Impact factor: 1.1
- Authors: I. Verdinelli; L. Wasserman
- Corresponding authors: I. Verdinelli; L. Wasserman
Minimax optimal conditional independence testing
- DOI: 10.1214/20-aos2030
- Published: 2020-01
- Journal:
- Impact factor: 0
- Authors: Matey Neykov; Sivaraman Balakrishnan; L. Wasserman
- Corresponding authors: Matey Neykov; Sivaraman Balakrishnan; L. Wasserman
Rate Optimal Estimation and Confidence Intervals for High-dimensional Regression with Missing Covariates
- DOI: 10.1016/j.jmva.2019.06.004
- Published: 2017-02
- Journal:
- Impact factor: 0
- Authors: Yining Wang; Jialei Wang; Sivaraman Balakrishnan; Aarti Singh
- Corresponding authors: Yining Wang; Jialei Wang; Sivaraman Balakrishnan; Aarti Singh
Feeling the Bern: Adaptive Estimators for Bernoulli Probabilities of Pairwise Comparisons
- DOI: 10.1109/tit.2019.2903249
- Published: 2019
- Journal:
- Impact factor: 2.5
- Authors: Shah, Nihar B.; Balakrishnan, Sivaraman; Wainwright, Martin J.
- Corresponding author: Wainwright, Martin J.
Other publications by Sivaraman Balakrishnan
Minimax rates for homology inference
- DOI:
- Published: 2011
- Journal:
- Impact factor: 0
- Authors: Sivaraman Balakrishnan; A. Rinaldo; Don Sheehy; Aarti Singh; L. Wasserman
- Corresponding author: L. Wasserman
Cluster Trees on Manifolds
- DOI:
- Published: 2013
- Journal:
- Impact factor: 0
- Authors: Sivaraman Balakrishnan; S. Narayanan; A. Rinaldo; Aarti Singh; L. Wasserman
- Corresponding author: L. Wasserman
When is it Better to Compare than to Score?
- DOI:
- Published: 2014
- Journal:
- Impact factor: 0
- Authors: Nihar B. Shah; Sivaraman Balakrishnan; Joseph K. Bradley; Abhay K. Parekh; K. Ramchandran; M. Wainwright
- Corresponding author: M. Wainwright
Causal Effect Estimation after Propensity Score Trimming with Continuous Treatments
- DOI:
- Published: 2023
- Journal:
- Impact factor: 0
- Authors: Zach Branson; Edward H. Kennedy; Sivaraman Balakrishnan; Larry Wasserman
- Corresponding author: Larry Wasserman
Testing for High-Dimensional Multinomials: A Selective Review
- DOI:
- Published: 2018
- Journal:
- Impact factor: 0
- Authors: Sivaraman Balakrishnan; L. Wasserman; S. Fienberg
- Corresponding author: S. Fienberg
Other grants by Sivaraman Balakrishnan
Foundations of High-Dimensional and Nonparametric Hypothesis Testing
- Award number: 2113684
- Fiscal year: 2021
- Amount: $380,000
- Project category: Standard Grant
Similar overseas grants
Alpha Clustering in Ab Initio Nuclear Theory
- Award number: 535536-2019
- Fiscal year: 2021
- Amount: $380,000
- Project category: Postgraduate Scholarships - Doctoral
Alpha Clustering in Ab Initio Nuclear Theory
- Award number: 535536-2019
- Fiscal year: 2020
- Amount: $380,000
- Project category: Postgraduate Scholarships - Doctoral
New Development of Clustering Methods Considering Uncertainty Based on Rough Set Theory
- Award number: 20K19886
- Fiscal year: 2020
- Amount: $380,000
- Project category: Grant-in-Aid for Early-Career Scientists
Alpha Clustering in Ab Initio Nuclear Theory
- Award number: 535536-2019
- Fiscal year: 2019
- Amount: $380,000
- Project category: Postgraduate Scholarships - Doctoral
Study on k-means type clustering based on rough set theory
- Award number: 17K12753
- Fiscal year: 2017
- Amount: $380,000
- Project category: Grant-in-Aid for Young Scientists (B)
Statistical theory of unsupervised learning with a focus on clustering methods
- Award number: 26880031
- Fiscal year: 2014
- Amount: $380,000
- Project category: Grant-in-Aid for Research Activity Start-up
Clustering: Bridging the Theory-Practice Gap
- Award number: 420643-2012
- Fiscal year: 2013
- Amount: $380,000
- Project category: Postdoctoral Fellowships
Clustering: Bridging the Theory-Practice Gap
- Award number: 420643-2012
- Fiscal year: 2012
- Amount: $380,000
- Project category: Postdoctoral Fellowships
A Theory of Clustering
- Award number: 389179-2010
- Fiscal year: 2011
- Amount: $380,000
- Project category: Postgraduate Scholarships - Doctoral
Clustering theory
- Award number: 380849-2009
- Fiscal year: 2011
- Amount: $380,000
- Project category: Alexander Graham Bell Canada Graduate Scholarships - Doctoral