Adaptive Thresholding for Hierarchical Clustering of Variables, with Connections to Scan Statistics
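Basic Information
- Award number: 1613202
- Principal investigator: Max G'Sell
- Amount: $150,000
- Host institution country: United States
- Project type: Continuing Grant
- Fiscal year: 2016
- Funding country: United States
- Period: 2016-08-01 to 2020-07-31
- Status: Completed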
Project Abstract
In modern data analysis with large data sets, a common goal is to detect groups of variables that exhibit similar behavior, a task usually referred to as clustering. In genetics and proteomics, for instance, clustering can reveal structures of scientific interest, such as potential biological pathways. Beyond detecting scientifically relevant structure in the data, clustering can also be used to simplify data representation and analysis. One of the most widely used approaches is hierarchical clustering, in which a measure of similarity, such as correlation, is computed between each pair of variables, and the most similar groups of variables are then repeatedly merged. This raises a fundamental question: how much grouping should be done? The proposed research consists of several projects aimed at developing broadly applicable methods for determining the appropriate amount of clustering based on the degree of similarity present in the data. The resulting procedures will also provide statistical guarantees on the meaning of the groups they produce.

This proposal aims to develop practical procedures for adaptive thresholding of hierarchical clustering dendrograms built from pairwise similarities of variables. These procedures will be tied to inferential guarantees on the false cluster error rate of the resulting clustering. The results will cover a range of common linkages and variable similarity measures, and the PI will demonstrate the procedures in a modern genetics application. To support these procedures, new theory will be developed describing the large order statistics of variable similarity measures, including new asymptotic bounds on their joint distributions and new finite-sample bounds on their maxima.
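To make the setup concrete, here is a minimal pure-Python sketch, assuming Pearson correlation as the similarity measure and single linkage as the linkage. It uses the standard fact that cutting a single-linkage dendrogram at similarity level t yields the connected components of the graph joining every pair of variables whose similarity exceeds t. The threshold is calibrated against a permutation null for the maximum absolute correlation, a generic device swapped in here to echo the abstract's interest in maxima of similarity measures; it is not the PI's procedure, and the function names (`pearson`, `max_abs_corr`, `single_linkage_cut`, `permutation_threshold`) and parameters (`alpha`, `n_perm`, `seed`) are hypothetical.

```python
import math
import random


def pearson(x, y):
    """Sample Pearson correlation of two equal-length sequences.
    Assumes neither sequence is constant (otherwise the denominator is 0)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def max_abs_corr(variables):
    """Largest |correlation| over all distinct pairs of variables."""
    p = len(variables)
    return max(abs(pearson(variables[i], variables[j]))
               for i in range(p) for j in range(i + 1, p))


def single_linkage_cut(variables, threshold):
    """Clusters obtained by cutting a single-linkage dendrogram built on
    |correlation| at `threshold`. This equals the connected components of the
    graph that joins every pair whose |correlation| exceeds `threshold`."""
    p = len(variables)
    parent = list(range(p))  # union-find forest over variable indices

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(p):
        for j in range(i + 1, p):
            if abs(pearson(variables[i], variables[j])) > threshold:
                parent[find(i)] = find(j)  # merge the two components
    groups = {}
    for i in range(p):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


def permutation_threshold(variables, alpha=0.05, n_perm=200, seed=0):
    """Empirical (1 - alpha) quantile of the maximum |correlation| after
    shuffling each variable independently, which destroys any dependence
    between variables while keeping each variable's marginal values."""
    rng = random.Random(seed)
    maxima = sorted(
        max_abs_corr([rng.sample(v, len(v)) for v in variables])
        for _ in range(n_perm)
    )
    k = min(n_perm - 1, math.ceil((1 - alpha) * n_perm) - 1)
    return maxima[k]


# Example: variables 0 and 1 share a latent factor z; 2 and 3 are pure noise.
rng = random.Random(1)
n = 50
z = [rng.gauss(0, 1) for _ in range(n)]
data = [[zi + 0.3 * rng.gauss(0, 1) for zi in z] for _ in range(2)]
data += [[rng.gauss(0, 1) for _ in range(n)] for _ in range(2)]
t = permutation_threshold(data)
print(f"threshold = {t:.3f}, clusters = {single_linkage_cut(data, t)}")
```

Under the global null in which all variables are independent, the chance that this rule merges any pair at all is approximately at most `alpha`. That is a family-wise statement, much cruder than a false cluster error rate. Other linkages (average, complete) do not reduce to connected components and would need a full agglomerative implementation such as `scipy.cluster.hierarchy.linkage`.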
The techniques proposed here will also apply to other threshold-based procedures in statistics; in particular, connections may be drawn between the proposed work and adaptive thresholding procedures for scan statistics.
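The abstract only says that connections may be made to adaptive thresholding for scan statistics. As a rough illustration of why, under our own assumptions rather than the project's procedure, the sketch below computes a one-dimensional scan statistic (the largest sum over windows of a fixed width) and thresholds it at a permutation quantile of its null distribution. Like the largest pairwise similarity among many variables, a scan statistic is a maximum over many overlapping, dependent candidates, so choosing its threshold is again a question about the upper tail of a maximum. The names `scan_statistic` and `scan_threshold` and their parameters are hypothetical.

```python
import math
import random


def scan_statistic(values, width):
    """Largest sum over all contiguous windows of `width` consecutive values."""
    if not 1 <= width <= len(values):
        raise ValueError("width must be between 1 and len(values)")
    window = sum(values[:width])
    best = window
    for i in range(width, len(values)):
        window += values[i] - values[i - width]  # slide the window one step
        best = max(best, window)
    return best


def scan_threshold(values, width, alpha=0.05, n_perm=500, seed=0):
    """Empirical (1 - alpha) quantile of the scan statistic over random
    shuffles of `values`; shuffling removes any clustering of large values
    in position while keeping the same multiset of values."""
    rng = random.Random(seed)
    null = sorted(
        scan_statistic(rng.sample(values, len(values)), width)
        for _ in range(n_perm)
    )
    k = min(n_perm - 1, math.ceil((1 - alpha) * n_perm) - 1)
    return null[k]


# Example: 100 positions with a run of 10 "events" at positions 40..49.
values = [0] * 40 + [1] * 10 + [0] * 50
observed = scan_statistic(values, 10)
threshold = scan_threshold(values, 10)
print(f"observed = {observed}, threshold = {threshold}, "
      f"detected = {observed > threshold}")
```

A fixed window width is the simplest choice; scanning over several widths makes the maximum range over even more candidates, which raises the calibrated threshold accordingly.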
Project Outcomes
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other publications by Max G'Sell
False Variable Selection Rates in Regression
- Published: 2013
- Authors: Max G'Sell; T. Hastie; R. Tibshirani
- Corresponding author: R. Tibshirani
Other grants by Max G'Sell
NCS-FO:Collaborative Research:Decoding and Reconstructing the Neural Basis of Real World Social Perception
- Award number: 1734868
- Fiscal year: 2017
- Amount: $150,000
- Project type: Standard Grant
Similar Overseas Grants
Sparsity, thresholding and regularization in data science
- Award number: RGPIN-2022-04531
- Fiscal year: 2022
- Amount: $150,000
- Project type: Discovery Grants Program - Individual
Sparsity, thresholding and regularization in data science
- Award number: DGECR-2022-00453
- Fiscal year: 2022
- Amount: $150,000
- Project type: Discovery Launch Supplement
Adaptive thresholding for subchondral bone in high-resolution peripheral computed tomography
- Award number: 496568-2016
- Fiscal year: 2016
- Amount: $150,000
- Project type: University Undergraduate Student Research Awards
BENIGN-MALIGNANT LESION DIFFERENTIATION USING FUNCTIONAL ADC-THRESHOLDING
- Award number: 8362919
- Fiscal year: 2011
- Amount: $150,000
Fully Nonparametric Models for Random Effects, Order Thresholding, Boostrap Testing, and Applications
- Award number: 0805598
- Fiscal year: 2008
- Amount: $150,000
- Project type: Standard Grant
Practical Algorithms for Common Subgraph Problems with Thresholding
- Award number: 317203-2006
- Fiscal year: 2006
- Amount: $150,000
- Project type: Postgraduate Scholarships - Master's
Practical Algorithms for Common Subgraph Problems with Thresholding
- Award number: 317203-2005
- Fiscal year: 2005
- Amount: $150,000
- Project type: Alexander Graham Bell Canada Graduate Scholarships - Master's
Block Thresholding Methods for Adaptive Wavelet Function Estimation: Theory and Applications
- Award number: 0296215
- Fiscal year: 2001
- Amount: $150,000
- Project type: Standard Grant
Block Thresholding Methods for Adaptive Wavelet Function Estimation: Theory and Applications
- Award number: 0072578
- Fiscal year: 2000
- Amount: $150,000
- Project type: Standard Grant
Adaptive Estimation in Wavelet Image Compression: Thresholding and Quantization
- Award number: 9802314
- Fiscal year: 1998
- Amount: $150,000
- Project type: Standard Grant