
CAREER: STATISTICAL INFERENCE FOR TOPOLOGICAL AND GEOMETRIC DATA ANALYSIS

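Basic information

Award number: 1149677
Principal investigator: Alessandro Rinaldo
Amount: $400,000
Host institution: Carnegie-Mellon University
Host institution country: United States
Award type: Continuing Grant
Fiscal year: 2012
Funding country: United States
Duration: 2012-06-01 to 2018-05-31
Status: Completed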

Source: https://www.nsf.gov/awardsearch/showAward?AWD_ID=1149677&HistoricalAwards=false
Keywords: CAREER STATISTICAL INFERENCE TOPOLOGICAL GEOMETRIC

Project abstract

The research objective of this proposal is to develop new theories and methods for estimating topological and geometric features of lower-dimensional sets based on noisy high-dimensional data. To this end, the investigator has formulated two separate but highly interdependent sets of research goals. The first set of research goals is the integration of statistical theory with methods of topological data analysis. Recent breakthroughs in computational topology have made it possible to compute topological invariants of sets from a collection of points in Euclidean spaces. Though these new types of data summaries hold significant potential for high-dimensional statistical inference, their statistical properties are still largely unexplored. The investigator proposes 1) to develop a comprehensive theory of minimax (and adaptive) estimation of topological properties of sets and 2) to create statistical procedures for non-parametric testing and de-noising based on topological invariants. The second set of research goals pertains to the traditional geometric data-analytic task of clustering in high dimensions, and it aims to advance the theory and practice of high-density clustering. Recent progress in the theory of clustering has demonstrated that clustering using density estimation can perform well in high-dimensional settings, and that the notion of high-density clustering provides a natural probabilistic framework for describing and analyzing clustering problems in great generality. Thus, the investigator intends 1) to generalize and refine the high-density clustering problem under weak conditions on the data-generating mechanism and 2) to investigate the theory and use of data resampling techniques for parameter tuning in high-density clustering and density estimation. A common thread in the proposed research is the reliance on density estimation as a tool for both accurate high-dimensional clustering and smoothing/de-noising of topological features.
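As a rough illustration of the high-density clustering idea mentioned above, the following minimal sketch estimates a density with a Gaussian kernel, keeps the sample points whose estimated density reaches a chosen level, and groups the kept points into connected components. It is a generic plug-in level-set approach, not the investigator's method. The function names, the toy data, and the values of `bandwidth`, `level`, and `radius` are all assumptions chosen for illustration.

```python
import math


def gaussian_kde(points, bandwidth):
    """Return a function that estimates the density at a 2-D location."""
    n = len(points)
    # Normalizing constant of a 2-D isotropic Gaussian kernel, times n.
    norm = n * 2.0 * math.pi * bandwidth ** 2

    def density(x):
        total = 0.0
        for p in points:
            sq_dist = (x[0] - p[0]) ** 2 + (x[1] - p[1]) ** 2
            total += math.exp(-sq_dist / (2.0 * bandwidth ** 2))
        return total / norm

    return density


def high_density_clusters(points, bandwidth, level, radius):
    """Label points whose estimated density is >= level.

    Kept points closer than `radius` are linked, and each connected
    component of the resulting graph is one cluster. Points below the
    density level are left unlabeled (treated as noise).
    """
    density = gaussian_kde(points, bandwidth)
    kept = [i for i, p in enumerate(points) if density(p) >= level]

    labels = {}
    cluster_id = 0
    for start in kept:
        if start in labels:
            continue
        # Depth-first search over the neighborhood graph of kept points.
        labels[start] = cluster_id
        stack = [start]
        while stack:
            i = stack.pop()
            for j in kept:
                if j not in labels and math.dist(points[i], points[j]) <= radius:
                    labels[j] = cluster_id
                    stack.append(j)
        cluster_id += 1
    return labels


# Hypothetical toy data: two tight groups and one isolated point.
points = [
    (0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1),
    (5.0, 5.0), (5.1, 5.0), (5.0, 5.1), (5.1, 5.1),
    (2.5, 10.0),
]
labels = high_density_clusters(points, bandwidth=0.5, level=0.15, radius=1.0)
print(labels)
```

In this toy setting the isolated point falls below the density level and receives no label, while each tight group forms its own cluster. How to choose the bandwidth and the level from data is exactly the kind of tuning question the abstract raises; the fixed values here sidestep it.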
In the last few decades, advances in data acquisition technologies have led to an explosion in the collection and diffusion of large-scale datasets across a variety of scientific fields. The unprecedented magnitude and complexity of modern databases pose formidable challenges to statisticians, both theoretical and methodological, and have required the development of new statistical tools for data analysis. Modern high-dimensional statistics is predicated on the key assumption that, while the data are observed in a high-dimensional space, the intrinsic complexity of the data-generating mechanism is in fact significantly smaller and, therefore, learnable in computationally efficient ways. This research proposal capitalizes on this premise and describes an array of methods for summarizing, discriminating, visualizing and clustering high-dimensional noisy data and for extracting salient low-dimensional features. The proposed research encompasses several novel and open research problems at the interface of mathematics, computer science, statistics and machine learning. The procedures studied in the proposal are of broad applicability and promise to be used in a multitude of scientific areas, such as medical imaging, neuroscience, astrophysics, biology, genetics, geophysics and sensor networks, to name a few. The broader impact of this project also includes interdisciplinary training of students in statistics, mathematics and computer science.
