Rigorous Methods for Dimensionality Reduction of High-Dimensional Data

Basic Information

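  • Award number:
    0505303
  • Principal investigator:
  • Amount:
    --
  • Host institution:
  • Host institution country:
    United States
  • Project type:
    Continuing Grant
  • Fiscal year:
    2005
  • Funding country:
    United States
  • Duration:
    2005-07-01 to 2010-06-30
  • Status:
    Completed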

Project Abstract

A research effort is proposed to develop tools for data analysis and inference in high-dimensional settings. The effort uses methods from random matrix theory (RMT), Banach space theory (BST), and differential geometry (DG) to expose new phenomena in high-dimensional statistical inference and data analysis, yielding practical statistical methods with rigorously established properties under carefully stated conditions. The results will affect a wide range of data analysis problems, including building linear models, testing complex hypotheses about multivariate data, and detecting subtle nonlinear structure in high-dimensional data. In this research, the investigators build further bridges between RMT, BST, and DG and three problem areas: (a) Sparse linear modelling: how should one build a predictive model that uses relatively few of the many available predictors? (b) Multivariate analysis in high dimensions: how should one best estimate and test for structure in high-dimensional data, particularly when the number of variables is large and the number of observations is small? (c) Manifold learning: how can one best find nonlinear structure in high-dimensional data and parametrize that structure? Each of these areas is fundamental to the analysis of high-dimensional data, and the investigators identify a strategy for using RMT, BST, and DG to make substantial contributions to each.
This strategy builds on the investigators' recent research accomplishments using RMT, BST, and DG, which will be extended to show: (a) how to find the best-fitting low-dimensional linear model without spending exponential time searching through model space, extending previous successes with Basis Pursuit, LARS, and the Lasso; (b) how to correctly test a wide range of important hypotheses in multivariate analysis using the Tracy-Widom distribution, extending previous results that applied the Tracy-Widom distribution to principal components analysis; and (c) how to correctly estimate a nonlinear parametrization of sparsely sampled curved data in high-dimensional space, extending previous successes in developing the Hessian Eigenmap technique for dimensionality reduction.
The motivation for this project lies in the "data deluge" now engulfing every branch of science and technology. In field after field, new sensors are creating data streams of unparalleled breadth and depth. As a result, scientific and technological progress today depends heavily on the ability to process high-dimensional data and reduce its dimensionality, sometimes drastically, obtaining a good approximation from a few well-chosen combinations of the original measurements. Although many dimensionality reduction methods have already been proposed, much of the existing research in this area is heuristic and speculative: the tools are often of unknown reliability, and the conditions under which their properties hold are of unknown generality. This project uses careful mathematical analysis to develop dimensionality reduction methods that are rigorously correct and/or optimal. These methods assure the user that important features are captured in the retained dimensions and that little of importance is lost in the discarded ones.
The project develops such rigorous methods in three areas: (a) building parsimonious but accurate predictive models from a database of many possible predictors; (b) testing for hidden structure in what otherwise appears to be high-dimensional "noise"; and (c) discovering the correct representation for data that are intrinsically nonlinear. Strong expectations for the project's success rest on the investigators' existing solid achievements in each of these three areas.
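Area (a) concerns choosing a few predictors out of many, and the abstract names the Lasso among its starting points. As a rough illustration only, the sketch below solves the standard Lasso objective (1/(2n))‖y − Xb‖² + λ‖b‖₁ by cyclic coordinate descent with soft-thresholding. This is a swapped-in solver (not LARS or Basis Pursuit, and not the investigators' own method); the function names and the toy data are hypothetical.

```python
def soft_threshold(a: float, t: float) -> float:
    """Shrink a toward zero by t; values inside [-t, t] become exactly 0."""
    if a > t:
        return a - t
    if a < -t:
        return a + t
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=200, tol=1e-10):
    """Minimize (1/(2n)) * ||y - X b||^2 + lam * ||b||_1 by cyclic coordinate descent.

    X is a list of n rows, each a list of p numbers; y is a list of n numbers.
    """
    n, p = len(X), len(X[0])
    b = [0.0] * p
    # Residual r = y - X b; it starts as y because b = 0.
    r = list(y)
    # Precompute (1/n) * ||X_j||^2 for each column j.
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # an all-zero column can never enter the model
            # Correlation of column j with the partial residual (b_j added back in).
            rho = sum(X[i][j] * r[i] for i in range(n)) / n + col_sq[j] * b[j]
            new_bj = soft_threshold(rho, lam) / col_sq[j]
            delta = new_bj - b[j]
            if delta != 0.0:
                # Keep the residual in sync with the updated coefficient.
                for i in range(n):
                    r[i] -= X[i][j] * delta
                b[j] = new_bj
                max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    return b


# Toy design with orthogonal columns, each with (1/n) * ||X_j||^2 = 1.
# y = 3 * column 0 + 0.5 * column 1, so only column 0 survives a penalty of 1.0.
X = [[1, 1, 1], [1, -1, 1], [1, 1, -1], [1, -1, -1]]
y = [3.5, 2.5, 3.5, 2.5]
print(lasso_coordinate_descent(X, y, lam=1.0))  # -> [2.0, 0.0, 0.0]
```

With orthogonal, unit-scaled columns the Lasso solution is simply each correlation X_jᵀy/n soft-thresholded by λ, which is why the toy example keeps exactly one predictor; with correlated columns the sweeps interact and more iterations are needed.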
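Area (b), testing for hidden structure in apparent noise, relies on the Tracy-Widom distribution. A minimal sketch, assuming an n × p data matrix X whose entries are i.i.d. N(0, 1) under the null hypothesis of no structure and are not re-centered: the largest eigenvalue l₁ of XᵀX is standardized as (l₁ − μ)/σ with μ = (√(n−1) + √p)² and σ = (√(n−1) + √p)(1/√(n−1) + 1/√p)^(1/3), the centering and scaling commonly used when applying Tracy-Widom to principal components analysis, and the result is compared with an upper quantile of the TW₁ law. The helper names are hypothetical, power iteration is a simple stand-in for a proper eigensolver, and the 95% cutoff of about 0.98 is an assumed tabulated value that should be checked against a published table.

```python
import math
import random


def tw_center_scale(n: int, p: int) -> tuple[float, float]:
    """Centering mu and scale sigma for the largest eigenvalue of X^T X,
    where X is n x p with i.i.d. N(0, 1) entries (null hypothesis)."""
    a = math.sqrt(n - 1)
    b = math.sqrt(p)
    mu = (a + b) ** 2
    sigma = (a + b) * (1.0 / a + 1.0 / b) ** (1.0 / 3.0)
    return mu, sigma


def largest_eigenvalue(S, n_iter=1000, tol=1e-12):
    """Largest eigenvalue of a symmetric positive semidefinite matrix S by power
    iteration. Assumes the uniform start vector is not orthogonal to the top eigenvector."""
    p = len(S)
    v = [1.0 / math.sqrt(p)] * p
    lam = 0.0
    for _ in range(n_iter):
        w = [sum(S[i][k] * v[k] for k in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            return 0.0
        v = [x / norm for x in w]
        # Rayleigh quotient v^T S v (v has unit norm).
        new_lam = sum(v[i] * sum(S[i][k] * v[k] for k in range(p)) for i in range(p))
        if abs(new_lam - lam) < tol * max(1.0, abs(new_lam)):
            return new_lam
        lam = new_lam
    return lam


def tw_statistic(X):
    """Standardized statistic (l1 - mu) / sigma for an n x p data matrix X (list of rows)."""
    n, p = len(X), len(X[0])
    # Uncentered cross-product matrix S = X^T X (p x p).
    S = [[sum(X[r][i] * X[r][j] for r in range(n)) for j in range(p)] for i in range(p)]
    l1 = largest_eigenvalue(S)
    mu, sigma = tw_center_scale(n, p)
    return (l1 - mu) / sigma


# Assumed approximate 95th percentile of the TW1 law; verify against a reference table.
TW1_95 = 0.98

# Usage: pure-noise data, so large values of the statistic should be rare.
random.seed(0)
noise = [[random.gauss(0.0, 1.0) for _ in range(20)] for _ in range(100)]
stat = tw_statistic(noise)
print(stat, "reject" if stat > TW1_95 else "do not reject")
```

Rejecting when the statistic exceeds the cutoff gives a test whose size is only approximately 5%, since the Tracy-Widom law is a large-n, large-p limit; for small matrices the approximation should be checked by simulation.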

Project Outputs

Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)

Other Publications by Iain Johnstone

Initial functional and economic status of patients with multivessel coronary artery disease randomized in the Bypass Angioplasty Revascularization Investigation (BARI).
  • DOI:
    10.1016/s0002-9149(99)80393-2
  • Published:
    1995
  • Journal:
  • Impact factor:
    0
  • Authors:
    M. Hlatky;Edgar D. Charles;Fred T. Nobrega;Kathryn Gelman;Iain Johnstone;Joseph Melvin;Thomas J. Ryan;R. Wiens;Bertram Pitt;G. Reeder;Hugh C. Smith;P. Whitlow;George L. Zorn;David J. Frid;Daniel B. Mark
  • Corresponding author:
    Daniel B. Mark
233: Multiparametric high dimensional analysis of normal & VZV infected human tonsil T cells at a single cell resolution by mass cytometry
  • DOI:
    10.1016/j.cyto.2013.06.236
  • Published:
    2013-09-01
  • Journal:
  • Impact factor:
  • Authors:
    Nandini Sen;Gourab Mukherjee;Sean C. Bendall;Adrish Sen;Astraea Jager;Phil Sung;Garry P. Nolan;Iain Johnstone;Ann M. Arvin
  • Corresponding author:
    Ann M. Arvin

Other Grants by Iain Johnstone

Properties of Approximate Inference for Complex High-Dimensional Models
  • Award number:
    1811614
  • Fiscal year:
    2018
  • Funding amount:
    --
  • Project type:
    Continuing Grant
Estimation and testing in low rank multivariate models
  • Award number:
    1407813
  • Fiscal year:
    2014
  • Funding amount:
    --
  • Project type:
    Continuing Grant
High dimensional data: new phenomena and theory in modeling and approximation
  • Award number:
    0906812
  • Fiscal year:
    2009
  • Funding amount:
    --
  • Project type:
    Standard Grant
A genetic analysis of the response to the presence of glycine
  • Award number:
    G0401202/1
  • Fiscal year:
    2006
  • Funding amount:
    --
  • Project type:
    Research Grant
New Statistical Challenges Posed by Multiscale and Adaptive Representations
  • Award number:
    0072661
  • Fiscal year:
    2000
  • Funding amount:
    --
  • Project type:
    Continuing Grant
Mathematical Sciences/GIG: "Group Infrastructure Grant for Stanford Statistics"
  • Award number:
    9631278
  • Fiscal year:
    1996
  • Funding amount:
    --
  • Project type:
    Standard Grant
Mathematical Sciences: Adaptive Estimation: New Tools, New Settings
  • Award number:
    9505151
  • Fiscal year:
    1995
  • Funding amount:
    --
  • Project type:
    Continuing Grant
U.S.-Australia Joint Workshop: New Directions in Nonparametric Curve Estimation / Canberra, Australia / June 1994
  • Award number:
    9316006
  • Fiscal year:
    1994
  • Funding amount:
    --
  • Project type:
    Standard Grant
PYI: Mathematical Sciences: Studies in New Multivariate Methods and Decision Theory
  • Award number:
    8451750
  • Fiscal year:
    1985
  • Funding amount:
    --
  • Project type:
    Continuing Grant

Similar NSFC (National Natural Science Foundation of China) Grants

Computational Methods for Analyzing Toponome Data
  • Award number:
    60601030
  • Approval year:
    2006
  • Funding amount:
    170,000 CNY
  • Project type:
    Young Scientists Fund

Similar Overseas Grants

Sensitivity analysis on Deep Learning (DL)-based dimensionality reduction methods of scRNA-seq data
  • Award number:
    572254-2022
  • Fiscal year:
    2022
  • Funding amount:
    --
  • Project type:
    University Undergraduate Student Research Awards
Modelling modern data objects: statistical methods for high-dimensionality and intricate correlation structures
  • Award number:
    RGPIN-2020-06941
  • Fiscal year:
    2022
  • Funding amount:
    --
  • Project type:
    Discovery Grants Program - Individual
Modelling modern data objects: statistical methods for high-dimensionality and intricate correlation structures
  • Award number:
    RGPIN-2020-06941
  • Fiscal year:
    2021
  • Funding amount:
    --
  • Project type:
    Discovery Grants Program - Individual
Efficient Methods for Dimensionality Reduction of Single-Cell RNA-Sequencing Data
  • Award number:
    10356883
  • Fiscal year:
    2020
  • Funding amount:
    --
  • Project type:
Modelling modern data objects: statistical methods for high-dimensionality and intricate correlation structures
  • Award number:
    DGECR-2020-00367
  • Fiscal year:
    2020
  • Funding amount:
    --
  • Project type:
    Discovery Launch Supplement
Modelling modern data objects: statistical methods for high-dimensionality and intricate correlation structures
  • Award number:
    RGPIN-2020-06941
  • Fiscal year:
    2020
  • Funding amount:
    --
  • Project type:
    Discovery Grants Program - Individual
OAC Core: Small: Scalable Non-linear Dimensionality Reduction Methods to Accelerate Scientific Discovery
  • Award number:
    1910539
  • Fiscal year:
    2019
  • Funding amount:
    --
  • Project type:
    Standard Grant
Analysis of Stochastic Neuronal Models Using Eigenvalue Numerical Solution Methods that Overcome the Curse of Dimensionality
  • Award number:
    18K11518
  • Fiscal year:
    2018
  • Funding amount:
    --
  • Project type:
    Grant-in-Aid for Scientific Research (C)
Geometric methods for dimensionality reductions of stochastic (partial) differential equations with applications to signal processing and finance
  • Award number:
    1943803
  • Fiscal year:
    2017
  • Funding amount:
    --
  • Project type:
    Studentship
BIGDATA: Collaborative Research: F: Statistical Theory and Methods Beyond the Dimensionality Barrier
  • Award number:
    1633212
  • Fiscal year:
    2016
  • Funding amount:
    --
  • Project type:
    Standard Grant