
Dense and Sparse Methods in High-Dimensional Data Analysis

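Basic Information

Award number:
1208785
Principal investigator:
Lee Dicker
Amount:
$160,000
Institution:
Rutgers University New Brunswick
Institution country:
United States
Award type:
Standard Grant
Fiscal year:
2012
Funding country:
United States
Period:
2012-08-01 to 2016-07-31
Status:
Completed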

Source:
https://www.nsf.gov/awardsearch/showAward?AWD_ID=1208785&HistoricalAwards=false
Keywords:
Dense Sparse Methods Dimensional Data

Abstract

Many methods for high-dimensional data analysis begin with the assumption that the parameter of interest is, in some sense, sparse. Furthermore, the performance of many of these methods depends on the sparsity of the underlying parameters. However, statistical methods for checking sparsity assumptions, and for determining the implications when sparsity is absent or nearly absent, are lacking. The driving goal of this project is to develop practical statistical tools for identifying situations where the relevant parameters are in fact sparse, or where sparse methods for high-dimensional data analysis may be applied effectively. The problems considered in this project will be studied primarily within the context of the linear model and the Gaussian location model. Methods will be assessed using decision-theoretic criteria (e.g., asymptotic minimaxity). A null model based on dense (non-sparse) signals, together with dense estimation and prediction methods, will be developed and thoroughly studied. This will provide a rich framework for sparsity testing, where the aim is to identify settings in which sparse methods are likely to be successful. Specific sparsity testing procedures will be proposed and analyzed.

High-dimensional data analysis is one of the most active areas of current statistical research. Much of this research has been driven by technological advances that have enabled researchers in a variety of scientific disciplines, including astrophysics, the geological sciences, molecular biology, and genomics, to collect vast datasets with relative ease. In high-dimensional datasets, many features are measured for each unit of observation (e.g., thousands of gene expression levels may be measured for each individual in a genomic study). Sparsity plays a major role in much of the research on high-dimensional data analysis. Broadly speaking, sparsity measures the degree to which a specified outcome may be described by relatively few features.

Sparse methods for high-dimensional data analysis attempt to leverage sparsity in the underlying dataset and have proven very effective in many applications, especially in engineering and signal processing. On the other hand, the performance of sparse methods has been more mixed in other important applications where high-dimensional data are abundant, such as genomics. In this project, the investigator will develop statistical methods for characterizing and identifying situations where sparse methods can be successfully applied, by developing tools for determining the level of sparsity in high-dimensional datasets. When applied to a given dataset, these methods will help researchers assess the validity of subsequent statistical analyses and the potential benefits of using sparse methods for them. This research is likely to have significant implications for understanding reproducibility in high-dimensional data analysis, as well as broad applications in the analysis of genomic data. The methods developed during this project will be used in collaborative work with highly experienced researchers in genomics.

Project Outcomes

Journal articles (0)

Monographs (0)

Research awards (0)

Conference papers (0)

Patents (0)

Other publications by Lee Dicker

IDENTIFICATION OF NOVEL URINARY BIOMARKERS OF RENAL OBSTRUCTION USING TEMPORAL QUANTITATIVE PROTEOMICS

DOI:
10.1016/s0022-5347(09)60717-5
Published:
2009-04-01
Journal:
Conference abstract
Authors:
Alireza Vaezzadeh; Andrew C Briscoe; Lee Dicker; Oliver Hofman; Winston Hide; Hanno Steen; Richard S Lee
Corresponding author:
Richard S Lee