"Big-Data" Asymptotics: Theory and Large-Scale Experiments
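Basic information
- Award number: 1418362
- Principal investigator: David Donoho
- Amount: $700,600
- Institution country: United States
- Project category: Standard Grant
- Fiscal year: 2014
- Funding country: United States
- Project period: 2014-08-15 to 2018-07-31
- Project status: Completed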
Project abstract
Large datasets are becoming increasingly available and important in science and technology. This project will develop new tools for dealing with large datasets, as well as a new understanding of some of the fascinating phenomena that emerge with high-dimensional data. It will develop methods for recovering signals (vectors and matrices) from highly undersampled measurements (also known as compressed sensing and matrix completion), methods for recovering low-rank matrices from noisy and undersampled measurements, and tools for robustly fitting predictive models when the number of predictor variables is very large. All of these tools have broad domains of applicability: essentially wherever big data are being gathered, researchers will want to use them. Phenomena to be explored include the phase transitions that some algorithms undergo, switching abruptly from successful recovery to failure as the amount of undersampling and/or contamination of the data increases. They also include the fact that fundamental formulas of classical statistics, such as the Fisher information formula for the variance of the maximum-likelihood estimator, no longer apply in high-dimensional statistics. We expect to develop quantitatively precise explanations of these phenomena, which will help engineers and scientists plan experiments and make reliable inferences from large datasets.
Several classical problems in multivariate data analysis take on a new character when the number of variables p and the number of observations n are both large. These problems include estimation in the linear model, robust estimation in the linear model, sparsity-penalized estimation in the linear model, and estimation of covariance matrices obeying a factor model.
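As a minimal illustration of the phase transitions described above, the sketch below computes one known theoretical prediction: the boundary at which ℓ1 minimization switches from exact recovery to failure. It assumes a result from the approximate message passing (AMP) literature for signed sparse signals and iid Gaussian measurement matrices. That result states that the boundary coincides with the minimax mean-squared error of soft thresholding. Here δ = n/p is the undersampling ratio, ε = k/p the fraction of nonzeros, and ρ = k/n. The function names (`soft_threshold_risk`, `minimax_risk`, `rho_l1`) and the threshold grid are hypothetical choices for this sketch, not the project's own code.

```python
import math


def normal_pdf(x: float) -> float:
    """Standard normal density phi(x)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def normal_upper_tail(x: float) -> float:
    """P(Z > x) for Z ~ N(0, 1), i.e. Phi(-x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def soft_threshold_risk(eps: float, alpha: float) -> float:
    """Worst-case MSE (per unit noise variance) of soft thresholding at
    threshold alpha, when a fraction eps of coordinates is nonzero and the
    nonzeros are arbitrarily large (the least-favorable case)."""
    # Risk on a zero coordinate: E[(|Z| - alpha)_+^2].
    zero_risk = 2.0 * ((1.0 + alpha**2) * normal_upper_tail(alpha)
                       - alpha * normal_pdf(alpha))
    # Risk on a huge nonzero coordinate: E[(Z - alpha)^2] = 1 + alpha^2.
    return eps * (1.0 + alpha**2) + (1.0 - eps) * zero_risk


# Threshold grid on [0, 8]; wide enough for the eps values used here.
ALPHA_GRID = [0.004 * i for i in range(2001)]


def minimax_risk(eps: float) -> float:
    """Minimax MSE of soft thresholding, minimized over the threshold grid."""
    return min(soft_threshold_risk(eps, a) for a in ALPHA_GRID)


def rho_l1(delta: float, tol: float = 1e-6) -> float:
    """Predicted l1 phase transition: the largest rho = k/n recoverable at
    undersampling ratio delta = n/p, found by solving
    minimax_risk(eps) = delta for eps = k/p with bisection."""
    lo, hi = 0.0, 1.0  # minimax_risk is nondecreasing in eps, from ~0 up to 1
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if minimax_risk(mid) < delta:
            lo = mid
        else:
            hi = mid
    return lo / delta


if __name__ == "__main__":
    for delta in (0.1, 0.25, 0.5, 0.75, 0.9):
        print(f"delta = {delta:.2f}  ->  rho = {rho_l1(delta):.3f}")
```

In a large-scale computational experiment, one would compare `rho_l1(delta)` with the empirical fraction of successful ℓ1 reconstructions on a grid of (δ, ρ) values. The curve marks where that fraction should drop abruptly from near 1 to near 0.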
In particular, work in a range of fields shows that if certain underlying features of the problem are random (such as iid Gaussian predictor variables, or uniformly distributed eigenvectors), then various surprises occur in the limit as p and n tend to infinity in a fixed proportion. These surprises include sharp phase transitions between success and failure in recovering the object of interest. They also include extra Gaussian noise, beyond that caused by the measurements themselves, which has no counterpart in the classical setting where p is fixed and n tends to infinity. In the proposed work, we will use both massive computational experiments and precise theoretical calculations to predict and verify such surprising phenomena in the large-n, large-p setting. On the computing side, our techniques involve new tools for designing and executing computational experiments that use many processors in cloud-based configurations. On the theoretical side, they involve analysis tools for understanding phase transitions in compressed sensing. These tools cover exact reconstruction from noiseless measurements, the asymptotic mean-squared error of convex optimization reconstructions, and the asymptotic mean-squared error of non-convex optimizations. In the study of low-rank models for matrices, we will develop and adapt recent advances in random matrix theory, such as recent progress on the so-called spiked covariance model.
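The spiked covariance model mentioned above has its own phase transition, which can be sketched in a few lines of NumPy. The sketch assumes a p × p population covariance equal to the identity except for one eigenvalue ℓ (the "spike"), with γ = p/n. It uses the known limit for the largest sample eigenvalue. If ℓ > 1 + √γ, the largest sample eigenvalue converges to ℓ(1 + γ/(ℓ - 1)). Otherwise it stays at the bulk edge (1 + √γ)², and the spike is invisible in the top eigenvalue. The function names and the sizes n = 1600, p = 400 are illustrative assumptions.

```python
import numpy as np


def top_sample_eigenvalue(n: int, p: int, spike: float, rng: np.random.Generator) -> float:
    """Largest eigenvalue of the sample covariance of n i.i.d. draws from
    N(0, Sigma), where Sigma = diag(spike, 1, ..., 1) is p x p."""
    X = rng.standard_normal((n, p))
    X[:, 0] *= np.sqrt(spike)  # plant the spike along the first coordinate
    S = X.T @ X / n            # sample covariance (the mean is known to be zero)
    return float(np.linalg.eigvalsh(S)[-1])  # eigvalsh returns ascending eigenvalues


def bbp_limit(spike: float, gamma: float) -> float:
    """Limit of the top sample eigenvalue as n, p -> infinity with p / n -> gamma."""
    if spike > 1.0 + np.sqrt(gamma):
        return spike * (1.0 + gamma / (spike - 1.0))  # detectable spike, pushed upward
    return float((1.0 + np.sqrt(gamma)) ** 2)          # spike hidden at the bulk edge


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, p = 1600, 400  # gamma = p / n = 0.25, detection threshold 1 + sqrt(gamma) = 1.5
    for spike in (1.2, 4.0):
        observed = top_sample_eigenvalue(n, p, spike, rng)
        print(f"spike {spike}: observed {observed:.3f}, predicted {bbp_limit(spike, p / n):.3f}")
```

With γ = 0.25 the detection threshold is ℓ = 1.5. A spike of 4.0 should therefore give a top eigenvalue near the predicted 4.33. A spike of 1.2 should leave the top eigenvalue near the bulk edge 2.25, up to finite-sample fluctuations.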
Project outcomes
Journal articles (1)
Monographs (0)
Scientific awards (0)
Conference papers (0)
Patents (0)
Other publications by David Donoho
Data Science at the Singularity
- DOI: 10.1162/99608f92.b91339ef
- Published: 2023-10
- Impact factor: 0
- Author: David Donoho
- Corresponding author: David Donoho
Other grants by David Donoho
Collaborative Research: A Focused Research Group on Multiscale Geometric Analysis--Theory, Tools, Applications
- Award number: 0140698
- Fiscal year: 2002
- Award amount: $700,600
- Project category: Standard Grant
Scientific Computing Research Environments for the Mathematical Sciences (SCREMS)
- Award number: 0215486
- Fiscal year: 2002
- Award amount: $700,600
- Project category: Standard Grant
Mathematical Sciences: Exploiting Hidden Sparsity in Statistical Estimation
- Award number: 9209130
- Fiscal year: 1992
- Award amount: $700,600
- Project category: Continuing Grant
PYI: Mathematical Sciences: Signal Processing/Inverse Problems
- Award number: 8451753
- Fiscal year: 1985
- Award amount: $700,600
- Project category: Continuing Grant
Mathematical Sciences Postdoctoral Research Fellowship
- Award number: 8311683
- Fiscal year: 1983
- Award amount: $700,600
- Project category: Fellowship Award
Similar National Natural Science Foundation of China (NSFC) grants
Scalable Learning and Optimization: High-dimensional Models and Online Decision-Making Strategies for Big Data Analysis
- Approval year: 2024
- Project category: Collaborative Innovation Research Team
Data-driven Recommendation System Construction of an Online Medical Platform Based on the Fusion of Information
- Approval year: 2024
- Project category: Research Fund for International Young Scientists
Development of a Linear Stochastic Model for Wind Field Reconstruction from Limited Measurement Data
- Approval year: 2020
- Award amount: CNY 400,000
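Key Technologies for Semantic Interoperability of Web Services Based on Linked Open Data
- Award number: 61373035
- Approval year: 2013
- Award amount: CNY 770,000
- Project category: General Program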
Molecular Interaction Reconstruction of Rheumatoid Arthritis Therapies Using Clinical Data
- Award number: 31070748
- Approval year: 2010
- Award amount: CNY 340,000
- Project category: General Program
Functional Data Analysis Methods for High-Dimensional Data
- Award number: 11001084
- Approval year: 2010
- Award amount: CNY 160,000
- Project category: Young Scientists Fund
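The Role of the Chromosome Replication Negative Regulator datA in the Cell Cycle
- Award number: 31060015
- Approval year: 2010
- Award amount: CNY 250,000
- Project category: Fund for Less Developed Regions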
Computational Methods for Analyzing Toponome Data
- Award number: 60601030
- Approval year: 2006
- Award amount: CNY 170,000
- Project category: Young Scientists Fund
Similar international grants
An innovative platform using ML/AI to analyse farm data and deliver insights to improve farm performance, increasing farm profitability by 5-10%
- Award number: 10093235
- Fiscal year: 2024
- Award amount: $700,600
- Project category: Collaborative R&D
Seamless integration of Financial data into ESG data
- Award number: 10099890
- Fiscal year: 2024
- Award amount: $700,600
- Project category: Collaborative R&D
Patient Lifestyle and Disease Data Interactium (PaLaDIn)
- Award number: 10103989
- Fiscal year: 2024
- Award amount: $700,600
- Project category: EU-Funded
Patient Lifestyle and Disease Data Interactium (PaLaDIn)
- Award number: 10105921
- Fiscal year: 2024
- Award amount: $700,600
- Project category: EU-Funded
Treecle - data and automation to unlock woodland creation in the UK to achieve net zero
- Award number: 10111492
- Fiscal year: 2024
- Award amount: $700,600
- Project category: SME Support
NEMO - Net zero events using multiple open data sources
- Award number: 10114096
- Fiscal year: 2024
- Award amount: $700,600
- Project category: SME Support
Facilitating circular construction practices in the UK: A data driven online marketplace for waste building materials
- Award number: 10113920
- Fiscal year: 2024
- Award amount: $700,600
- Project category: SME Support
Quantum Machine Learning for Financial Data Streams
- Award number: 10073285
- Fiscal year: 2024
- Award amount: $700,600
- Project category: Feasibility Studies
N2Vision+: A robot-enabled, data-driven machine vision tool for nitrogen diagnosis of arable soils
- Award number: 10091423
- Fiscal year: 2024
- Award amount: $700,600
- Project category: Collaborative R&D
Tracking flood waters over Australia using space gravity data
- Award number: DP240102399
- Fiscal year: 2024
- Award amount: $700,600
- Project category: Discovery Projects