CAREER: Computational and Statistical Tradeoffs in Massive Data Analysis

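Basic Information

Award number:
1350590
Principal investigator:
Venkat Chandrasekaran
Amount:
$475,000
Institution:
California Institute of Technology
Institution country:
United States
Award type:
Continuing Grant
Fiscal year:
2014
Funding country:
United States
Period:
2014-02-01 to 2020-01-31
Status:
Completed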

Source:
https://www.nsf.gov/awardsearch/showAward?AWD_ID=1350590&HistoricalAwards=false
Keywords:
CAREER Computational Statistical Tradeoffs Massive

Abstract

In modern signal processing, one is frequently faced with statistical inference problems involving massive datasets. For example, the experiments at the Large Hadron Collider at CERN generate hundreds of petabytes of data each year, which must be stored and processed efficiently in order to further our understanding of particle physics. Similar challenges also arise in seismic monitoring, where massive amounts of data are acquired over large areas via cellphone accelerometers. Analyzing such large datasets is usually viewed as a substantial computational challenge. However, if data are a signal processor's main resource, then access to more data should be viewed as an asset rather than as a burden, and larger datasets should lead to a reduction in the runtime of data analysis algorithms.

This project blends concepts from computer science and from statistical signal processing to address the challenge with massive datasets by developing "algorithm weakening" frameworks in which a data analysis procedure backs off to simpler methods as the data scale in size, leveraging the growing inferential strength of the data to ensure that a desired level of statistical accuracy is achieved with reduced runtime. The approach is concretely illustrated across a range of statistical estimation tasks, with convex relaxation techniques playing a prominent role as an algorithm weakening mechanism. In seeking a precise characterization of the computational and statistical tradeoffs obtained via convex relaxation, the investigator formalizes and studies new measures for characterizing the quality of approximation of one convex set by another. An interesting feature of this research is that convex relaxations which provide poor performance in combinatorial optimization problems may nonetheless yield useful solutions when employed in problems with inferential objectives.
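
The algorithm-weakening idea can be made concrete with a toy denoising sketch. This is only an illustration under assumptions that are not in the abstract, and it is not the investigator's method: the signal is taken to be sparse, each observation is the signal plus Gaussian noise, and the estimator projects the sample mean onto a convex set containing the signal. The "tight" set is an l1 ball (projection costs a sort) and the "weakened" set is the larger l2 ball of the same radius (projection is a single rescaling). All function names, the signal model, and the parameter values are hypothetical.

```python
import math
import random


def project_l1_ball(v, radius):
    """Euclidean projection onto {x : ||x||_1 <= radius}; sort-based, O(p log p)."""
    if sum(abs(x) for x in v) <= radius:
        return list(v)
    magnitudes = sorted((abs(x) for x in v), reverse=True)
    running_sum, theta = 0.0, 0.0
    for j, m in enumerate(magnitudes, start=1):
        running_sum += m
        candidate = (running_sum - radius) / j
        if m - candidate > 0:
            theta = candidate  # keep the threshold from the last valid index
    # Soft-threshold every coordinate by theta, preserving signs.
    return [math.copysign(max(abs(x) - theta, 0.0), x) for x in v]


def project_l2_ball(v, radius):
    """Euclidean projection onto {x : ||x||_2 <= radius}; one rescaling, O(p)."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm <= radius:
        return list(v)
    return [x * radius / norm for x in v]


def sample_mean(x_star, n, sigma, rng):
    """Average n noisy observations y_i = x_star + sigma * g_i (assumed model)."""
    p = len(x_star)
    total = [0.0] * p
    for _ in range(n):
        for i in range(p):
            total[i] += x_star[i] + sigma * rng.gauss(0.0, 1.0)
    return [t / n for t in total]


def l2_error(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


if __name__ == "__main__":
    rng = random.Random(0)
    x_star = [1.0] * 5 + [0.0] * 195      # hypothetical sparse signal
    radius = sum(abs(x) for x in x_star)  # both balls contain x_star
    # Few samples with the tight set vs. many samples with the weakened set.
    for n, project in [(25, project_l1_ball), (400, project_l2_ball)]:
        estimate = project(sample_mean(x_star, n, 1.0, rng), radius)
        print(n, project.__name__, round(l2_error(estimate, x_star), 3))
```

Because both sets are convex and contain the true signal, neither projection can move the sample mean farther from it. The sketch only compares the two projection steps; it does not model the cost of acquiring or averaging the extra samples, which a full runtime analysis would need to account for.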
