CAREER: Computational and Statistical Tradeoffs in Massive Data Analysis
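Basic information
- Award number: 1350590
- Principal investigator:
- Amount: $475,000
- Institution:
- Institution country: United States
- Award type: Continuing Grant
- Fiscal year: 2014
- Funding country: United States
- Period: 2014-02-01 to 2020-01-31
- Status: Completed
- Source:
- Keywords:
Project abstract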
In modern signal processing, one is frequently faced with statistical inference problems involving massive datasets. For example, the experiments at the Large Hadron Collider at CERN generate hundreds of petabytes of data each year, which must be stored and processed efficiently in order to further our understanding of particle physics. Similar challenges also arise in seismic monitoring where massive amounts of data are acquired over large areas via cellphone accelerometers. Analyzing such large datasets is usually viewed as a substantial computational challenge. However, if data are a signal processor's main resource then access to more data should be viewed as an asset rather than as a burden, and larger datasets should lead to a reduction in the runtime of data analysis algorithms.
This project blends concepts from computer science and from statistical signal processing to address the challenge with massive datasets by developing "algorithm weakening" frameworks in which a data analysis procedure backs off to simpler methods as the data scale in size, leveraging the growing inferential strength of the data to ensure that a desired level of statistical accuracy is achieved with reduced runtime. The approach is concretely illustrated across a range of statistical estimation tasks, with convex relaxation techniques playing a prominent role as an algorithm weakening mechanism. In seeking a precise characterization of the computational and statistical tradeoffs obtained via convex relaxation, the investigator formalizes and studies new measures for characterizing the quality of approximation of one convex set by another. An interesting feature of this research is that convex relaxations which provide poor performance in combinatorial optimization problems may nonetheless yield useful solutions when employed in problems with inferential objectives.
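The "algorithm weakening" idea in the abstract can be loosely illustrated with a toy denoising problem. The sketch below is an assumption-laden example, not the investigator's method: the setup (a sign vector observed in Gaussian noise), the two relaxations (the hypercube and a larger Euclidean ball that contains it), and all function names are hypothetical choices made for illustration. It estimates the signal by projecting the sample mean onto each convex set and reports the squared error, so one can see how a looser set may need more samples to reach a similar accuracy. Both projections here are cheap, so the example only hints at the statistical side of the tradeoff, not the runtime side.

```python
import math
import random


def project_hypercube(y):
    """Euclidean projection onto [-1, 1]^p, the convex hull of all sign vectors."""
    return [max(-1.0, min(1.0, v)) for v in y]


def project_ball(y):
    """Euclidean projection onto the ball of radius sqrt(p), a looser set
    that contains the hypercube."""
    radius = math.sqrt(len(y))
    norm = math.sqrt(sum(v * v for v in y))
    if norm <= radius:
        return list(y)
    return [v * radius / norm for v in y]


def squared_error(estimate, truth):
    """Squared Euclidean distance between an estimate and the true signal."""
    return sum((a - b) ** 2 for a, b in zip(estimate, truth))


def run_trial(n, p=200, sigma=2.0, seed=0):
    """Average n noisy observations of a hypothetical sign vector, then
    project the average onto each relaxation and measure the error."""
    rng = random.Random(seed)
    x_true = [rng.choice((-1.0, 1.0)) for _ in range(p)]
    # Sample mean of n observations x_true + sigma * (standard Gaussian noise).
    y_bar = [x + sum(rng.gauss(0.0, sigma) for _ in range(n)) / n for x in x_true]
    return {
        "hypercube": squared_error(project_hypercube(y_bar), x_true),
        "ball": squared_error(project_ball(y_bar), x_true),
        "none": squared_error(y_bar, x_true),
    }


if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, run_trial(n))
```

Because each set is convex and contains the true signal, projecting onto it can never increase the error of the raw sample mean; how much it helps depends on how tightly the set fits the signal class.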
Project outcomes
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other publications by Venkat Chandrasekaran
Optimal Regularization for a Data Source
- DOI: 10.1007/s10208-025-09693-y
- Published: 2025-01-27
- Journal:
- Impact factor: 2.700
- Authors: Oscar Leong; Eliza O’Reilly; Yong Sheng Soh; Venkat Chandrasekaran
- Corresponding author: Venkat Chandrasekaran
Other grants by Venkat Chandrasekaran
Learning Algorithms for Inverse Problems from Data: Statistical and Computational Foundations
- Award number: 2113724
- Fiscal year: 2021
- Amount: $475,000
- Award type: Standard Grant
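Similar grants from the National Natural Science Foundation of China (NSFC)
Computational Methods for Analyzing Toponome Data
- Award number: 60601030
- Approval year: 2006
- Amount: CNY 170,000
- Award type: Young Scientists Fund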
Similar overseas grants
STATISTICAL AND COMPUTATIONAL THRESHOLDS IN SPIN GLASSES AND GRAPH INFERENCE PROBLEMS
- Award number: 2347177
- Fiscal year: 2024
- Amount: $475,000
- Award type: Standard Grant
Collaborative Research: The computational and neural basis of statistical learning during musical enculturation
- Award number: 2242084
- Fiscal year: 2023
- Amount: $475,000
- Award type: Standard Grant
Collaborative Research: The computational and neural basis of statistical learning during musical enculturation
- Award number: 2242085
- Fiscal year: 2023
- Amount: $475,000
- Award type: Standard Grant
Conference: Advances in Statistical and Computational Methods for Analysis of Biomedical, Genetic, and Omics Data
- Award number: 2232547
- Fiscal year: 2023
- Amount: $475,000
- Award type: Standard Grant
New statistical and computational tools for optimization of planarian behavioral chemical screens
- Award number: 10658688
- Fiscal year: 2023
- Amount: $475,000
- Award type:
CORE 1/2: INIA Stress and Chronic Alcohol Interactions: Computational and Statistical Analysis Core (CSAC)
- Award number: 10411629
- Fiscal year: 2022
- Amount: $475,000
- Award type:
Statistical Methods and Computational Tools for Marine Animal Movement, Distribution and Population Size
- Award number: RGPIN-2019-05688
- Fiscal year: 2022
- Amount: $475,000
- Award type: Discovery Grants Program - Individual
Developing computational, statistical and machine learning methods to uncover biological mechanisms of complex phenotypes
- Award number: RGPIN-2021-04062
- Fiscal year: 2022
- Amount: $475,000
- Award type: Discovery Grants Program - Individual
Statistical and Computational Tools for Analyzing High-Dimensional Heterogeneous Data
- Award number: 2210907
- Fiscal year: 2022
- Amount: $475,000
- Award type: Standard Grant
Bridging Statistical Hypothesis Tests and Deep Learning for Reliability and Computational Efficiency
- Award number: 2134037
- Fiscal year: 2022
- Amount: $475,000
- Award type: Continuing Grant