SPX: Automatically Parallelizing Approximate Data Analysis with Mergeable Summaries

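Basic Information

  • Award number:
    1918989
  • Principal investigator:
  • Amount:
    $614,200
  • Host institution:
  • Host institution country:
    United States
  • Project type:
    Standard Grant
  • Fiscal year:
    2019
  • Funding country:
    United States
  • Project period:
    2019-10-01 to 2024-09-30
  • Project status:
    Completed

Project Abstract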

When analyzing massive data sets, generating exact answers to even very basic queries can require huge amounts of compute resources (memory, compute time, and network communication). Fortunately, in many settings, approximate answers suffice. Nevertheless, designing algorithms that operate efficiently at large scale remains a significant challenge. This project studies highly efficient, highly accurate, and highly scalable algorithms for approximate query processing. The key ingredient enabling such efficiency is streaming algorithms that generate mergeable summaries. Streaming algorithms in general compute a small summary of the data, from which it is possible to derive accurate (though approximate) answers to queries. When these summaries are also mergeable, one can process many data sets independently and combine their summaries to answer queries about their combinations (union, intersection, etc.). Mergeable summaries enable massive data sets to be processed in highly scalable ways by distributing the data across machines, summarizing each partition, and seamlessly combining the results.

This project will develop mergeable summaries for a variety of fundamental problems for which no practical mergeable summary is known (such as entropy approximation and unsupervised-learning tasks including k-means clustering and logistic regression). For other commonly used problems, such as estimating the number of distinct items in a data stream, the project will substantially improve upon known, already practical, mergeable summaries. This project is tightly entwined with the development of the Data Sketches library, an open-source library of production-quality implementations of mergeable summaries that is widely used in industry and government.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
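
To make the idea of a mergeable summary concrete, here is a minimal sketch for the distinct-count problem mentioned in the abstract. It uses the textbook K-Minimum-Values (KMV) technique, chosen here only for illustration; it is not the project's algorithm and not the Data Sketches library API. The names `KMVSketch` and `summarize` are hypothetical.

```python
import hashlib


class KMVSketch:
    """Illustrative K-Minimum-Values summary: keeps the k smallest hash values."""

    def __init__(self, k=512):
        self.k = k
        self.values = set()  # at most k normalized hashes in [0, 1)

    @staticmethod
    def _hash(item):
        # Map an item to a pseudo-uniform float in [0, 1) (assumed hash choice).
        digest = hashlib.sha256(str(item).encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") / 2**64

    def update(self, item):
        h = self._hash(item)
        if len(self.values) < self.k:
            self.values.add(h)
        elif h not in self.values and h < max(self.values):
            # Summary is full: evict the largest value to keep the k smallest.
            self.values.remove(max(self.values))
            self.values.add(h)

    def merge(self, other):
        # The k smallest hashes of the union stream are the k smallest
        # values in the union of the two summaries.
        merged = KMVSketch(min(self.k, other.k))
        merged.values = set(sorted(self.values | other.values)[: merged.k])
        return merged

    def estimate(self):
        if len(self.values) < self.k:
            return float(len(self.values))  # under capacity: count is exact
        return (self.k - 1) / max(self.values)


def summarize(partition, k=512):
    """Summarize one data partition independently of all others."""
    sketch = KMVSketch(k)
    for item in partition:
        sketch.update(item)
    return sketch


# Two overlapping partitions, summarized separately and then merged.
part_a = summarize(range(0, 12000))
part_b = summarize(range(8000, 20000))
union_estimate = part_a.merge(part_b).estimate()
```

Merging the two partition summaries yields the same stored values as summarizing the combined data in one pass, which is the property that lets the work be distributed across machines. The estimate itself is approximate, with error shrinking as `k` grows.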

Project Outcomes

Journal articles (16)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Optimal Parallel Algorithms in the Binary-Forking Model
Parallel Shortest Paths with Negative Edge Weights
Contention Resolution with Message Deadlines
Improved Work Span Tradeoff for Single Source Reachability and Approximate Shortest Paths
Brief Announcement: Nested Active-Time Scheduling

Other publications by Justin Thaler

The Sum-Check Protocol over Fields of Small Characteristic
  • DOI:
  • Publication year:
  • Journal:
  • Impact factor:
    0
  • Authors:
    Justin Thaler
  • Corresponding author:
    Justin Thaler
BabySpartan: Lasso-based SNARK for non-uniform computation
  • DOI:
  • Publication year:
    2023
  • Journal:
  • Impact factor:
    0
  • Authors:
    Srinath T. V. Setty;Justin Thaler
  • Corresponding author:
    Justin Thaler

Other grants held by Justin Thaler

CAREER: The Polynomial Method in Complexity and Cryptography
  • Award number:
    1845125
  • Fiscal year:
    2019
  • Award amount:
    $614,200
  • Project type:
    Continuing Grant

Similar overseas grants

An AI-driven clinical washbasin unit that automatically disinfects pathogens, reduces aerosols and decreases healthcare-acquired infections by 70%
  • Award number:
    83001507
  • Fiscal year:
    2023
  • Award amount:
    $614,200
  • Project type:
    Innovation Loans
Collaborative Research: SHF: Medium: Improving Software Quality by Automatically Reproducing Failures from Bug Reports
  • Award number:
    2403747
  • Fiscal year:
    2023
  • Award amount:
    $614,200
  • Project type:
    Continuing Grant
This project will leverage artificial neural networks to automatically build various components of particle filters.
  • Award number:
    2841890
  • Fiscal year:
    2023
  • Award amount:
    $614,200
  • Project type:
    Studentship
The DOVE Device to Prevent Opioid Overdose Deaths: An Armband That Senses Overdose and Automatically Injects Naloxone
  • Award number:
    10485568
  • Fiscal year:
    2023
  • Award amount:
    $614,200
  • Project type:
Innovative software for the mining industry that automatically designs optimally shaped slopes in any lithology within a suitably short runtime
  • Award number:
    10078412
  • Fiscal year:
    2023
  • Award amount:
    $614,200
  • Project type:
    Collaborative R&D
PatentPulseAI: feasibility study of an AI-based solution to automatically assess the value of a given patent and continuously check for infringement.
  • Award number:
    10079851
  • Fiscal year:
    2023
  • Award amount:
    $614,200
  • Project type:
    Collaborative R&D
Using Reinforcement Learning to Automatically Adapt a Remote Therapy Intervention (RTI) for Reducing Adolescent Violence Involvement
  • Award number:
    10834339
  • Fiscal year:
    2023
  • Award amount:
    $614,200
  • Project type:
Detection and Analysis of Automatically Generated Text according to the Applications
  • Award number:
    23K11767
  • Fiscal year:
    2023
  • Award amount:
    $614,200
  • Project type:
    Grant-in-Aid for Scientific Research (C)
Evaluation of Matched and Unmatched Stimuli on the Maintenance of Treatment Effects for Automatically Maintained Self-Injurious Behavior (AUTO)
  • Award number:
    10729880
  • Fiscal year:
    2023
  • Award amount:
    $614,200
  • Project type:
ArchAI: Using AI to automatically detect archaeology on EO data
  • Award number:
    10047167
  • Fiscal year:
    2022
  • Award amount:
    $614,200
  • Project type:
    Collaborative R&D