AitF: FULL: Query Processing with Optimal Communication Cost


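Basic information

  • Grant number:
    1535565
  • Principal investigator:
  • Amount:
    $720,000
  • Host institution:
  • Host institution country:
    United States
  • Project type:
    Standard Grant
  • Fiscal year:
    2015
  • Funding country:
    United States
  • Period:
    2015-08-15 to 2020-07-31
  • Project status:
    Completed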

Project abstract

Big Data analytics is changing traditional query processing in two ways. The first is a shift from single server or small-scale parallel relational databases to massively distributed architectures, where hundreds or thousands of servers are used during the computation of a single query. The second is an increased complexity in the queries being issued, from single- or star-joins, to complex graph-like structured queries. This project develops new algorithms for query processing over large distributed systems, which are optimized for the cost of communication, then implements and evaluates these algorithms in an open-source big data management system and service.

The project studies a new approach to query evaluation that computes the entire query at once, replacing the traditional approach based on a query plan. The theoretical part of this project builds on a new model, called the Massively Parallel Communication model (MPC), where the communication is the only cost. The system development is performed over the Myria big data management system and service.

The Intellectual Merit of the project consists in advancing the state of the art in both the theory and systems approaches to query evaluation in modern, massive-scale shared-nothing clusters. It develops new, fundamental algorithms for processing queries over massively distributed architectures, with a provably optimal communication cost. The project implements and deploys these algorithms in a system, validating and informing the theoretical model. In particular, the project makes the following contributions: it develops provably optimal, one-round algorithms for skewed data; it studies how and when multiple rounds can be used to further reduce the communication cost; it experiments with these novel algorithms on clusters with up to 1000 worker processes; and it develops a new theoretical model for the communication cost on large shared-nothing architectures with heterogeneous hardware.

The Broader Impact of the project is to contribute to a new architecture for massively parallel query processing, where the traditional multi-step, single-join query evaluation approaches are replaced with novel, single-step, multi-join algorithms. This change has the potential to lead to more efficient big data analytics engines, allowing data analysts to explore large datasets more efficiently. As an immediate application, the project will impact the domain scientists who already use the Myria big data management system and service. All algorithmic discoveries in this project will be implemented in the Myria system, and will significantly improve query performance, allowing domain scientists to conduct more complex analytics and explorations over their data.
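
As a rough illustration of what a single-step, multi-join algorithm in a communication-only cost model can look like, the sketch below simulates a HyperCube (also called Shares) shuffle, a technique known from the MPC literature, on the triangle query Q(x,y,z) = R(x,y), S(y,z), T(z,x). The abstract does not name this algorithm, so treating it as representative of the project's work is an assumption. The function name hypercube_triangle_join, the p_side parameter, and the use of Python's built-in hash are illustrative choices and are not taken from Myria.

```python
from collections import defaultdict


def hypercube_triangle_join(R, S, T, p_side):
    """Simulate a one-round HyperCube evaluation of
    Q(x, y, z) = R(x, y), S(y, z), T(z, x) on p_side**3 virtual servers.

    Returns (set of (x, y, z) answers, maximum tuples received by any server).
    Hypothetical sketch: servers are dictionary keys, not real processes.
    """
    def h(value):
        # Illustrative hash; one coordinate per query variable.
        return hash(value) % p_side

    # inbox[(i, j, k)] holds the fragments of R, S, T sent to server (i, j, k).
    inbox = defaultdict(lambda: {"R": set(), "S": set(), "T": set()})

    # Single communication round: each tuple fixes two coordinates and is
    # replicated along the coordinate of the variable it does not mention.
    for (x, y) in R:
        for k in range(p_side):
            inbox[(h(x), h(y), k)]["R"].add((x, y))
    for (y, z) in S:
        for i in range(p_side):
            inbox[(i, h(y), h(z))]["S"].add((y, z))
    for (z, x) in T:
        for j in range(p_side):
            inbox[(h(x), j, h(z))]["T"].add((z, x))

    answers = set()
    max_load = 0
    for fragments in inbox.values():
        load = sum(len(rel) for rel in fragments.values())
        max_load = max(max_load, load)

        # Local computation only: no further communication is needed.
        s_by_y = defaultdict(list)
        for (y, z) in fragments["S"]:
            s_by_y[y].append(z)
        for (x, y) in fragments["R"]:
            for z in s_by_y.get(y, ()):
                if (z, x) in fragments["T"]:
                    answers.add((x, y, z))
    return answers, max_load
```

With p_side = 1 every tuple lands on one server and the maximum load equals the total input size. Larger values spread the input over p_side**3 servers at the price of replicating each tuple p_side times, which is the trade-off that a communication-cost analysis has to balance.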

Project outcomes

Journal articles (1)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Bag Query Containment and Information Theory
  • DOI:
    10.1145/3472391
  • Publication year:
    2021
  • Journal:
  • Impact factor:
    1.8
  • Authors:
    Khamis, Mahmoud Abo;Kolaitis, Phokion G.;Ngo, Hung Q.;Suciu, Dan
  • Corresponding author:
    Suciu, Dan

Other publications by Dan Suciu

A Dichotomy for the Generalized Model Counting Problem for Unions of Conjunctive Queries
Optimizing Large-Scale Semi-Naïve Datalog Evaluation in Hadoop
  • DOI:
  • Publication year:
    2012
  • Journal:
  • Impact factor:
    0
  • Authors:
    Marianne Shaw;Paraschos Koutris;Bill Howe;Dan Suciu
  • Corresponding author:
    Dan Suciu
Integrating Network-Bound XML Data
XViz: A Tool for Visualizing XPath Expressions
Cytosolic protein ubiquitylation in normal and endotoxin stimulated human peripheral blood mononuclear cells
  • DOI:
  • Publication year:
    2000
  • Journal:
  • Impact factor:
    0
  • Authors:
    M. Majetschak;Dan Suciu;K. Häsler;U. Obertacke;F. Schade;H. Jennissen
  • Corresponding author:
    H. Jennissen

Other grants by Dan Suciu

III: Small: Datalog with Aggregates: Complexity, Optimization, Evaluation
  • Grant number:
    2314527
  • Fiscal year:
    2023
  • Funding amount:
    $720,000
  • Project type:
    Standard Grant
NSF-BSF: III: Small: Data Driven Schema
  • Grant number:
    2109922
  • Fiscal year:
    2021
  • Funding amount:
    $720,000
  • Project type:
    Continuing Grant
III: Medium: Collaborative Research: Reasoning about Optimizers for Data-Intensive Systems
  • Grant number:
    1954222
  • Fiscal year:
    2020
  • Funding amount:
    $720,000
  • Project type:
    Continuing Grant
III: Small: Optimal Query Processing meets Information Theory: from Proofs to Algorithms
  • Grant number:
    1907997
  • Fiscal year:
    2019
  • Funding amount:
    $720,000
  • Project type:
    Continuing Grant
III: Medium: Collaborative Research: A Unified and Declarative Approach to Causal Analysis for Big Data
  • Grant number:
    1703281
  • Fiscal year:
    2017
  • Funding amount:
    $720,000
  • Project type:
    Standard Grant
III: Small: Scalable Probabilistic Inference for Large Knowledge Bases
  • Grant number:
    1614738
  • Fiscal year:
    2016
  • Funding amount:
    $720,000
  • Project type:
    Standard Grant
BIGDATA: Mid-Scale: DCM: A Formal Foundation for Big Data Management
  • Grant number:
    1247469
  • Fiscal year:
    2013
  • Funding amount:
    $720,000
  • Project type:
    Continuing Grant
III: Small: Query Compilation on Probabilistic Databases
  • Grant number:
    1115188
  • Fiscal year:
    2011
  • Funding amount:
    $720,000
  • Project type:
    Standard Grant
III: Small: BeliefDB - Adding Belief Annotations to Databases
  • Grant number:
    0915054
  • Fiscal year:
    2009
  • Funding amount:
    $720,000
  • Project type:
    Standard Grant
III COR: Query Evaluation and View Materialization in Probabilistic Data
  • Grant number:
    0713576
  • Fiscal year:
    2007
  • Funding amount:
    $720,000
  • Project type:
    Standard Grant

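Similar NSFC grants

Doping effects and thin-film noise characteristics of cobalt-based full-Heusler alloys
  • Grant number:
    51871067
  • Approval year:
    2018
  • Funding amount:
    CNY 600,000
  • Project type:
    General Program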

Similar overseas grants

Human-Robot Co-Evolution: Achieving the full potential of future workplaces
  • Grant number:
    DP240100938
  • Fiscal year:
    2024
  • Funding amount:
    $720,000
  • Project type:
    Discovery Projects
SAFER - Secure Foundations: Verified Systems Software Above Full-Scale Integrated Semantics
  • Grant number:
    EP/Y035976/1
  • Fiscal year:
    2024
  • Funding amount:
    $720,000
  • Project type:
    Research Grant
Collaborative Research: NSFGEO-NERC: Advancing capabilities to model ultra-low velocity zone properties through full waveform Bayesian inversion and geodynamic modeling
  • Grant number:
    2341238
  • Fiscal year:
    2024
  • Funding amount:
    $720,000
  • Project type:
    Standard Grant
CAREER: Informed Testing — From Full-Field Characterization of Mechanically Graded Soft Materials to Student Equity in the Classroom
  • Grant number:
    2338371
  • Fiscal year:
    2024
  • Funding amount:
    $720,000
  • Project type:
    Standard Grant
CAREER: From Flamelet to Full-Scale: Advancing Plasma-Assisted Combustion for Low-Emission Sustainable Fuels
  • Grant number:
    2339518
  • Fiscal year:
    2024
  • Funding amount:
    $720,000
  • Project type:
    Continuing Grant
STTR Phase II: Dermatologist-level detection of suspicious pigmented skin lesions from high-resolution full-body images
  • Grant number:
    2335086
  • Fiscal year:
    2024
  • Funding amount:
    $720,000
  • Project type:
    Cooperative Agreement
Toward carbon-neutral society: Development of a full-sustainable eco-friendly green mining process for gold recovery
  • Grant number:
    24K17540
  • Fiscal year:
    2024
  • Funding amount:
    $720,000
  • Project type:
    Grant-in-Aid for Early-Career Scientists
Collaborative Research: NSFGEO-NERC: Advancing capabilities to model ultra-low velocity zone properties through full waveform Bayesian inversion and geodynamic modeling
  • Grant number:
    2341237
  • Fiscal year:
    2024
  • Funding amount:
    $720,000
  • Project type:
    Continuing Grant
All Analogue Full-duplex Dual-receiver Radio for Wideband Mm-wave Communications
  • Grant number:
    EP/X041581/1
  • Fiscal year:
    2024
  • Funding amount:
    $720,000
  • Project type:
    Research Grant
Full mitigation of birefringence for high-precision optical experiments
  • Grant number:
    24K00649
  • Fiscal year:
    2024
  • Funding amount:
    $720,000
  • Project type:
    Grant-in-Aid for Scientific Research (B)