Efficient query processing and optimizations for big data workloads
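Basic information
- Grant number: RGPIN-2015-04587
- Principal investigator:
- Amount: $43,700
- Host institution:
- Host institution country: Canada
- Program: Discovery Grants Program - Individual
- Fiscal year: 2016
- Funding country: Canada
- Duration: 2016-01-01 to 2017-12-31
- Status: Completed
- Source:
- Keywords:
Project abstract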
Every aspect of computing has been experiencing exponential growth, from sensory data acquisition throughput to processor power, storage, and bandwidth. These exponential improvements are enabling the big data revolution. Big data applications consist of large volumes of data that are constantly produced in a streaming fashion (e.g., sensor readings, logs, click-throughs). In addition, typical research and analysis workflows on big data are iterative: a model is built using some data parameters, then iteratively refined using the output of the previous modeling phase. Both primitives, namely streaming data generation and iterative analysis workflows, provide ample opportunity for optimization.
The goal of this project is to explore these primitives and deliver fundamental algorithms and techniques to efficiently process and optimize big data workloads. The end goal is to incorporate such techniques into an end-to-end data processing architecture. Streaming data generation provides the opportunity to maintain models already computed on the data in an incremental fashion. As a first example, a statistical operator computed on a data set can be incrementally maintained as new data are appended to the data set. In addition, models already computed on the data can be combined (among themselves or with base data) incrementally to compute answers to new modeling query requests. Combining two models could be vastly superior in terms of performance to computing a new model from scratch.
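As an illustration of the two ideas above (incremental maintenance on append, and combining already-computed summaries), here is a minimal sketch for one simple statistical operator, the running mean and variance. It is not taken from the project: the class name `RunningStats` and its methods are hypothetical, and the update rules are the standard Welford update and the pairwise merge formula of Chan et al.

```python
from dataclasses import dataclass


@dataclass
class RunningStats:
    """Count, mean, and sum of squared deviations (m2) of the values seen so far."""
    n: int = 0
    mean: float = 0.0
    m2: float = 0.0

    def append(self, x: float) -> None:
        # Welford's update: fold in one new value without rescanning old data.
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def merge(self, other: "RunningStats") -> "RunningStats":
        # Combine two independently computed summaries (pairwise formula).
        n = self.n + other.n
        if n == 0:
            return RunningStats()
        delta = other.mean - self.mean
        mean = self.mean + delta * other.n / n
        m2 = self.m2 + other.m2 + delta * delta * self.n * other.n / n
        return RunningStats(n, mean, m2)

    @property
    def variance(self) -> float:
        # Sample variance; undefined for fewer than two values.
        return self.m2 / (self.n - 1) if self.n > 1 else float("nan")
```

Each `append` costs constant time regardless of how much data has already been seen, and `merge` combines two summaries without touching the underlying data, which is the kind of saving over recomputation that the abstract describes.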
First, we plan to introduce incremental computation as a first-class citizen in our system design. We will incrementally maintain models of interest as new data arrive; by materializing such models, analysis phases will be able to re-use results available from prior analysis. Suitable optimization frameworks will be developed to assess when and under what conditions such combinations and incremental maintenance of models are beneficial. The performance of subsequent analysis tasks should benefit from model re-use and/or combination for a wide class of models, across both exact and approximate computations. Second, we plan to build an end-to-end system encompassing our innovations. Our design will be centered on popular languages for statistical processing and data analysis to express modeling workloads (e.g., R) and on suitable systems infrastructure to implement and execute our framework.
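To make materialization and re-use concrete, the following is a small sketch under stated assumptions, not the project's design. It keeps a simple least-squares line y = a + b*x as a set of sufficient statistics, updates them as batches arrive, and lets later analyses read the stored model instead of refitting. `LinearFit` and `ModelStore` are hypothetical names. Raw sums are used for brevity and can lose precision on large or badly scaled data.

```python
from dataclasses import dataclass


@dataclass
class LinearFit:
    """Sufficient statistics for the least-squares line y = a + b*x."""
    n: int = 0
    sx: float = 0.0
    sy: float = 0.0
    sxx: float = 0.0
    sxy: float = 0.0

    def update(self, batch) -> None:
        # Fold a batch of (x, y) pairs into the running sums.
        for x, y in batch:
            self.n += 1
            self.sx += x
            self.sy += y
            self.sxx += x * x
            self.sxy += x * y

    def coefficients(self):
        # Closed-form solution computed from the sums only.
        denom = self.n * self.sxx - self.sx * self.sx
        if denom == 0:
            raise ValueError("need at least two distinct x values")
        slope = (self.n * self.sxy - self.sx * self.sy) / denom
        intercept = (self.sy - slope * self.sx) / self.n
        return intercept, slope


class ModelStore:
    """Hypothetical in-memory store of materialized models, keyed by name."""

    def __init__(self):
        self._models = {}

    def ingest(self, name, batch):
        # Create the model on first use, then maintain it incrementally.
        model = self._models.setdefault(name, LinearFit())
        model.update(batch)
        return model

    def get(self, name):
        # Later analyses re-use the stored model rather than refitting.
        return self._models[name]
```

A later analysis phase would call `store.get("some_model").coefficients()` and pay only for the closed-form solve, not for a pass over the data. Deciding when such maintenance is worth its cost is the optimization question the abstract raises, and this sketch does not address it.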
The end product of our research will be a system that brings together all of the research conducted and delivers very fast big data analytics through familiar analytical query processing interfaces such as R. Such a system will help data scientists conduct advanced research in a fraction of the time otherwise required, by letting them seamlessly re-use and share results in an incremental fashion.
Project outcomes
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other grants by Koudas, Nikolaos (Nick)
Efficient query processing and optimizations for big data workloads
- Grant number: RGPIN-2015-04587
- Fiscal year: 2017
- Amount: $43,700
- Program: Discovery Grants Program - Individual
Efficient query processing and optimizations for big data workloads
- Grant number: RGPIN-2015-04587
- Fiscal year: 2015
- Amount: $43,700
- Program: Discovery Grants Program - Individual
Similar overseas grants
Data-Parallel Algorithms for Efficient Query Processing on Modern Hardware
- Grant number: RGPIN-2020-06639
- Fiscal year: 2022
- Amount: $43,700
- Program: Discovery Grants Program - Individual
Efficient and Scalable Similarity Query Processing on Big Streaming Graphs
- Grant number: DP210101393
- Fiscal year: 2021
- Amount: $43,700
- Program: Discovery Projects
Data-Parallel Algorithms for Efficient Query Processing on Modern Hardware
- Grant number: RGPIN-2020-06639
- Fiscal year: 2021
- Amount: $43,700
- Program: Discovery Grants Program - Individual
Data-Parallel Algorithms for Efficient Query Processing on Modern Hardware
- Grant number: RGPIN-2020-06639
- Fiscal year: 2020
- Amount: $43,700
- Program: Discovery Grants Program - Individual
Data-Parallel Algorithms for Efficient Query Processing on Modern Hardware
- Grant number: DGECR-2020-00324
- Fiscal year: 2020
- Amount: $43,700
- Program: Discovery Launch Supplement
Efficient query processing and optimizations for big data workloads
- Grant number: RGPIN-2015-04587
- Fiscal year: 2019
- Amount: $43,700
- Program: Discovery Grants Program - Individual
Efficient Query Processing for Learning-based Data Management
- Grant number: 19K11979
- Fiscal year: 2019
- Amount: $43,700
- Program: Grant-in-Aid for Scientific Research (C)
CAREER: Efficient Query Processing for Private Data Federations
- Grant number: 1846447
- Fiscal year: 2019
- Amount: $43,700
- Program: Continuing Grant
Efficient query processing and optimizations for big data workloads
- Grant number: RGPIN-2015-04587
- Fiscal year: 2018
- Amount: $43,700
- Program: Discovery Grants Program - Individual
Efficient query processing and optimizations for big data workloads
- Grant number: RGPIN-2015-04587
- Fiscal year: 2017
- Amount: $43,700
- Program: Discovery Grants Program - Individual