Systems and Foundations For Massive-Scale Data Management
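Basic Information
- Grant number: RGPIN-2016-03877
- Principal investigator:
- Amount: $22,600
- Host institution:
- Host institution country: Canada
- Project category: Discovery Grants Program - Individual
- Fiscal year: 2018
- Funding country: Canada
- Duration: 2018-01-01 to 2019-12-31
- Project status: Completed
- Source:
- Keywords: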
Project Abstract
As the volume of data grows and the speed of new data generation increases, many applications process their data on highly parallel distributed data management systems. The objective of this proposal is to study distributed data management systems with a focus on two areas: (1) theoretical foundations of massive-scale data management systems; and (2) systems for processing large-scale graph data.
On theoretical foundations, my focus is on understanding the fundamental limitations of distributed algorithms for answering queries over relational data and evaluating how "good" they are. Distributed algorithms differ in their parallelism levels, their communication and computation costs, and the number of rounds of computation (i.e., machine synchronizations) they require. My goal is to derive lower or upper bounds on the costs of algorithms that perform a task over data. Specifically, I intend to study two questions:
1. Power of synchronization: How many rounds of computation are needed to answer a query?
2. Limits of parallelism: What is the maximum number of machines that can be utilized to answer a query?
On large-scale graph processing, my focus is on general-purpose distributed graph systems. Large-scale graphs are at the core of many applications, such as web search, social networks, and genetic analysis. Broadly, these applications perform the following tasks on graphs: (1) batch graph algorithms; (2) machine-learning algorithms; (3) finding subgraphs; and (4) real-time analysis when the graph is evolving. Existing systems support one or two of these tasks. My goal is to build a system that supports all of these tasks and is based on the timely dataflow (TD) execution model. TD is an inherently streaming model, so it can support real-time analysis, but it also has mechanisms for synchronous and asynchronous computations, which can support batch graph algorithms, machine-learning algorithms, and subgraph finding. Specifically, I intend to study the following issues:
1. TD operators of a general-purpose graph system.
2. Query languages and APIs that efficiently compile to these TD operators.
3. Storage models that are efficient under limited cluster memory.
4. Tools for testing and debugging graph applications running on TD.
This work will establish theoretical foundations for massive-scale data processing and produce an open-source prototype system advancing the state of the art in large-scale graph processing.
Project Outcomes
- Journal articles: 0
- Monographs: 0
- Research awards: 0
- Conference papers: 0
- Patents: 0
Other Grants by Salihoglu, Semih
Systems and Foundations For Massive-Scale Data Management
- Grant number: RGPIN-2016-03877
- Fiscal year: 2022
- Funding amount: $22,600
- Project category: Discovery Grants Program - Individual
Continuous Graph Querying and Graph OLAP Using Differential Computation
- Grant number: 531863-2018
- Fiscal year: 2021
- Funding amount: $22,600
- Project category: Collaborative Research and Development Grants
Systems and Foundations For Massive-Scale Data Management
- Grant number: RGPIN-2016-03877
- Fiscal year: 2021
- Funding amount: $22,600
- Project category: Discovery Grants Program - Individual
Continuous Graph Querying and Graph OLAP Using Differential Computation
- Grant number: 531863-2018
- Fiscal year: 2020
- Funding amount: $22,600
- Project category: Collaborative Research and Development Grants
Systems and Foundations For Massive-Scale Data Management
- Grant number: RGPIN-2016-03877
- Fiscal year: 2020
- Funding amount: $22,600
- Project category: Discovery Grants Program - Individual
Continuous Graph Querying and Graph OLAP Using Differential Computation
- Grant number: 531863-2018
- Fiscal year: 2019
- Funding amount: $22,600
- Project category: Collaborative Research and Development Grants
Systems and Foundations For Massive-Scale Data Management
- Grant number: RGPIN-2016-03877
- Fiscal year: 2019
- Funding amount: $22,600
- Project category: Discovery Grants Program - Individual
Systems and Foundations For Massive-Scale Data Management
- Grant number: RGPIN-2016-03877
- Fiscal year: 2017
- Funding amount: $22,600
- Project category: Discovery Grants Program - Individual
Systems and Foundations For Massive-Scale Data Management
- Grant number: RGPIN-2016-03877
- Fiscal year: 2016
- Funding amount: $22,600
- Project category: Discovery Grants Program - Individual
Similar Overseas Grants
CAREER: Computer-Intensive Statistical Inference on High-Dimensional and Massive Data: From Theoretical Foundations to Practical Computations
- Grant number: 2347760
- Fiscal year: 2023
- Funding amount: $22,600
- Project category: Continuing Grant
Systems and Foundations For Massive-Scale Data Management
- Grant number: RGPIN-2016-03877
- Fiscal year: 2022
- Funding amount: $22,600
- Project category: Discovery Grants Program - Individual
Systems and Foundations For Massive-Scale Data Management
- Grant number: RGPIN-2016-03877
- Fiscal year: 2021
- Funding amount: $22,600
- Project category: Discovery Grants Program - Individual
Systems and Foundations For Massive-Scale Data Management
- Grant number: RGPIN-2016-03877
- Fiscal year: 2020
- Funding amount: $22,600
- Project category: Discovery Grants Program - Individual
Systems and Foundations For Massive-Scale Data Management
- Grant number: RGPIN-2016-03877
- Fiscal year: 2019
- Funding amount: $22,600
- Project category: Discovery Grants Program - Individual
CAREER: Computer-Intensive Statistical Inference on High-Dimensional and Massive Data: From Theoretical Foundations to Practical Computations
- Grant number: 1752614
- Fiscal year: 2018
- Funding amount: $22,600
- Project category: Continuing Grant
Systems and Foundations For Massive-Scale Data Management
- Grant number: RGPIN-2016-03877
- Fiscal year: 2017
- Funding amount: $22,600
- Project category: Discovery Grants Program - Individual
Foundations of Model Driven Discovery from Massive Data
- Grant number: 1740741
- Fiscal year: 2017
- Funding amount: $22,600
- Project category: Standard Grant
CAREER: Massive Uniform Manipulation: Algorithmic and Control Theoretic Foundations for Large Populations of Simple Robots Controlled by Uniform Inputs
- Grant number: 1553063
- Fiscal year: 2016
- Funding amount: $22,600
- Project category: Continuing Grant
Systems and Foundations For Massive-Scale Data Management
- Grant number: RGPIN-2016-03877
- Fiscal year: 2016
- Funding amount: $22,600
- Project category: Discovery Grants Program - Individual