III: Medium: Learning-based Synthesis of Data Processing Engines

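Basic Information

Award number:
1900933
Principal investigator:
Tim Kraska
Amount:
$1.2 million
Institution:
Massachusetts Institute of Technology
Institution country:
United States
Award type:
Continuing Grant
Fiscal year:
2019
Funding country:
United States
Duration:
2019-09-01 to 2023-08-31
Project status:
Completed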

Source:
https://www.nsf.gov/awardsearch/showAward?AWD_ID=1900933&HistoricalAwards=false
Keywords:
III Medium Learning based Synthesis

Project Abstract

Modern data-processing systems are designed to be general-purpose systems, in that they can handle a wide variety of applications and data. Unfortunately, this general-purpose nature causes these systems to achieve below-optimal performance for every single application and user. Rather, technical compromises have to be made to support a wide range of use cases, often leading to orders-of-magnitude worse performance than what a highly customized system would be able to achieve. At the same time, developing a database system from scratch for each individual application and user is neither economical nor practical. The goal of this project is to explore how machine learning can be used to automatically customize a database system for a specific application or user to achieve so-called 'instance-optimality'. If successful, this project will transform the way that modern database systems that underpin the Internet and many enterprise computing systems are built, resulting in systems with much better performance or systems that are able to process large datasets using much less hardware than current systems. Concretely, the project investigates to what extent learned models can automatically instance-optimize the various components of a large-scale data processing system: 1) data indexing, where a model can predict the location of a key in a database; 2) algorithms, including sorting and joins, where a model can predict where in a sorted list a record should go, or where joining tuples are in another relation; 3) optimizers, where a model can predict the optimal plan to use for processing queries on data; and 4) storage layout, where a model can predict the optimal layout of data for a particular query workload.
This raises a number of intellectually deep questions, including what types of models work best, what theoretical guarantees we can give about the performance of these models, how such generated systems will compare to hand-tuned systems, how such systems can exploit new hardware such as TPUs/GPUs, and how program synthesis will work with such modelled data, advancing the fields of databases, machine learning, and program modeling and synthesis. This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
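
The first component named in the abstract, data indexing with a model that predicts the location of a key, can be illustrated with a small sketch. This is not the project's implementation: the class name `LearnedIndex`, its methods, and the choice of a single least-squares line are all assumptions made for illustration. The sketch fits a line from key to position over a sorted array, records the worst prediction error, and uses that error to bound a binary search around the predicted position.

```python
import bisect


class LearnedIndex:
    """Hypothetical sketch: one linear model over sorted numeric keys,
    followed by a binary search bounded by the model's worst error."""

    def __init__(self, keys):
        if not keys:
            raise ValueError("keys must be non-empty")
        self.keys = sorted(keys)
        n = len(self.keys)
        # Fit position ~ slope * key + intercept by ordinary least squares.
        mean_k = sum(self.keys) / n
        mean_p = (n - 1) / 2
        var_k = sum((k - mean_k) ** 2 for k in self.keys)
        cov = sum((k - mean_k) * (i - mean_p) for i, k in enumerate(self.keys))
        self.slope = cov / var_k if var_k else 0.0
        self.intercept = mean_p - self.slope * mean_k
        # Worst-case distance between predicted and true position.
        self.max_err = max(
            abs(self._predict(k) - i) for i, k in enumerate(self.keys)
        )

    def _predict(self, key):
        # Round the model output and clamp it to a valid array position.
        pos = int(round(self.slope * key + self.intercept))
        return min(max(pos, 0), len(self.keys) - 1)

    def lookup(self, key):
        """Return the position of the first occurrence of key, or None."""
        guess = self._predict(key)
        lo = max(guess - self.max_err, 0)
        hi = min(guess + self.max_err + 1, len(self.keys))
        # Search only the window the error bound says can contain the key.
        i = bisect.bisect_left(self.keys, key, lo, hi)
        if i < len(self.keys) and self.keys[i] == key:
            return i
        return None


index = LearnedIndex([3, 8, 15, 16, 23, 42, 108, 4096])
position = index.lookup(23)
missing = index.lookup(7)
```

The better the model fits the key distribution, the smaller `max_err` becomes and the fewer entries each lookup has to examine; with a poor fit the window grows toward the whole array and the lookup degrades to an ordinary binary search.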

Project Outcomes

Journal papers (15)

Monographs (0)

Research awards (0)

Conference papers (0)

Patents (0)

SNARF: A Learning-Enhanced Range Filter

DOI:
10.14778/3529337.3529347
Published:
2022-04
Journal:
Proc. VLDB Endow.
Impact factor:
0
Authors:
Kapil Vaidya;Tim Kraska;Subarna Chatterjee;Eric R. Knorr;M. Mitzenmacher;Stratos Idreos
Corresponding author:
Kapil Vaidya;Tim Kraska;Subarna Chatterjee;Eric R. Knorr;M. Mitzenmacher;Stratos Idreos

Optimization of Graph Neural Networks: Implicit Acceleration by Skip Connections and More Depth

DOI:
Published:
2021-05
Journal:
Impact factor:
0
Authors:
Keyulu Xu;Mozhi Zhang;S. Jegelka;Kenji Kawaguchi
Corresponding author:
Keyulu Xu;Mozhi Zhang;S. Jegelka;Kenji Kawaguchi

TreeLine: An Update-In-Place Key-Value Store for Modern Storage

DOI:
10.14778/3561261.3561270
Published:
2022-09
Journal:
Proc. VLDB Endow.
Impact factor:
0
Authors:
Geoffrey X. Yu;Markos Markakis;Andreas Kipf;P. Larson;U. F. Minhas;Tim Kraska
Corresponding author:
Geoffrey X. Yu;Markos Markakis;Andreas Kipf;P. Larson;U. F. Minhas;Tim Kraska

How Neural Networks Extrapolate: From Feedforward to Graph Neural Networks

DOI:
Published:
2020-09
Journal:
ArXiv
Impact factor:
0
Authors:
Keyulu Xu;Jingling Li;Mozhi Zhang;S. Du;K. Kawarabayashi;S. Jegelka
Corresponding author:
Keyulu Xu;Jingling Li;Mozhi Zhang;S. Du;K. Kawarabayashi;S. Jegelka

SageDB: An Instance-Optimized Data Analytics System

DOI:
10.14778/3565838.3565857
Published:
2022-09
Journal:
Proc. VLDB Endow.
Impact factor:
0
Authors:
Jialin Ding;Ryan Marcus;Andreas Kipf;Vikram Nathan;Aniruddha Nrusimha;Kapil Vaidya;Alexander van Renen;Tim Kraska
Corresponding author:
Jialin Ding;Ryan Marcus;Andreas Kipf;Vikram Nathan;Aniruddha Nrusimha;Kapil Vaidya;Alexander van Renen;Tim Kraska

Other publications by Tim Kraska

Building Database Applications in the Cloud

DOI:
10.3929/ethz-a-006007449
Published:
2010
Journal:
Impact factor:
0
Authors:
Tim Kraska
Corresponding author:
Tim Kraska

Towards a Benchmark for the Cloud

DOI:
Published:
2018
Journal:
Impact factor:
0
Authors:
Carsten Binnig;Donald Kossmann;Tim Kraska;Simon Losing
Corresponding author:
Simon Losing

Safe Visual Data Exploration

DOI:
Published:
2017
Journal:
SIGMOD Conference
Impact factor:
0
Authors:
Zheguang Zhao;Emanuel Zgraggen;L. Stefani;Carsten Binnig;E. Upfal;Tim Kraska
Corresponding author:
Tim Kraska

Self-Organizing Data Containers

DOI:
Published:
2022
Journal:
Conference on Innovative Data Systems Research
Impact factor:
0
Authors:
S. Madden;Jialin Ding;Tim Kraska;Sivaprasad Sudhir;David Cohen;T. Mattson;Nesime Tatbul
Corresponding author:
Nesime Tatbul