III: Medium: Learning-based Synthesis of Data Processing Engines
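Basic Information
- Award number: 1900933
- Principal investigator: Tim Kraska
- Amount: $1.2 million
- Host institution country: United States
- Award type: Continuing Grant
- Fiscal year: 2019
- Funding country: United States
- Period: 2019-09-01 to 2023-08-31
- Status: Completed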
Project Abstract
Modern data-processing systems are designed to be general-purpose: they can handle a wide variety of applications and data. Unfortunately, this generality means they achieve suboptimal performance for any individual application or user. To support a wide range of use cases, technical compromises have to be made, often leading to performance that is orders of magnitude worse than what a highly customized system could achieve. At the same time, developing a database system from scratch for each individual application and user is neither economical nor practical. The goal of this project is to explore how machine learning can be used to automatically customize a database system for a specific application or user, achieving so-called 'instance-optimality'. If successful, this project will transform the way modern database systems, which underpin the Internet and many enterprise computing systems, are built, resulting in systems with much better performance or systems that can process large datasets using much less hardware than current systems. Concretely, the project investigates to what extent learned models can automatically instance-optimize the various components of a large-scale data processing system: 1) data indexing, where a model can predict the location of a key in a database; 2) algorithms, including sorting and joins, where a model can predict where in a sorted list a record should go, or where the joining tuples are located in another relation; 3) optimizers, where a model can predict the optimal plan for processing queries on the data; and 4) storage layout, where a model can predict the optimal data layout for a particular query workload.
This raises a number of intellectually deep questions, including which types of models work best, what theoretical guarantees we can give about the performance of these models, how such generated systems will compare to hand-tuned systems, how such systems can exploit new hardware such as TPUs and GPUs, and how program synthesis will work with such modeled data, advancing the fields of databases, machine learning, and program modeling and synthesis. This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
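The first component in the abstract, a model that predicts where a key sits in a database, can be illustrated with a minimal sketch. It assumes the simplest possible setup: an in-memory sorted array of numeric keys and a single linear model fitted by least squares. This is a deliberate simplification, not the project's own design, and the class name `LinearLearnedIndex` is hypothetical. The model's worst-case error over the stored keys bounds a binary search, so lookups stay exact even though the prediction itself is only approximate.

```python
import bisect
from typing import List, Optional


class LinearLearnedIndex:
    """Hypothetical toy learned index: one linear model maps a key to its
    approximate position in a sorted array, and a binary search restricted
    to the model's worst-case error window finds the exact slot."""

    def __init__(self, keys: List[float]):
        self.keys = sorted(keys)
        n = len(self.keys)
        if n == 0:
            raise ValueError("need at least one key")
        # Least-squares fit of position ~ slope * key + intercept.
        mean_k = sum(self.keys) / n
        mean_p = (n - 1) / 2
        var_k = sum((k - mean_k) ** 2 for k in self.keys)
        cov = sum((k - mean_k) * (p - mean_p) for p, k in enumerate(self.keys))
        self.slope = cov / var_k if var_k else 0.0
        self.intercept = mean_p - self.slope * mean_k
        # Largest distance between predicted and true position on the
        # stored keys; this bounds the search window at lookup time.
        self.max_err = max(
            abs(self._predict(k) - p) for p, k in enumerate(self.keys)
        )

    def _predict(self, key: float) -> int:
        # Model prediction, clamped to valid array indices.
        pos = round(self.slope * key + self.intercept)
        return min(max(pos, 0), len(self.keys) - 1)

    def lookup(self, key: float) -> Optional[int]:
        """Return the index of the first occurrence of `key`, or None."""
        pred = self._predict(key)
        lo = max(pred - self.max_err, 0)
        hi = min(pred + self.max_err + 1, len(self.keys))
        # Search only inside [lo, hi) instead of the whole array.
        i = bisect.bisect_left(self.keys, key, lo, hi)
        if i < len(self.keys) and self.keys[i] == key:
            return i
        return None


if __name__ == "__main__":
    idx = LinearLearnedIndex([3, 7, 8, 15, 21, 22, 40, 41, 57])
    print(idx.lookup(21))  # → 4
    print(idx.lookup(10))  # → None
```

The error window is what keeps this approach correct: a better-fitting model only shrinks the range that `bisect` has to scan. Published learned-index designs, such as the recursive model index, stack several models instead of fitting a single line, and this sketch does not handle inserts.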
Project Outputs
- Journal articles: 15
- Monographs: 0
- Research awards: 0
- Conference papers: 0
- Patents: 0
SNARF: A Learning-Enhanced Range Filter
- DOI: 10.14778/3529337.3529347
- Published: 2022-04
- Authors: Kapil Vaidya; Tim Kraska; Subarna Chatterjee; Eric R. Knorr; M. Mitzenmacher; Stratos Idreos
- Corresponding authors: Kapil Vaidya; Tim Kraska; Subarna Chatterjee; Eric R. Knorr; M. Mitzenmacher; Stratos Idreos
Optimization of Graph Neural Networks: Implicit Acceleration by Skip Connections and More Depth
- Published: 2021-05
- Authors: Keyulu Xu; Mozhi Zhang; S. Jegelka; Kenji Kawaguchi
- Corresponding authors: Keyulu Xu; Mozhi Zhang; S. Jegelka; Kenji Kawaguchi
TreeLine: An Update-In-Place Key-Value Store for Modern Storage
- DOI: 10.14778/3561261.3561270
- Published: 2022-09
- Authors: Geoffrey X. Yu; Markos Markakis; Andreas Kipf; P. Larson; U. F. Minhas; Tim Kraska
- Corresponding authors: Geoffrey X. Yu; Markos Markakis; Andreas Kipf; P. Larson; U. F. Minhas; Tim Kraska
How Neural Networks Extrapolate: From Feedforward to Graph Neural Networks
- Published: 2020-09
- Authors: Keyulu Xu; Jingling Li; Mozhi Zhang; S. Du; K. Kawarabayashi; S. Jegelka
- Corresponding authors: Keyulu Xu; Jingling Li; Mozhi Zhang; S. Du; K. Kawarabayashi; S. Jegelka
Towards instance-optimized data systems
- DOI: 10.14778/3476311.3476392
- Published: 2021-07
- Authors: Tim Kraska
- Corresponding author: Tim Kraska
Other Publications by Tim Kraska
Building Database Applications in the Cloud
- DOI: 10.3929/ethz-a-006007449
- Published: 2010
- Authors: Tim Kraska
- Corresponding author: Tim Kraska
Towards a Benchmark for the Cloud
- Published: 2018
- Authors: Carsten Binnig; Donald Kossmann; Tim Kraska; Simon Losing
- Corresponding author: Simon Losing
Self-Organizing Data Containers
- Published: 2022
- Authors: S. Madden; Jialin Ding; Tim Kraska; Sivaprasad Sudhir; David Cohen; T. Mattson; Nesime Tatbul
- Corresponding author: Nesime Tatbul
Safe Visual Data Exploration
- Published: 2017
- Authors: Zheguang Zhao; Emanuel Zgraggen; L. Stefani; Carsten Binnig; E. Upfal; Tim Kraska
- Corresponding author: Tim Kraska
Making the Case for Query-by-Voice with EchoQuery
- DOI: 10.1145/2882903.2899394
- Published: 2016
- Authors: Gabriel Lyons; Vinh Q. Tran; Carsten Binnig; U. Çetintemel; Tim Kraska
- Corresponding author: Tim Kraska
Other Grants Led by Tim Kraska
III: Medium: Quantifying the Unknown Unknowns for Data Integration
- Award number: 2033792
- Fiscal year: 2020
- Amount: $1.2 million
- Award type: Continuing Grant
BD Spokes: SPOKE: NORTHEAST: Collaborative: A Licensing Model and Ecosystem for Data Sharing
- Award number: 1947440
- Fiscal year: 2019
- Amount: $1.2 million
- Award type: Standard Grant
III: Medium: Quantifying the Unknown Unknowns for Data Integration
- Award number: 1562657
- Fiscal year: 2016
- Amount: $1.2 million
- Award type: Continuing Grant
BD Spokes: SPOKE: NORTHEAST: Collaborative: A Licensing Model and Ecosystem for Data Sharing
- Award number: 1636698
- Fiscal year: 2016
- Amount: $1.2 million
- Award type: Standard Grant
CAREER: Query Compilation Techniques for Complex Analytics on Enterprise Clusters
- Award number: 1453171
- Fiscal year: 2015
- Amount: $1.2 million
- Award type: Continuing Grant
Similar Overseas Grants
III: Medium: Collaborative Research: Integrating Large-Scale Machine Learning and Edge Computing for Collaborative Autonomous Vehicles
- Award number: 2348169
- Fiscal year: 2023
- Amount: $1.2 million
- Award type: Continuing Grant
Collaborative Research: III: Medium: VirtualLab: Integrating Deep Graph Learning and Causal Inference for Multi-Agent Dynamical Systems
- Award number: 2312501
- Fiscal year: 2023
- Amount: $1.2 million
- Award type: Standard Grant
III: Medium: Linear Algebra Operators in Databases to Support Analytic and Machine-Learning Workloads
- Award number: 2312991
- Fiscal year: 2023
- Amount: $1.2 million
- Award type: Standard Grant
III: Medium: Advancing Deep Learning for Inverse Modeling
- Award number: 2313174
- Fiscal year: 2023
- Amount: $1.2 million
- Award type: Standard Grant
Collaborative Research: III: Medium: New Machine Learning Empowered Nanoinformatics System for Advancing Nanomaterial Design
- Award number: 2347592
- Fiscal year: 2023
- Amount: $1.2 million
- Award type: Standard Grant
Collaborative Research: IIS: III: MEDIUM: Learning Protein-ish: Foundational Insight on Protein Language Models for Better Understanding, Democratized Access, and Discovery
- Award number: 2310113
- Fiscal year: 2023
- Amount: $1.2 million
- Award type: Standard Grant
Collaborative Research: III: Medium: Towards Effective Detection and Mitigation for Shortcut Learning: A Data Modeling Framework
- Award number: 2310262
- Fiscal year: 2023
- Amount: $1.2 million
- Award type: Standard Grant
Collaborative Research: III: Medium: Towards Effective Detection and Mitigation for Shortcut Learning: A Data Modeling Framework
- Award number: 2310261
- Fiscal year: 2023
- Amount: $1.2 million
- Award type: Standard Grant
Collaborative Research: III: Medium: VirtualLab: Integrating Deep Graph Learning and Causal Inference for Multi-Agent Dynamical Systems
- Award number: 2312502
- Fiscal year: 2023
- Amount: $1.2 million
- Award type: Standard Grant
Collaborative Research: III: Medium: Towards Effective Detection and Mitigation for Shortcut Learning: A Data Modeling Framework
- Award number: 2310260
- Fiscal year: 2023
- Amount: $1.2 million
- Award type: Standard Grant