
III: Small: Automatic Database Management System Tuning Through Large-scale Machine Learning


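Basic Information

Award number:
1423210
Principal investigator:
Andrew Pavlo
Amount:
About $499,700
Institution:
Carnegie-Mellon University
Institution country:
United States
Award type:
Standard Grant
Fiscal year:
2014
Funding country:
United States
Period:
2014-08-01 to 2018-07-31
Status:
Completed

Source:
https://www.nsf.gov/awardsearch/showAward?AWD_ID=1423210&HistoricalAwards=false
Keywords: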
III Small Automatic Database Management

Project Abstract

The ability to collect, process, and analyze large amounts of data is paramount for being able to extrapolate new knowledge in business, scientific, and medical applications. Database management systems (DBMSs) are the critical component of modern "Big Data" applications because they are the central repository for all of this information. But tuning a DBMS to perform well is historically a difficult task because these systems have hundreds of configuration "knobs" that control everything in the system, such as the amount of memory to use and how often data is written. Getting these settings wrong will prevent the system from answering questions about data in a reasonable amount of time or even cause it to lose data. Many organizations resort to hiring experts to configure these knobs, but this is prohibitively expensive. Personnel cost is estimated to be almost 50% of the total ownership cost of a DBMS, and many administrators spend nearly a quarter of their time on these tuning activities. Furthermore, as databases grow in both size and complexity, optimizing a DBMS to meet the needs of new applications has surpassed the abilities of even the best human experts. Thus, the goal of this proposal is to develop the foundation and corresponding practical techniques for the automatic configuration of DBMSs by using machine learning on large-scale collections of historical performance data. Our approach will differ from previous work in that we seek to reduce the amount of time that is needed to train the algorithms that tune the DBMS for each application by relying on knowledge gained from previous tuning efforts. The results from this work will allow anyone to deploy a DBMS that is able to handle large amounts of data and more complex workloads without any expertise in database administration.

Achieving good performance in a database management system (DBMS) is non-trivial because these are complex systems with many tunable options that control nearly all aspects of their runtime operation. Getting this tuning right is critical for modern high-volume and high-throughput workloads, as the performance gains can be significant. As such, many organizations resort to hiring an expensive database administrator to manually tune their DBMS. But the size and complexity of databases have now surpassed the abilities of even the best human experts. Hence, we plan to develop automatic techniques for tuning and optimizing DBMS configurations for a broad class of application workloads. We will explore the foundations of using machine learning to scale DBMSs for larger data sets, thereby removing a major impediment in deriving the full benefits of data-driven decision making applications. The crux of our approach is to map an arbitrary application's workload to features of one or more canonical benchmarks that best represent the workload's properties, and then to collect performance data from the DBMS using that benchmark. This data is then used to train models that will allow us to identify the dependencies between knobs and their effects on the DBMS. From this, the models will select a near-optimal knob setting for the application. This differs from earlier work that focused on optimizing a single DBMS installation in isolation and was unable to leverage knowledge gained from previous tuning efforts. Our approach will not require the user to generate a large sample data set of (potentially expensive) experiments to derive the proper configuration.

For further information, see the project web site at: http://oltpbenchmark.com
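
The workload-mapping idea in the abstract can be illustrated with a small sketch. This is not the project's implementation: the benchmark names, feature names (`read_ratio`, `ops_per_txn`), knob names (`buffer_pool_mb`, `flush_interval_ms`), and all numbers are hypothetical. It also swaps the trained models for a much simpler rule, a nearest-neighbor match followed by reusing the best configuration already observed for the matched benchmark.

```python
import math

# Hypothetical history: for each canonical benchmark, a workload feature
# vector plus (knob configuration, measured throughput) pairs from earlier
# tuning runs. All names and numbers are made up for illustration.
HISTORY = {
    "read_heavy": {
        "features": {"read_ratio": 0.95, "ops_per_txn": 2.0},
        "trials": [
            ({"buffer_pool_mb": 512, "flush_interval_ms": 100}, 1800.0),
            ({"buffer_pool_mb": 4096, "flush_interval_ms": 100}, 5200.0),
        ],
    },
    "write_heavy": {
        "features": {"read_ratio": 0.20, "ops_per_txn": 8.0},
        "trials": [
            ({"buffer_pool_mb": 2048, "flush_interval_ms": 50}, 900.0),
            ({"buffer_pool_mb": 2048, "flush_interval_ms": 1000}, 2400.0),
        ],
    },
}


def distance(a, b):
    """Euclidean distance between two feature dicts with the same keys."""
    return math.sqrt(sum((a[k] - b[k]) ** 2 for k in a))


def map_workload(features):
    """Return the name of the benchmark whose features are closest."""
    return min(HISTORY, key=lambda name: distance(features, HISTORY[name]["features"]))


def recommend_knobs(features):
    """Map the workload to a benchmark, then reuse its best known knobs."""
    benchmark = map_workload(features)
    knobs, _throughput = max(HISTORY[benchmark]["trials"], key=lambda t: t[1])
    return benchmark, knobs


if __name__ == "__main__":
    print(recommend_knobs({"read_ratio": 0.90, "ops_per_txn": 3.0}))
```

The features here are left unscaled for brevity, so the larger-valued feature dominates the distance; a real system would presumably normalize them and fit a model over the collected data instead of taking the best observed trial.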


Project Outcomes

Journal articles (0)

Monographs (0)

Research awards (0)

Conference papers (0)

Patents (0)


Other Publications by Andrew Pavlo

On Scalable Transaction Execution in Partitioned Main Memory Database Management Systems

DOI:
Publication year:
2014
Journal:
Impact factor:
0
Authors:
Andrew Pavlo
Corresponding author:
Andrew Pavlo

Non-Volatile Memory Database Management Systems

DOI:
10.2200/s00891ed1v01y201812dtm055
Publication year:
2019
Journal:
Synthesis Lectures on Data Management
Impact factor:
0
Authors:
Joy Arulraj;Andrew Pavlo
Corresponding author:
Andrew Pavlo

NULLS!: Revisiting Null Representation in Modern Columnar Formats

DOI:
Publication year:
2024
Journal:
International Workshop on Data Management on New Hardware
Impact factor:
0
Authors:
Xinyu Zeng;Ruijun Meng;Andrew Pavlo;Wes McKinney;Huanchen Zhang
Corresponding author:
Huanchen Zhang

Database architectures for modern hardware: report from Dagstuhl Seminar 18251

DOI:
Publication year:
2018
Journal:
Impact factor:
0
Authors:
P. Boncz;G. Graefe;Bingsheng He;K. Sattler;Philippe Bonnet;A. Kemper;Viktor Leis;Justin J. Levandoski;S. Manegold;Danica Porobic;Caetano Sauer;Carsten Binnig;Andrew Crotty;Alex Galakatos;Tim Kraska;E. Z. The;Thomas Leich;Thilo Pionteck;Gunter Saake;Olaf Spinczyk;Andreas Becher;Lekshmi B.G;David Broneske;Tobias Drewes;B. Gurumurthy;K. Meyer;Jürgen Teich;Juan A. Colmenares;Gage Eads;S. Hofmeyr;Sarah Bird;Miquel Moretó;David Chou;Brian Gluzman;Eric Roman;D. B. Bartolini;Nitesh Mor;K. Asanović;John D Kubiatowicz. 2013;Daniel Lemire;Andrew Pavlo;A. Nica
Corresponding author:
A. Nica