III: Small: Automatic Database Management System Tuning Through Large-scale Machine Learning
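Basic information
- Award number: 1423210
- Principal investigator:
- Amount: $499,700
- Host institution:
- Host institution country: United States
- Project type: Standard Grant
- Fiscal year: 2014
- Funding country: United States
- Period: 2014-08-01 to 2018-07-31
- Project status: Completed
- Source:
- Keywords:
Project abstract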
The ability to collect, process, and analyze large amounts of data is paramount for extrapolating new knowledge in business, scientific, and medical applications. Database management systems (DBMSs) are the critical component of modern "Big Data" applications because they are the central repository for all of this information. But tuning a DBMS to perform well has historically been a difficult task because a DBMS has hundreds of configuration "knobs" that control everything in the system, such as the amount of memory to use and how often data is written. Getting these settings wrong can prevent the system from answering questions about data in a reasonable amount of time or even cause it to lose data. Many organizations resort to hiring experts to configure these knobs, but this is prohibitively expensive. Personnel cost is estimated to be almost 50% of the total ownership cost of a DBMS, and many administrators spend nearly a quarter of their time on these tuning activities. Furthermore, as databases grow in both size and complexity, optimizing a DBMS to meet the needs of new applications has surpassed the abilities of even the best human experts. Thus, the goal of this proposal is to develop the foundation and corresponding practical techniques for the automatic configuration of DBMSs by using machine learning on large-scale collections of historical performance data. Our approach differs from previous work in that it seeks to reduce the time needed to train the algorithms that tune the DBMS for each application by relying on knowledge gained from previous tuning efforts. The results from this work will allow anyone to deploy a DBMS that is able to handle large amounts of data and more complex workloads without any expertise in database administration.
Achieving good performance in a database management system (DBMS) is non-trivial because a DBMS is a complex system with many tunable options that control nearly all aspects of its runtime operation. Getting this tuning right is critical for modern high-volume and high-throughput workloads, as the performance gains can be significant. As such, many organizations resort to hiring an expensive database administrator to manually tune their DBMS. But the size and complexity of databases have now surpassed the abilities of even the best human experts. Hence, we plan to develop automatic techniques for tuning and optimizing DBMS configurations for a broad class of application workloads. We will explore the foundations of using machine learning to scale DBMSs for larger data sets, thereby removing a major impediment to deriving the full benefits of data-driven decision-making applications. The crux of our approach is to map an arbitrary application's workload to features of one or more canonical benchmarks that best represent the workload's properties, and then to collect performance data from the DBMS using those benchmarks. This data is then used to train models that will allow us to identify the dependencies between knobs and their effects on the DBMS. From this, the models will select a near-optimal knob setting for the application. This differs from earlier work, which focused on optimizing a single DBMS installation in isolation and was unable to leverage knowledge gained from previous tuning efforts. Our approach will not require the user to generate a large sample data set of (potentially expensive) experiments to derive the proper configuration.
For further information, see the project web site at: http://oltpbenchmark.com
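The workload-mapping idea in the abstract can be illustrated with a deliberately simplified sketch. It is not the project's method: a nearest-neighbor lookup stands in for the trained models, and the benchmark names, feature vectors, knob names, and throughput numbers are all hypothetical values chosen for illustration.

```python
import math

# Hypothetical benchmark profiles: (read fraction, rows touched per query,
# concurrency), each scaled to [0, 1]. None of these values come from the award.
BENCHMARK_FEATURES = {
    "oltp_like": (0.35, 0.60, 0.80),
    "read_heavy": (0.95, 0.10, 0.90),
    "analytic": (0.99, 1.00, 0.10),
}

# Hypothetical historical runs: benchmark -> list of (knob settings, throughput).
HISTORY = {
    "oltp_like": [
        ({"buffer_pool_mb": 512, "flush_interval_ms": 1000}, 1200.0),
        ({"buffer_pool_mb": 2048, "flush_interval_ms": 200}, 1900.0),
    ],
    "read_heavy": [
        ({"buffer_pool_mb": 1024, "flush_interval_ms": 1000}, 5200.0),
        ({"buffer_pool_mb": 4096, "flush_interval_ms": 1000}, 7400.0),
    ],
    "analytic": [
        ({"buffer_pool_mb": 8192, "flush_interval_ms": 5000}, 35.0),
    ],
}


def map_workload(features):
    """Return the benchmark whose feature vector is closest (Euclidean)."""
    return min(
        BENCHMARK_FEATURES,
        key=lambda name: math.dist(BENCHMARK_FEATURES[name], features),
    )


def recommend_knobs(features):
    """Reuse past runs: pick the best-performing knobs seen for the mapped benchmark."""
    benchmark = map_workload(features)
    knobs, _throughput = max(HISTORY[benchmark], key=lambda run: run[1])
    return benchmark, knobs


if __name__ == "__main__":
    # A new, mostly-read application workload (hypothetical measurements).
    print(recommend_knobs((0.90, 0.15, 0.85)))
```

The point of the sketch is the reuse of earlier measurements: a new application is matched to data that already exists, so no fresh batch of tuning experiments is needed before a first recommendation. A real system would replace the lookup with models that capture dependencies between knobs.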
Project outcomes
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other publications by Andrew Pavlo
On Scalable Transaction Execution in Partitioned Main Memory Database Management Systems
- DOI:
- Published: 2014
- Journal:
- Impact factor: 0
- Authors: Andrew Pavlo
- Corresponding author: Andrew Pavlo
Non-Volatile Memory Database Management Systems
- DOI: 10.2200/s00891ed1v01y201812dtm055
- Published: 2019
- Journal:
- Impact factor: 0
- Authors: Joy Arulraj;Andrew Pavlo
- Corresponding author: Andrew Pavlo
NULLS!: Revisiting Null Representation in Modern Columnar Formats
- DOI:
- Published: 2024
- Journal:
- Impact factor: 0
- Authors: Xinyu Zeng;Ruijun Meng;Andrew Pavlo;Wes McKinney;Huanchen Zhang
- Corresponding author: Huanchen Zhang
Database architectures for modern hardware: report from Dagstuhl Seminar 18251
- DOI:
- Published: 2018
- Journal:
- Impact factor: 0
- Authors: P. Boncz;G. Graefe;Bingsheng He;K. Sattler;Philippe Bonnet;A. Kemper;Viktor Leis;Justin J. Levandoski;S. Manegold;Danica Porobic;Caetano Sauer;Carsten Binnig;Andrew Crotty;Alex Galakatos;Tim Kraska;E. Z. The;Thomas Leich;Thilo Pionteck;Gunter Saake;Olaf Spinczyk;Andreas Becher;Lekshmi B.G;David Broneske;Tobias Drewes;B. Gurumurthy;K. Meyer;Jürgen Teich;Juan A. Colmenares;Gage Eads;S. Hofmeyr;Sarah Bird;Miquel Moretó;David Chou;Brian Gluzman;Eric Roman;D. B. Bartolini;Nitesh Mor;K. Asanović;John D Kubiatowicz. 2013;Daniel Lemire;Andrew Pavlo;A. Nica
- Corresponding author: A. Nica
Enterprise Database Applications and the Cloud: A Difficult Road Ahead
- DOI:
- Published: 2014
- Journal:
- Impact factor: 0
- Authors: M. Stonebraker;Andrew Pavlo;Rebecca Taft;Michael L. Brodie
- Corresponding author: Michael L. Brodie
Other grants by Andrew Pavlo
CAREER: Self-Driving Database Management Systems
- Award number: 1846158
- Fiscal year: 2019
- Amount: $499,700
- Project type: Continuing Grant
SPX: Collaborative Research: Distributed Database Management with Logical Leases and Hardware Transactional Memory
- Award number: 1822933
- Fiscal year: 2018
- Amount: $499,700
- Project type: Standard Grant
III: Small: Non-Invasive Real-Time Analytics in Database Systems using Holistic Query Compilation
- Award number: 1718582
- Fiscal year: 2017
- Amount: $499,700
- Project type: Continuing Grant
XPS: FULL: DSD: Collaborative Research: Moving the Abyss: Database Management on Future 1000-core Processors
- Award number: 1438955
- Fiscal year: 2014
- Amount: $499,700
- Project type: Standard Grant
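Similar NSFC grants
Forensic application of circadian-rhythmic small RNAs in estimating the time of bloodstain deposition
- Approval number:
- Approval year: 2024
- Amount: CNY 0
- Project type: Provincial/municipal project
Mechanism by which tRNA-derived small RNA upregulates the YBX1/CCL5 pathway in bortezomib-induced chronic pain
- Approval number: n/a
- Approval year: 2022
- Amount: CNY 100,000
- Project type: Provincial/municipal project
Small RNA regulation of the type I-F CRISPR-Cas adaptive immune response and its molecular mechanism
- Approval number: 32000033
- Approval year: 2020
- Amount: CNY 240,000
- Project type: Young Scientists Fund
Mechanisms by which small RNAs regulate the biocontrol function of Bacillus amyloliquefaciens FZB42
- Approval number: 31972324
- Approval year: 2019
- Amount: CNY 580,000
- Project type: General Program
Mechanisms by which Streptococcus mutans small RNAs link LuxS quorum sensing to biofilm formation
- Approval number: 81900988
- Approval year: 2019
- Amount: CNY 210,000
- Project type: Young Scientists Fund
Functions and mechanisms of key gut-bacterial small RNAs in the onset and progression of Crohn's disease
- Approval number: 31870821
- Approval year: 2018
- Amount: CNY 560,000
- Project type: General Program
Molecular mechanism of pigeon milk secretion elucidated by small RNA sequencing
- Approval number: 31802058
- Approval year: 2018
- Amount: CNY 260,000
- Project type: Young Scientists Fund
Pathogenic mechanism of rice grassy stunt virus regulated by small RNA-mediated DNA methylation
- Approval number: 31772128
- Approval year: 2017
- Amount: CNY 600,000
- Project type: General Program
Immunoregulatory mechanism of acupuncture treatment for Hashimoto's thyroiditis based on small RNA-seq
- Approval number: 81704176
- Approval year: 2017
- Amount: CNY 200,000
- Project type: Young Scientists Fund
Regulation of small RNA biogenesis by rice OsSGS3 and OsHEN1 and its effect on disease resistance
- Approval number: 91640114
- Approval year: 2016
- Amount: CNY 850,000
- Project type: Major Research Plan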
Similar overseas grants
SaTC: CORE: Small: Automatic Exploits Detection and Mitigation for Industrial Control System Protocols
- Award number: 2345563
- Fiscal year: 2023
- Amount: $499,700
- Project type: Standard Grant
SaTC: CORE: Small: Automatic Identification of Privilege-guard Variables for Data-only Attacks and Defenses
- Award number: 2247652
- Fiscal year: 2023
- Amount: $499,700
- Project type: Continuing Grant
SaTC: CORE: Small: Automatic Detection and Repair of Side Channel Vulnerabilities in Software Code
- Award number: 2245344
- Fiscal year: 2023
- Amount: $499,700
- Project type: Continuing Grant
SaTC: CORE: Small: Sound Automatic Exploit Generation
- Award number: 2234257
- Fiscal year: 2023
- Amount: $499,700
- Project type: Continuing Grant
CCF: SHF: Small: Self-Adaptive Interference-Avoiding Wireless Receiver Hardware through Real-Time Learning-Based Automatic Optimization of Power-Efficient Integrated Circuits
- Award number: 2218845
- Fiscal year: 2022
- Amount: $499,700
- Project type: Standard Grant
NSF-BSF: SHF: Small: Efficient, Automatic, and Trustworthy Smart Contract Verification
- Award number: 2110397
- Fiscal year: 2021
- Amount: $499,700
- Project type: Standard Grant
SaTC: CORE: Small: Automatic Exploits Detection and Mitigation for Industrial Control System Protocols
- Award number: 2051621
- Fiscal year: 2021
- Amount: $499,700
- Project type: Standard Grant
RI: Small: Accelerating Machine Learning via Randomized Automatic Differentiation
- Award number: 2007278
- Fiscal year: 2020
- Amount: $499,700
- Project type: Standard Grant
SHF: Small: Automatic, adaptive and massive parallel data processing on GPU/RDMA clusters in both synchronous and asynchronous modes
- Award number: 2005884
- Fiscal year: 2020
- Amount: $499,700
- Project type: Standard Grant
Adaptive control for agile automatic welding in small batches
- Award number: 530116-2018
- Fiscal year: 2020
- Amount: $499,700
- Project type: Collaborative Research and Development Grants