CAREER: Mining Hints from Text Documents to Guide Automated Database Performance Tuning
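Basic information
- Award number: 2239326
- Principal investigator:
- Amount: $594,900
- Host institution:
- Host institution country: United States
- Project type: Continuing Grant
- Fiscal year: 2023
- Funding country: United States
- Period: 2023-04-01 to 2028-03-31
- Project status: Ongoing
- Source:
- Keywords: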
Project abstract
Database management systems, that is, systems that process and manage large data sets, are used widely across virtually all sectors of industry. Their performance depends on a variety of tuning decisions that determine how the system processes data internally. For lay users, it is very hard to find settings that optimize performance. This has motivated the creation of automated database tuning tools that try to find optimal settings for them. However, crucial information for database tuning is often available in the form of natural language text, including, for instance, the database manual, text documents describing data sets, and discussions on database-centric Internet forums. Currently, automated tools are unable to benefit from such text, which makes them inefficient. This project aims to create automated database tuning tools that extract useful tuning information from a variety of text documents. By increasing the quality of automated tuning tools, the project empowers lay users and reduces the need for highly specialized workers in industry, a need that currently causes staff shortages and hampers the adoption of new technology. At the same time, the project aims to create new teaching offerings, helping to educate the next generation of data professionals.
The project is divided into two primary research thrusts, dedicated to the two categories of text documents that are most useful for database system tuning: text about data sets and text about database management systems. Transformer-based language models will be used to extract relevant information from such text documents. The resulting insights can be used in multiple ways for database tuning: to guide data profiling operations prior to tuning, to refine cost models used for tuning, or to restrict the search space of tuning choices. The project will explore all of those options, combining insights gained from text with other sources of information (e.g., trial runs that result in performance measurements for specific tuning choices). The project will consider a representative set of classical database tuning problems, including, for instance, the problem of selecting auxiliary index data structures to optimally support data processing, as well as the problem of finding optimal values for database system configuration parameters. All project outcomes will be integrated into a software package for automated database tuning that uses text documents as input. This software will be released to the public.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
Project outcomes
Journal articles (1)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
DB-BERT: making database tuning tools “read” the manual
- DOI: 10.1007/s00778-023-00831-y
- Published: 2023-12
- Journal:
- Impact factor: 0
- Authors: Immanuel Trummer
- Corresponding author: Immanuel Trummer
Other publications by Immanuel Trummer
AggChecker: A Fact-Checking System for Text Summaries of Relational Data Sets
- DOI: 10.14778/3352063.3352104
- Published: 2019
- Journal:
- Impact factor: 0
- Authors: Saehan Jo; Immanuel Trummer; Weicheng Yu; Xuezhi Wang; Cong Yu; Daniel Liu; Niyati Mehta
- Corresponding author: Niyati Mehta
Generating highly customizable python code for data processing with large language models
- DOI: 10.1007/s00778-025-00900-4
- Published: 2025-01-31
- Journal:
- Impact factor: 3.800
- Authors: Immanuel Trummer
- Corresponding author: Immanuel Trummer
Multi-objective parametric query optimization
- DOI:
- Published: 2014
- Journal:
- Impact factor: 0
- Authors: Immanuel Trummer; Christoph E. Koch
- Corresponding author: Christoph E. Koch
BABOONS: Black-Box Optimization of Data Summaries in Natural Language
- DOI: 10.14778/3551793.3551846
- Published: 2022-07
- Journal:
- Impact factor: 0
- Authors: Immanuel Trummer
- Corresponding author: Immanuel Trummer
Can Large Language Models Predict Data Correlations from Column Names?
- DOI: 10.14778/3625054.3625066
- Published: 2023-09
- Journal:
- Impact factor: 0
- Authors: Immanuel Trummer
- Corresponding author: Immanuel Trummer
Other grants held by Immanuel Trummer
III: Small: Regret-Bounded Query Evaluation via Reinforcement Learning
- Award number: 1910830
- Fiscal year: 2019
- Funding amount: $594,900
- Project type: Continuing Grant
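Similar grants from the National Natural Science Foundation of China
Genome mining-based study of secondary metabolites that inhibit Staphylococcus epidermidis biofilm formation
- Award number: 21242003
- Year approved: 2012
- Funding amount: CNY 100,000
- Project type: Special Fund Project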
Similar overseas grants
NeTS: Small: NSF-DST: Modernizing Underground Mining Operations with Millimeter-Wave Imaging and Networking
- Award number: 2342833
- Fiscal year: 2024
- Funding amount: $594,900
- Project type: Standard Grant
Development of social attention indicators of emerging technologies and science policies with network analysis and text mining
- Award number: 24K16438
- Fiscal year: 2024
- Funding amount: $594,900
- Project type: Grant-in-Aid for Early-Career Scientists
ART: Mining the Rich Vein of Research in Montana
- Award number: 2331325
- Fiscal year: 2024
- Funding amount: $594,900
- Project type: Cooperative Agreement
FightAMR: Novel global One Health surveillance approach to fight AMR using Artificial Intelligence and big data mining
- Award number: MR/Y034422/1
- Fiscal year: 2024
- Funding amount: $594,900
- Project type: Research Grant
DISES Investigating mercury biogeochemical cycling via mixed-methods in complex artisanal gold mining landscapes and implications for community health
- Award number: 2307870
- Fiscal year: 2024
- Funding amount: $594,900
- Project type: Standard Grant
Toward carbon-neutral society: Development of a full-sustainable eco-friendly green mining process for gold recovery
- Award number: 24K17540
- Fiscal year: 2024
- Funding amount: $594,900
- Project type: Grant-in-Aid for Early-Career Scientists
Generating green hydrogen from mining wastes
- Award number: IM240100202
- Fiscal year: 2024
- Funding amount: $594,900
- Project type: Mid-Career Industry Fellowships
Novel Hydrophobic Concrete for Durable and Resilient Mining Infrastructure
- Award number: LP230100288
- Fiscal year: 2024
- Funding amount: $594,900
- Project type: Linkage Projects
SBIR Phase I: Electromagnetic-ablative PGM Refining for In-situ Asteroid Mining
- Award number: 2327078
- Fiscal year: 2024
- Funding amount: $594,900
- Project type: Standard Grant
Temporal Graph Mining for Anomaly Detection
- Award number: DP240101547
- Fiscal year: 2024
- Funding amount: $594,900
- Project type: Discovery Projects