
Large Language Models for Query Optimisation: A New Paradigm in Database Systems


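Basic information

Grant number:
2726025
Principal investigator:
Funding amount:
--
Host institution:
University of Warwick
Host institution country:
United Kingdom
Project category:
Studentship
Fiscal year:
2023
Funding country:
United Kingdom
Period:
2023 to (no data)
Project status:
Not yet concluded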

Source:
https://gtr.ukri.org/projects?ref=studentship-2726025
Keywords:
Large Language Models Query Optimisation

Project abstract

Research Impact

The vision of this research project is to revolutionise query optimisation in database (DB) systems using Large Language Models (LLMs). LLMs belong to the class of foundation models, AI paradigms capable of tackling multiple downstream tasks. I propose a comprehensive investigation into the ability of LLMs to act as 'brains' for efficient query processing in DB systems.

Constructing queries efficiently is essential for DBs to run quickly. Database systems use query rewriting algorithms to transform queries so that they execute with low latency. Traditionally, this is done by applying manually constructed rewrite rules, where the rewritten query should yield the same output as the original one while exhibiting higher performance. Replacing white-box query optimisation strategies with zero-shot and few-shot learning via LLMs is an important step towards autonomous DB systems.

Success in this research will be impactful in the database community, as it will show that automating the end-to-end query rewrite process by bridging knowledge from natural language processing and database systems is viable. As an example, in the presence of LLM-generated rewrite rules, submitting a relational query to the DB will require no ad-hoc optimisation from the database administrator (DBA). Furthermore, I expect vendors to save hundreds of developer hours currently spent on extending existing systems with ever more rewrite rules. The need for new query rewrite rules is driven by changes in the queries executed, including non-human transactions such as those generated by web applications.

Aims and Objectives

The envisioned goals of this research are to investigate and prove the following:

1. The ability of LLMs to 'understand' the intricacies of existing DB systems. By capturing the logical and physical facets of current DBs, I envision LLMs adapting well to various downstream database optimisation tasks.
2. The efficiency of LLMs as query rewrite mechanisms. This goal aims to uncover how quickly (i.e., zero-shot or few-shot) LLMs can learn to optimise queries, and how their performance compares with existing DBs.
3. The assets required to build an LLM-powered DB system. This objective aims to reduce the complexity of integrating LLMs into DB systems and, beyond that, into a range of software systems and algorithms.

Methodology

The initial research methodology is to establish an LLM-based foundation for automating query rewriting. The deliverables will serve as artifacts to tackle tasks beyond query optimisation. There are two constituent parts.

First, a pipeline for guiding the application of rewrite rules to DB queries will be implemented. Its purpose is to ensure that the order in which the rules are applied is optimal. Generally, finding the optimal order of applying query rewrite rules is an NP-hard problem. The reason is that applying a suboptimal rewrite rule early in the chain may prevent globally optimal rule applications.

Second, rewrite rules are generally designed by human experts; instead, generating query rewrite rules via LLMs will be investigated through the prism of prompt engineering and adapters, to eliminate human error and guesswork.

EPSRC Strategic Alignment

Bridging LLMs and DBs brings the research community closer to an autonomous DB and is aligned with the "Artificial intelligence (AI), digitalisation and data: driving value and security" EPSRC objective. Serving information through natural language processing presents a real opportunity for driving innovation in the UK technology sector, with an important economic impact.
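
Illustrative sketch (not part of the project record)

The rule-ordering concern raised in the methodology can be illustrated with a toy example. The sketch below is only an assumption-laden illustration: the plan shapes, rule names, and cost values are all hypothetical and do not come from the project or from any real database system. It shows how a greedy choice of the locally best rewrite can block a cheaper plan that a different rule order would reach.

```python
from itertools import permutations

# Hypothetical cost of each toy plan shape (made-up numbers, not measurements).
COST = {
    "nested_subquery": 10,
    "semi_join": 6,
    "inner_join": 8,
    "join_with_pushed_filter": 1,
}

# Hypothetical rewrite rules: each one maps a source plan shape to a target shape.
RULES = {
    "subquery_to_semi_join": ("nested_subquery", "semi_join"),
    "subquery_to_inner_join": ("nested_subquery", "inner_join"),
    "push_filter_below_join": ("inner_join", "join_with_pushed_filter"),
}


def apply_sequence(plan, rule_names):
    """Apply rules in the given order; a rule that does not match is skipped."""
    for name in rule_names:
        source, target = RULES[name]
        if plan == source:
            plan = target
    return plan


def greedy(plan):
    """Repeatedly take the applicable rewrite with the lowest immediate cost."""
    while True:
        candidates = [
            target
            for source, target in RULES.values()
            if source == plan and COST[target] < COST[plan]
        ]
        if not candidates:
            return plan
        plan = min(candidates, key=COST.get)


def exhaustive(plan):
    """Try every rule order and keep the cheapest resulting plan."""
    results = [apply_sequence(plan, order) for order in permutations(RULES)]
    return min(results, key=COST.get)


if __name__ == "__main__":
    start = "nested_subquery"
    for label, result in (("greedy", greedy(start)), ("exhaustive", exhaustive(start))):
        print(f"{label}: {result} (cost {COST[result]})")
```

In this toy setting the greedy strategy stops at the semi-join plan, while trying every order finds the cheaper plan reached through the initially less attractive inner-join rewrite. Enumerating all orders grows factorially with the number of rules, which is one way to see why a learned or guided ordering pipeline is attractive.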
