Large Language Models for Query Optimisation: A New Paradigm in Database Systems

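Basic information

  • Grant number:
    2726025
  • Principal investigator:
  • Amount:
    --
  • Host institution:
  • Host institution country:
    United Kingdom
  • Project category:
    Studentship
  • Fiscal year:
    2023
  • Funding country:
    United Kingdom
  • Project period:
    2023 to (no data)
  • Project status:
    Ongoing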

Project abstract

Research Impact

The vision of this research project is to revolutionise query optimisation in database (DB) systems using Large Language Models (LLMs). LLMs belong to the class of foundation models: AI models capable of tackling multiple downstream tasks. I propose a comprehensive investigation into the ability of LLMs to act as 'brains' for efficient query processing in DB systems.

Constructing queries efficiently is essential for DBs to run quickly. Database systems use query rewriting algorithms to transform queries so that they execute with low latency. Traditionally, this is done by applying manually constructed rewrite rules, where the rewritten query must yield the same output as the original one while performing better. Replacing white-box query optimisation strategies with zero-shot and few-shot learning via LLMs is an important step towards autonomous DB systems.

Success in this research will be impactful in the database community, as it will prove that automating the end-to-end query rewrite process by bridging knowledge from natural language processing and database systems is viable. For example, in the presence of LLM-generated rewrite rules, submitting a relational query to the DB will require no ad-hoc optimisation from the database administrator (DBA). Furthermore, I expect vendors to save hundreds of developer hours currently spent on extending existing systems with ever more rewrite rules. The need for new query rewrite rules is driven by changes in the queries executed, including non-human transactions such as those generated by web applications.

Aims and Objectives

The envisioned goals of this research are to investigate and prove the following:

1. The ability of LLMs to 'understand' the intricacies of existing DB systems. By capturing the logical and physical facets of current DBs, I envision that LLMs will adapt well to various downstream database optimisation tasks.
2. The efficiency of LLMs as query rewrite mechanisms. This goal aims to uncover how quickly (i.e., in zero-shot or few-shot settings) LLMs can learn to optimise queries, and how they perform against existing DBs.
3. The assets required to build an LLM-powered DB system. This objective aims to reduce the complexity of integrating LLMs into DB systems and, beyond that, into a range of software systems and algorithms.

Methodology

The initial research methodology is to establish an LLM-based foundation for automating query rewriting. The deliverables will serve as artifacts to tackle tasks beyond query optimisation. There are two constituent parts.

First, a pipeline for guiding the application of rewrite rules to DB queries will be implemented. Its purpose is to ensure that the order in which the rules are applied is optimal. In general, finding the optimal order in which to apply query rewrite rules is an NP-hard problem, because applying a suboptimal rewrite rule early in the chain may prevent globally optimal rule applications later on.

Second, rewrite rules are generally designed by human experts. Instead, generating query rewrite rules via LLMs will be investigated through the prism of prompt engineering and adapters, to eliminate human error and guesswork.

EPSRC Strategic Alignment

Bridging LLMs and DBs brings the research community closer to an autonomous DB and is aligned with the "Artificial intelligence (AI), digitalisation and data: driving value and security" EPSRC objective. Serving information through natural language processing presents a real opportunity for driving innovation in the UK technology sector, with an important economic impact.
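The rule-ordering problem described under Methodology can be illustrated with a small sketch. Everything in it is a hypothetical toy and not part of the project: the plan representation (a tuple of operator names), the two rewrite rules, and the cost model with its row counts are assumptions chosen only to show that applying one rule first can block a better rule. Both rules are assumed to preserve the query result, which for a real filter push-down holds only when the filter refers to columns from one side of the join.

```python
from itertools import permutations

# Hypothetical toy: a plan is a tuple of operator names, executed left to right.

def push_filter_below_join(plan):
    """Swap the first adjacent (join, filter) pair so the filter runs before the join."""
    for i in range(len(plan) - 1):
        if plan[i] == "join" and plan[i + 1] == "filter":
            return plan[:i] + ("filter", "join") + plan[i + 2:]
    return plan

def fuse_filter_into_join(plan):
    """Merge the first adjacent (join, filter) pair into a single operator."""
    for i in range(len(plan) - 1):
        if plan[i] == "join" and plan[i + 1] == "filter":
            return plan[:i] + ("filtered_join",) + plan[i + 2:]
    return plan

def estimate(plan):
    """Return (cost, output_rows) under an assumed, made-up cost model."""
    rows, cost = 0, 0
    for op in plan:
        if op == "scan":
            rows = 1000
            cost += rows
        elif op == "filter":
            cost += rows          # one pass over the input
            rows //= 10           # assumed 10% selectivity
        elif op == "join":
            cost += rows * 10     # assumed cost proportional to input size
            rows *= 2
        elif op == "filtered_join":
            cost += rows * 10     # same join work, no separate filter pass
            rows = rows * 2 // 10
        else:
            raise ValueError(f"unknown operator: {op}")
    return cost, rows

def apply_in_order(plan, rules):
    """Apply each rule once, in the given order."""
    for rule in rules:
        plan = rule(plan)
    return plan

def best_order(plan, rules):
    """Exhaustively try every rule order; the search space grows factorially."""
    return min(permutations(rules),
               key=lambda order: estimate(apply_in_order(plan, order))[0])

if __name__ == "__main__":
    original = ("scan", "join", "filter")
    rules = [fuse_filter_into_join, push_filter_below_join]
    for order in permutations(rules):
        rewritten = apply_in_order(original, order)
        print([r.__name__ for r in order], rewritten, estimate(rewritten))
    print("best:", [r.__name__ for r in best_order(original, rules)])
```

Under this toy cost model the original plan costs 13000, fusing first yields a plan costing 11000, and pushing the filter down first yields a plan costing 3000. Once the fuse rule has fired, the push-down rule no longer matches, which is the kind of blocking the abstract describes. Exhaustive search over permutations is only feasible for a handful of rules, which is why a guided pipeline is of interest.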

Project outcomes

Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)

Other grants

An implantable biosensor microsystem for real-time measurement of circulating biomarkers
  • Grant number:
    2901954
  • Fiscal year:
    2028
  • Funding amount:
    --
  • Project category:
    Studentship
Exploiting the polysaccharide breakdown capacity of the human gut microbiome to develop environmentally sustainable dishwashing solutions
  • Grant number:
    2896097
  • Fiscal year:
    2027
  • Funding amount:
    --
  • Project category:
    Studentship
A Robot that Swims Through Granular Materials
  • Grant number:
    2780268
  • Fiscal year:
    2027
  • Funding amount:
    --
  • Project category:
    Studentship
Likelihood and impact of severe space weather events on the resilience of nuclear power and safeguards monitoring.
  • Grant number:
    2908918
  • Fiscal year:
    2027
  • Funding amount:
    --
  • Project category:
    Studentship
Proton, alpha and gamma irradiation assisted stress corrosion cracking: understanding the fuel-stainless steel interface
  • Grant number:
    2908693
  • Fiscal year:
    2027
  • Funding amount:
    --
  • Project category:
    Studentship
Field Assisted Sintering of Nuclear Fuel Simulants
  • Grant number:
    2908917
  • Fiscal year:
    2027
  • Funding amount:
    --
  • Project category:
    Studentship
Assessment of new fatigue capable titanium alloys for aerospace applications
  • Grant number:
    2879438
  • Fiscal year:
    2027
  • Funding amount:
    --
  • Project category:
    Studentship
CDT year 1 so TBC in Oct 2024
  • Grant number:
    2879865
  • Fiscal year:
    2027
  • Funding amount:
    --
  • Project category:
    Studentship
Developing a 3D printed skin model using a Dextran - Collagen hydrogel to analyse the cellular and epigenetic effects of interleukin-17 inhibitors in
  • Grant number:
    2890513
  • Fiscal year:
    2027
  • Funding amount:
    --
  • Project category:
    Studentship
Understanding the interplay between the gut microbiome, behavior and urbanisation in wild birds
  • Grant number:
    2876993
  • Fiscal year:
    2027
  • Funding amount:
    --
  • Project category:
    Studentship

Similar overseas grants

Collaborative Research: Conference: Large Language Models for Biological Discoveries (LLMs4Bio)
  • Grant number:
    2411529
  • Fiscal year:
    2024
  • Funding amount:
    --
  • Project category:
    Standard Grant
Collaborative Research: Conference: Large Language Models for Biological Discoveries (LLMs4Bio)
  • Grant number:
    2411530
  • Fiscal year:
    2024
  • Funding amount:
    --
  • Project category:
    Standard Grant
Investigating the potential for developing self-regulation in foreign language learners through the use of computer-based large language models and machine learning
  • Grant number:
    24K04111
  • Fiscal year:
    2024
  • Funding amount:
    --
  • Project category:
    Grant-in-Aid for Scientific Research (C)
Multi-agent Self-improving of Large Language Models (LLMs)
  • Grant number:
    2903811
  • Fiscal year:
    2024
  • Funding amount:
    --
  • Project category:
    Studentship
Integrating Large Language Models for Long Horizon Task Planning in Multi-robot Scenarios
  • Grant number:
    24K07399
  • Fiscal year:
    2024
  • Funding amount:
    --
  • Project category:
    Grant-in-Aid for Scientific Research (C)
Tuning Large language models to read biological literature
  • Grant number:
    BB/Y514032/1
  • Fiscal year:
    2024
  • Funding amount:
    --
  • Project category:
    Research Grant
CAREER: Regularizing Large Language Models for Safe and Reliable Program Generation
  • Grant number:
    2340408
  • Fiscal year:
    2024
  • Funding amount:
    --
  • Project category:
    Continuing Grant
Conference: New horizons in language science: large language models, language structure, and the neural basis of language
  • Grant number:
    2418125
  • Fiscal year:
    2024
  • Funding amount:
    --
  • Project category:
    Standard Grant
Building AI-Powered Responsible Workforce by Integrating Large Language Models into Computer Science Curriculum
  • Grant number:
    2336061
  • Fiscal year:
    2024
  • Funding amount:
    --
  • Project category:
    Standard Grant
Enhancing Factuality in Medical QA: Integrating Structured Knowledge Bases with Large Language Models
  • Grant number:
    24K20832
  • Fiscal year:
    2024
  • Funding amount:
    --
  • Project category:
    Grant-in-Aid for Early-Career Scientists