Fine-grained Data Provenance for Very Expressive Queries

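Basic Information

Grant number:
398800066
Principal investigator:
Professor Dr. Torsten Grust
Amount:
--
Host institution:
Lehrstuhl für Datenbanksysteme
Host institution country:
Germany
Project type:
Research Grants
Fiscal year:
2018
Funding country:
Germany
Duration:
2017-12-31 to 2021-12-31
Project status:
Completed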

Source:
https://gepris.dfg.de/gepris/projekt/398800066?language=en
Keywords:
Fine grained Data Provenance Very

Project Abstract

Data provenance uncovers how database queries transform, filter, merge, and aggregate input data to arrive at the final output. Given today's steep growth in both data volume and query complexity, the inner workings of a query quickly become hard to assess and validate: where in the input did this piece of output originate? Why did the query emit this item but omit another? How did the query produce this result value, and exactly which query constructs participated in its evaluation? Data provenance answers these and further questions. The answers explain query internals (and bugs), aid in data quality assessment, and help build trust in query results, a critical service to data-dependent science and society.

With provenance, we shift a query's focus from values and their transformation to the dependencies between output and input data. This research proposal rests on the central hypothesis that abstract interpretation provides an ideal framework both to reason about and to implement this shift of focus. In abstract interpretation, a program analysis discipline first established in the 1970s, all but one (or a few) selected aspects of a program's evaluation are ignored. This project will adapt these ideas to develop a view of queries and programs in which input/output dependencies, not values, assume the primary role.

The benefits of data provenance grow with the complexity of the query logic it is able to explain. We set out to derive provenance for advanced query language constructs and idioms such as deep nesting, sliding windows, user-defined and built-in functions, and recursion. A core goal is to embrace practically relevant, complex languages such as modern variants of SQL, for which prior work exhibited significant restrictions.
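
To make the shift from values to input/output dependencies more concrete, here is a minimal Python sketch (an illustration only, not the project's implementation) that evaluates a small SQL-like query while pairing every value with the set of input cells it depends on. It separates cells a value is copied or computed from (`where`) from cells that were only inspected to decide whether the value appears at all (`why`). The names `Tracked`, `scan`, `filter_rows`, `group_sum`, and the `(table, row, column)` cell identifiers are hypothetical. The sketch is a simplification: it carries concrete values alongside the dependency sets so that the filter can be decided, which is not necessarily how the project treats predicates.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, List, Tuple

# Hypothetical cell identifier: (table name, row index, column name).
Cell = Tuple[str, int, str]


@dataclass(frozen=True)
class Tracked:
    """A value paired with the input cells it depends on."""
    value: object
    where: FrozenSet[Cell]              # cells the value was copied or computed from
    why: FrozenSet[Cell] = frozenset()  # cells inspected to decide that the value appears


Row = Dict[str, Tracked]


def scan(table: str, rows: List[Dict[str, object]]) -> List[Row]:
    # Every input cell starts out depending only on itself.
    return [
        {col: Tracked(v, frozenset({(table, i, col)})) for col, v in row.items()}
        for i, row in enumerate(rows)
    ]


def filter_rows(rows: List[Row], col: str, pred: Callable[[object], bool]) -> List[Row]:
    # WHERE pred(col): every cell of a surviving row now also depends on the inspected cell.
    out = []
    for row in rows:
        test = row[col]
        if pred(test.value):
            deps = test.where | test.why
            out.append({c: Tracked(t.value, t.where, t.why | deps) for c, t in row.items()})
    return out


def group_sum(rows: List[Row], key: str, col: str) -> List[Row]:
    # GROUP BY key, SUM(col): the sum is computed from all member cells of `col`,
    # and group membership depends on the key cells.
    groups: Dict[object, List[Row]] = {}
    for row in rows:
        groups.setdefault(row[key].value, []).append(row)
    out = []
    for k, members in groups.items():
        key_cells = frozenset().union(*(m[key].where for m in members))
        why = frozenset().union(*(m[col].why for m in members)) | key_cells
        total = sum(m[col].value for m in members)
        sum_cells = frozenset().union(*(m[col].where for m in members))
        out.append({key: Tracked(k, key_cells, why), "sum": Tracked(total, sum_cells, why)})
    return out


# SELECT region, SUM(amount) FROM sales WHERE amount > 10 GROUP BY region
sales = [
    {"region": "EU", "amount": 5},
    {"region": "EU", "amount": 20},
    {"region": "US", "amount": 30},
]
result = group_sum(filter_rows(scan("sales", sales), "amount", lambda a: a > 10), "region", "amount")
for row in result:
    print(row["region"].value, row["sum"].value, sorted(row["sum"].where), sorted(row["sum"].why))
```

Because the row (EU, 5) fails the filter, none of its cells appear in either dependency set of the EU result, while the US sum depends on exactly the `amount` cell it was computed from and the `region` cell that placed it in its group.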

We will capitalize on the flexibility of abstract interpretation and design abstract domains that explain provenance at various levels of data granularity, down to individual atomic values (e.g., single table cells). Further adaptations of the abstract domain and the query interpretation rules will allow us to explore new and notoriously difficult types of data provenance (e.g., the provenance of values absent from the output). Abstract interpretation is both a powerful theoretical tool and a practical one. Building on its practical side, we will study parallel provenance derivation for queries over large data volumes and the seamless integration of data provenance into the query compilers of existing modern database systems.
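
Provenance for values absent from the output (often called why-not provenance) needs a different kind of explanation, since a missing value has no output cell to trace back. For the simplest case, a single-table query `SELECT out_col FROM table WHERE pred(pred_col)`, a naive instance-based sketch looks for input rows that carry the missing value and reports the predicate cells that rejected them. The function `why_not` and its parameters are hypothetical and are not the project's technique; once joins, aggregation, or nesting are involved, finding candidate inputs becomes far harder, which is part of what makes this type of provenance difficult.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical cell identifier: (table name, row index, column name).
Cell = Tuple[str, int, str]


def why_not(table: str, rows: List[Dict[str, object]], out_col: str,
            pred_col: str, pred: Callable[[object], bool],
            missing: object) -> List[Cell]:
    """Explain why `missing` is absent from
    SELECT out_col FROM table WHERE pred(pred_col).

    Returns the predicate cells of all input rows that carry `missing` in
    `out_col` but were rejected by the filter. If `missing` is truly absent
    from the output, an empty list means no input row carries it at all,
    so the filter is not to blame.
    """
    blockers: List[Cell] = []
    for i, row in enumerate(rows):
        if row[out_col] == missing and not pred(row[pred_col]):
            blockers.append((table, i, pred_col))
    return blockers


def cheap(price: object) -> bool:
    # Predicate of the example query: WHERE price < 100
    return price < 100


# SELECT name FROM products WHERE price < 100 yields lamp and chair, but not desk.
products = [
    {"name": "lamp", "price": 40},
    {"name": "desk", "price": 250},
    {"name": "chair", "price": 90},
]
print(why_not("products", products, "name", "price", cheap, "desk"))  # [('products', 1, 'price')]
print(why_not("products", products, "name", "price", cheap, "sofa"))  # []
```

The two calls distinguish the two basic answers to a why-not question: the value exists in the input but a specific predicate cell filtered it out, or it never existed in the input to begin with.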