CAREER: Declarative Uncertainty

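Basic Information

Award number:
1750460
Principal investigator:
Oliver Kennedy
Amount:
approx. $542,300
Host institution:
SUNY at Buffalo
Host institution country:
United States
Award type:
Continuing Grant
Fiscal year:
2018
Funding country:
United States
Start and end dates:
2018-03-01 to 2024-02-29
Project status:
Completed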

Source:
https://www.nsf.gov/awardsearch/showAward?AWD_ID=1750460&HistoricalAwards=false
Keywords:
CAREER Declarative Uncertainty

Project Abstract

Data is messy. Fortunately, with minimal human intervention, good data cleaning heuristics produce mostly reliable, usually actionable information from big, messy data. For instance, analysts might automate their curation workflows by using classifiers to predict missing attribute values, or by using an entity-resolver to find and merge duplicate records. Unfortunately, heuristics are also dangerous, as the result of heuristic curation is often taken as fact. Serious mistakes, like people being denied a loan due to someone else's bad credit, 12-year-olds being identified as terrorists, or billion-dollar investment errors, often result when low-confidence or uncertain heuristic inferences are treated as truth. Many principled tools like probabilistic databases already exist for automatically tracking potential errors in unreliable data, but these tools are not easy to use. As a result, analysts more often resort to simply documenting potential errors and hoping that anyone using the data will realize the implications. This proposal will enable data management systems that can query and organize uncertain data, without being hard to use. The specific aim of this proposal is to decouple the process of asking questions about uncertain data from mechanical concerns like why the data is uncertain, how the user wants to view uncertainty in query results, or which algorithms should be used. To enable this sort of "declarative uncertainty management," the project team will build on a system called Mimir that virtualizes uncertainty by augmenting data curation workflows (e.g., ETL pipelines) with a form of provenance capture through which heuristics can register alternative outputs (e.g., a schema matcher may register multiple potential matches). This provenance can then be used to synthesize a wide range of different physical and visual representations of uncertainty in data and in query results.
To enable declarative uncertainty management, this proposal will address specific problems that fall into two general categories: (1) selecting and efficiently constructing qualitative summaries of uncertainty in query results, and (2) enhancing database query compilers and optimizers to support practical, efficient query processing over uncertain data. For further information see the project web page: http://mimirdb.info

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
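
The idea in the abstract of a heuristic registering alternative outputs, so that a later query can report which results depend on a guess, might be illustrated with the minimal Python sketch below. This is not Mimir's API or implementation: the names `Cell`, `impute_city`, and `select_where`, and the ZIP-prefix lookup table, are all hypothetical and exist only for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Tuple


@dataclass
class Cell:
    """A value picked by a heuristic, plus the alternatives it also considered."""
    best: Any
    alternatives: List[Any] = field(default_factory=list)


def impute_city(zip_code: str) -> Cell:
    """Hypothetical heuristic: guess a city from a ZIP prefix (made-up table)."""
    guesses = {"142": ["Buffalo", "Amherst"], "100": ["New York"]}
    options = guesses.get(zip_code[:3], ["unknown"])
    # The first option is the best guess; the rest are registered as alternatives.
    return Cell(best=options[0], alternatives=options[1:])


def select_where(
    rows: List[Dict[str, Any]],
    column: str,
    predicate: Callable[[Any], bool],
) -> List[Tuple[int, bool]]:
    """Filter on the best guess; mark a result certain only if every
    registered alternative would also satisfy the predicate."""
    results = []
    for row in rows:
        cell = row[column]
        if predicate(cell.best):
            certain = all(predicate(alt) for alt in cell.alternatives)
            results.append((row["id"], certain))
    return results


rows = [
    {"id": 1, "city": impute_city("14260")},
    {"id": 2, "city": impute_city("10001")},
]

# Row 1 matches only under the best guess, so it is flagged as uncertain.
print(select_where(rows, "city", lambda c: c == "Buffalo"))   # [(1, False)]
# Both rows match under every alternative, so both are certain.
print(select_where(rows, "city", lambda c: c != "unknown"))   # [(1, True), (2, True)]
```

The query author writes only the predicate; whether a result is flagged is derived from what the heuristic registered, which is the kind of separation between asking questions and handling uncertainty that the abstract describes.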

Project Outcomes

Journal papers (11)

Monographs (0)

Research awards (0)

Conference papers (0)

Patents (0)

Runtime provenance refinement for notebooks

DOI:
10.1145/3530800.3534535
Publication date:
2022
Journal:
Proceedings of the 14th International Workshop on the Theory and Practice of Provenance
Impact factor:
0
Authors:
Deo, Nachiket;Glavic, Boris;Kennedy, Oliver
Corresponding author:
Kennedy, Oliver

Loki: Streamlining Integration and Enrichment

DOI:
Publication date:
2020
Journal:
Human in the Loop Data Analytics
Impact factor:
0
Authors:
Spoth, William;Kumari, Poonam;Kennedy, Oliver;Nargesian, Fatemeh
Corresponding author:
Nargesian, Fatemeh

DataSense: Display Agnostic Data Documentation

DOI:
Publication date:
2021
Journal:
Conference on Innovative Data Systems Research
Impact factor:
0
Authors:
Kumari, Poonam;Brachmann, Michael;Kennedy, Oliver;Feng, Su;Glavic, Boris
Corresponding author:
Glavic, Boris

Query Log Compression for Workload Analytics

DOI:
10.14778/3291264.3291265
Publication date:
2018
Journal:
Proceedings of the VLDB Endowment
Impact factor:
2.5
Authors:
Xie, Ting;Chandola, Varun;Kennedy, Oliver
Corresponding author:
Kennedy, Oliver

Uncertainty Annotated Databases - A Lightweight Approach for Approximating Certain Answers

DOI:
10.1145/3299869.3319887
Publication date:
2019
Journal:
SIGMOD
Impact factor:
0
Authors:
Feng, Su;Huber, Aaron;Glavic, Boris;Kennedy, Oliver
Corresponding author:
Kennedy, Oliver

Other publications by Oliver Kennedy

PIP: A database system for great and small expectations

DOI:
10.1109/icde.2010.5447879
Publication date:
2010
Journal:
2010 IEEE 26th International Conference on Data Engineering (ICDE 2010)
Impact factor:
0
Authors:
Oliver Kennedy;Christoph E. Koch
Corresponding author:
Christoph E. Koch

Pathological & radiological variables in the diagnosis of bronchopulmonary carcinoids (BPCs) with a focus on Antigen Kiel 67 (Ki-67) proliferation index

DOI:
10.1016/j.lungcan.2025.108493
Publication date:
2025-04-01
Journal:
LUNG CANCER
Impact factor:
4.400
Authors:
Gaurav Ahuja;Aparna Iyer;Rachel Harwood;Haval Balata;Christopher Craig;Philip A.J. Crosbie;Kath Hewitt;Karen Peplow;Deborah Hutchings;Anna Sharman;Paul Bishop;Leena Joseph;Antonio Paiva-Correia;Anshuman Chaturvedi;James Barr;Angela Leek;Alison Backen;Christina Nuttall;Oliver Kennedy;Andrew Williamson;Matthew Evison
Corresponding author:
Matthew Evison