CAREER: Declarative Uncertainty

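Basic Information

  • Award number:
    1750460
  • Principal investigator:
    Oliver Kennedy
  • Amount:
    $542,300
  • Host institution country:
    United States
  • Award type:
    Continuing Grant
  • Fiscal year:
    2018
  • Funding country:
    United States
  • Period:
    2018-03-01 to 2024-02-29
  • Status:
    Completed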

Abstract

Data is messy. Fortunately, with minimal human intervention, good data cleaning heuristics produce mostly reliable, usually actionable information from big, messy data. For instance, analysts might automate their curation workflows by using classifiers to predict missing attribute values, or by using an entity-resolver to find and merge duplicate records. Unfortunately, heuristics are also dangerous, as the result of heuristic curation is often taken as fact. Serious mistakes often result when low-confidence or uncertain heuristic inferences are treated as truth: people denied a loan because of someone else's bad credit, 12-year-olds identified as terrorists, or billion-dollar investment errors. Many principled tools, like probabilistic databases, already exist for automatically tracking potential errors in unreliable data, but these tools are not easy to use. As a result, analysts more often resort to simply documenting potential errors and hoping that anyone using the data will realize the implications. This proposal will enable data management systems that can query and organize uncertain data without being hard to use. The specific aim of this proposal is to decouple the process of asking questions about uncertain data from mechanical concerns like why the data is uncertain, how the user wants to view uncertainty in query results, or which algorithms should be used. To enable this sort of "declarative uncertainty management," the project team will build on a system called Mimir. Mimir virtualizes uncertainty by augmenting data curation workflows (e.g., ETL pipelines) with a form of provenance capture through which heuristics can register alternative outputs (e.g., a schema matcher may register multiple potential matches). This provenance can then be used to synthesize a wide range of different physical and visual representations of uncertainty in data and in query results.
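The provenance-capture idea described above can be sketched in a few lines of Python. A heuristic records every alternative it considered while the pipeline proceeds with a best guess. This is a minimal illustration under stated assumptions, not Mimir's API. The `Registry` class, its `register` and `is_uncertain` methods, and the `match_schema` helper are all hypothetical. The standard-library `difflib.get_close_matches` stands in for a real schema matcher.

```python
import difflib
from dataclasses import dataclass, field


@dataclass
class Registry:
    """Hypothetical provenance store (not Mimir's API): maps each heuristic
    decision to every alternative the heuristic considered."""
    choices: dict = field(default_factory=dict)

    def register(self, decision, options, reason):
        # Record all alternatives; the pipeline continues with the first (best guess).
        self.choices[decision] = (list(options), reason)
        return options[0] if options else None

    def is_uncertain(self, decision):
        # A decision is uncertain when the heuristic offered more than one option.
        return len(self.choices[decision][0]) > 1


def match_schema(source_cols, target_cols, registry):
    """Toy schema matcher: candidates are target columns with similar names,
    ranked by difflib's similarity ratio (a stand-in for a real matcher)."""
    mapping = {}
    for src in source_cols:
        candidates = difflib.get_close_matches(src, target_cols, n=3, cutoff=0.5)
        mapping[src] = registry.register(f"match:{src}", candidates, "name similarity")
    return mapping


reg = Registry()
mapping = match_schema(["phone_no", "zip_code"], ["phone", "phone_alt", "zip"], reg)
print(mapping)                             # {'phone_no': 'phone', 'zip_code': 'zip'}
print(reg.is_uncertain("match:phone_no"))  # True: 'phone_alt' was also a candidate
print(reg.is_uncertain("match:zip_code"))  # False
```

The design choice being illustrated is that the curation step never discards alternatives. Later stages can decide how to present the uncertainty, for example by flagging `phone_no` as a guessed mapping, without rerunning the heuristic.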
To enable declarative uncertainty management, this proposal will address specific problems that fall into two general categories: (1) selecting and efficiently constructing qualitative summaries of uncertainty in query results, and (2) enhancing database query compilers and optimizers to support practical, efficient query processing over uncertain data. For further information, see the project web page: http://mimirdb.info

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
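Category (1) above, qualitative summaries of uncertainty in query results, can be illustrated with a small Python sketch. It assumes each cell stores a best guess plus the other alternatives a heuristic registered. It evaluates a selection on the best guesses and reports two counts: returned rows that some alternative would exclude, and excluded rows that some alternative would include. The `select_with_summary` helper and `Summary` class are hypothetical. This is a simplified illustration in the spirit of certain and possible answers, handling only single-column filters, and is not the project's actual algorithm.

```python
from dataclasses import dataclass


@dataclass
class Summary:
    """Qualitative summary of how uncertainty could change a query result."""
    returned: int          # rows returned under the best-guess values
    uncertain: int         # returned rows that some alternative would exclude
    possibly_missing: int  # excluded rows that some alternative would include

    def describe(self):
        return (f"{self.returned} returned ({self.uncertain} uncertain), "
                f"{self.possibly_missing} possibly missing")


def select_with_summary(rows, column, predicate):
    """Hypothetical helper: filter rows on the best-guess value of `column`.
    Each cell is a (best_guess, other_alternatives) pair."""
    result, uncertain, missing = [], 0, 0
    for row in rows:
        best, alternatives = row[column]
        # Evaluate the predicate under every registered option for this cell.
        outcomes = {predicate(v) for v in [best, *alternatives]}
        if predicate(best):
            # Output rows carry only the best-guess values.
            result.append({col: cell[0] for col, cell in row.items()})
            if False in outcomes:
                uncertain += 1
        elif True in outcomes:
            missing += 1
    return result, Summary(len(result), uncertain, missing)


rows = [
    {"name": ("Ann", []), "age": (34, [])},   # observed value, no alternatives
    {"name": ("Bo", []), "age": (22, [16])},  # imputed; classifier also suggested 16
    {"name": ("Cy", []), "age": (17, [19])},  # imputed; classifier also suggested 19
]
adults, summary = select_with_summary(rows, "age", lambda age: age >= 18)
print([r["name"] for r in adults])  # ['Ann', 'Bo']
print(summary.describe())           # 2 returned (1 uncertain), 1 possibly missing
```

Reporting counts rather than full probability distributions keeps the summary cheap to compute. It also keeps the summary easy for an analyst to read, which is the usability concern the abstract emphasizes.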

Project Outputs

Journal articles (11)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Runtime provenance refinement for notebooks
Loki: Streamlining Integration and Enrichment
  • Published:
    2020
  • Impact factor:
    0
  • Authors:
    Spoth, William; Kumari, Poonam; Kennedy, Oliver; Nargesian, Fatemeh
  • Corresponding author:
    Nargesian, Fatemeh
DataSense: Display Agnostic Data Documentation
Query Log Compression for Workload Analytics
  • DOI:
    10.14778/3291264.3291265
  • Published:
    2018
  • Impact factor:
    2.5
  • Authors:
    Xie, Ting; Chandola, Varun; Kennedy, Oliver
  • Corresponding author:
    Kennedy, Oliver
Uncertainty Annotated Databases - A Lightweight Approach for Approximating Certain Answers
  • DOI:
    10.1145/3299869.3319887
  • Published:
    2019
  • Impact factor:
    0
  • Authors:
    Feng, Su; Huber, Aaron; Glavic, Boris; Kennedy, Oliver
  • Corresponding author:
    Kennedy, Oliver

Other Publications by Oliver Kennedy

PIP: A database system for great and small expectations
Pathological & radiological variables in the diagnosis of bronchopulmonary carcinoids (BPCs) with a focus on Antigen Kiel 67 (Ki-67) proliferation index
  • DOI:
    10.1016/j.lungcan.2025.108493
  • Published:
    2025-04-01
  • Impact factor:
    4.400
  • Authors:
    Gaurav Ahuja; Aparna Iyer; Rachel Harwood; Haval Balata; Christopher Craig; Philip A.J. Crosbie; Kath Hewitt; Karen Peplow; Deborah Hutchings; Anna Sharman; Paul Bishop; Leena Joseph; Antonio Paiva-Correia; Anshuman Chaturvedi; James Barr; Angela Leek; Alison Backen; Christina Nuttall; Oliver Kennedy; Andrew Williamson; Matthew Evison
  • Corresponding author:
    Matthew Evison
Efficient Approximation of Certain and Possible Answers for Ranking and Window Queries over Uncertain Data (Extended version)
  • DOI:
    10.48550/arxiv.2302.08676
  • Published:
    2023
  • Impact factor:
    0
  • Authors:
    Su Feng; Boris Glavic; Oliver Kennedy
  • Corresponding author:
    Oliver Kennedy
Jigsaw: efficient optimization over uncertain enterprise data
  • Published:
    2011
  • Impact factor:
    0
  • Authors:
    Oliver Kennedy; Suman Nath
  • Corresponding author:
    Suman Nath
Communicating Data Quality in On-Demand Curation
  • Published:
    2016
  • Impact factor:
    0
  • Authors:
    P. Kumari; Said Achmiz; Oliver Kennedy
  • Corresponding author:
    Oliver Kennedy

Other Grants Awarded to Oliver Kennedy

SCC-PG: A Sustainable and Connected Community-Scale Food System to Empower Consumers, Farmers, and Retailers
  • Award number:
    2125516
  • Fiscal year:
    2021
  • Award type:
    Standard Grant
III: Medium: Collaborative Research: U4U - Taming Uncertainty with Uncertainty-Annotated Databases
  • Award number:
    1956149
  • Fiscal year:
    2020
  • Award type:
    Standard Grant
NSF Student Travel Grant for 2019 Symposium on Cloud Computing (SOCC)
  • Award number:
    1930814
  • Fiscal year:
    2019
  • Award type:
    Standard Grant
CIF21 DIBBs: EI: Vizier, Streamlined Data Curation
  • Award number:
    1640864
  • Fiscal year:
    2017
  • Award type:
    Standard Grant
III: Small: Just in Time Datastructures
  • Award number:
    1617586
  • Fiscal year:
    2016
  • Award type:
    Standard Grant
CI-P: Planning for a Community Infrastructure to Enable Pocket-Scale Data Management Research
  • Award number:
    1629791
  • Fiscal year:
    2016
  • Award type:
    Standard Grant

Similar Overseas Grants

Collaborative Research: PPoSS: Large: A Full-stack Approach to Declarative Analytics at Scale
  • Award number:
    2316161
  • Fiscal year:
    2023
  • Award type:
    Continuing Grant
Collaborative Research: PPoSS: Large: A Full-stack Approach to Declarative Analytics at Scale
  • Award number:
    2316158
  • Fiscal year:
    2023
  • Award type:
    Continuing Grant
Collaborative Research: PPoSS: Large: A Full-stack Approach to Declarative Analytics at Scale
  • Award number:
    2316159
  • Fiscal year:
    2023
  • Award type:
    Continuing Grant
Collaborative Research: PPoSS: Large: A Full-stack Approach to Declarative Analytics at Scale
  • Award number:
    2316160
  • Fiscal year:
    2023
  • Award type:
    Continuing Grant
Collaborative Research: PPoSS: Large: A Full-stack Approach to Declarative Analytics at Scale
  • Award number:
    2316157
  • Fiscal year:
    2023
  • Award type:
    Continuing Grant
Large-Scale Declarative Video Analytics
  • Award number:
    573283-2022
  • Fiscal year:
    2022
  • Award type:
    University Undergraduate Student Research Awards
Collaborative Research: PPoSS: A Full-stack Approach to Declarative Analytics at Scale
  • Award number:
    2217037
  • Fiscal year:
    2022
  • Award type:
    Standard Grant
Declarative Query Processing Over Real Time Video Streams
  • Award number:
    RGPIN-2020-07238
  • Fiscal year:
    2022
  • Award type:
    Discovery Grants Program - Individual
Collaborative Research: PPoSS: A Full-stack Approach to Declarative Analytics at Scale
  • Award number:
    2217036
  • Fiscal year:
    2022
  • Award type:
    Standard Grant
Declarative Graph Query Language support for Web and Blockchain Decentralized Applications, Analytics, and Compliance
  • Award number:
    RGPIN-2020-06983
  • Fiscal year:
    2022
  • Award type:
    Discovery Grants Program - Individual