III: Medium: Collaborative Research: U4U - Taming Uncertainty with Uncertainty-Annotated Databases

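Basic Information

  • Award number:
    1956123
  • Principal investigator:
  • Amount:
    $466,600
  • Institution:
  • Institution country:
    United States
  • Award type:
    Standard Grant
  • Fiscal year:
    2020
  • Funding country:
    United States
  • Period:
    2020-10-01 to 2024-09-30
  • Status:
    Completed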

Project Abstract

Uncertainty is prevalent in data analysis, regardless of the size of the data, the application domain, or the type of analysis. Common sources of uncertainty include missing values, sensor errors, bias, outliers, and many other factors. Classical deterministic data management does not track uncertainty and thus requires data quality issues to be resolved before data is ingested into the system, which is often not feasible. The net effect is that inherently uncertain data is treated as certain. If ignored, however, data uncertainty results in hard-to-trace errors, which in turn can have severe real-world implications such as unfounded scientific discoveries, financial damages, or even medical decisions based on incorrect data. While techniques for managing incomplete data exist, they are generally too heavy-weight for real-world usage and may hide relevant information from users. The goal of this project is to develop light-weight techniques for managing uncertain data that empower a wide range of applications to manage uncertainty.
Current methods for managing uncertain data are often computationally expensive and are only applicable to limited types of queries. The planned research will result in novel methods for managing uncertain data that bridge the gap between deterministic and incomplete data management. The foundation of this project is uncertainty-annotated databases, which enrich data with uncertainty labels and provide semantics for propagating these labels through queries. The result is a strict generalization of classical data management that combines the performance, generality, and ease-of-use of deterministic data management with the strong correctness guarantees of incomplete database techniques. Achieving this goal is highly non-trivial, because query evaluation over uncertain data is intractable, even for relatively simple uncertain data models and restricted classes of queries.
Three main research thrusts will be explored that address the main challenges in developing such a technique: (i) uncertainty-annotated databases will be extended with attribute-level annotations and a compact encoding of an over-approximation of possible answers. This enables the approach to handle missing data and to deal with non-monotone queries such as queries with aggregation; (ii) methods for compactly approximating incomplete databases will be developed to deal with the large or even infinite sets of possible results produced by queries over uncertain data; (iii) optimized algorithms for query evaluation over uncertainty-annotated databases will be developed to address the performance limitations of queries over uncertain data. The planned work will significantly enhance the state-of-the-art in uncertain data management by, for the first time, enabling principled uncertainty management for complex queries at a reasonable cost.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
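
The idea of attribute-level annotations that over-approximate possible answers can be illustrated with a minimal sketch. This is not the project's implementation: the `RangeValue` class, its field names, and the helper functions `range_sum` and `select_greater` are hypothetical, and the sketch assumes every row certainly exists and only its attribute value is uncertain. Each value carries a lower bound, a best guess, and an upper bound, and queries propagate these bounds instead of a single number.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class RangeValue:
    """A numeric attribute with uncertainty bounds (hypothetical encoding)."""
    lo: int    # smallest value the attribute may take
    best: int  # best-guess value, as a deterministic system would store it
    hi: int    # largest value the attribute may take

    def __post_init__(self) -> None:
        if not (self.lo <= self.best <= self.hi):
            raise ValueError("expected lo <= best <= hi")

    @property
    def is_certain(self) -> bool:
        # A value is certain when its bounds coincide.
        return self.lo == self.hi


def range_sum(values: Iterable[RangeValue]) -> RangeValue:
    """Sum aggregation: the bounds of the sum are the sums of the bounds."""
    vals = list(values)
    return RangeValue(
        sum(v.lo for v in vals),
        sum(v.best for v in vals),
        sum(v.hi for v in vals),
    )


def select_greater(
    values: Iterable[RangeValue], threshold: int
) -> List[Tuple[RangeValue, str]]:
    """Selection value > threshold, labeling each surviving row."""
    labeled = []
    for v in values:
        if v.lo > threshold:
            # Satisfies the predicate for every value within its bounds.
            labeled.append((v, "certain"))
        elif v.hi > threshold:
            # Satisfies the predicate for some value within its bounds.
            labeled.append((v, "possible"))
    return labeled


# Three sensor readings: one exact, one missing (wide bounds), one noisy.
readings = [RangeValue(10, 10, 10), RangeValue(0, 5, 20), RangeValue(7, 8, 9)]
total = range_sum(readings)
labels = [label for _, label in select_greater(readings, 8)]
```

In this sketch the aggregate stays a single compact value whose bounds contain every sum that could arise from the inputs, while the selection keeps rows that might qualify and marks them as merely possible rather than silently dropping or accepting them.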

Project Outcomes

Journal articles (17)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
CaJaDE: explaining query results by augmenting provenance with context
  • DOI:
    10.14778/3554821.3554852
  • Published:
    2022
  • Journal:
  • Impact factor:
    2.5
  • Authors:
    Li, Chenjie;Lee, Juseung;Miao, Zhengjie;Glavic, Boris;Roy, Sudeepa
  • Corresponding author:
    Roy, Sudeepa
Efficient Approximation of Certain and Possible Answers for Ranking and Window Queries over Uncertain Data
  • DOI:
    10.14778/3583140.3583151
  • Published:
    2023
  • Journal:
  • Impact factor:
    2.5
  • Authors:
    Feng, Su;Glavic, Boris;Kennedy, Oliver
  • Corresponding author:
    Kennedy, Oliver
Efficient Answering of Historical What-if Queries
  • DOI:
    10.1145/3514221.3526138
  • Published:
    2022
  • Journal:
  • Impact factor:
    0
  • Authors:
    Campbell, Felix S.;Arab, Bahareh Sadat;Glavic, Boris
  • Corresponding author:
    Glavic, Boris
Overlay Spreadsheets
Hybrid Query and Instance Explanations and Repairs

Other publications by Boris Glavic

Efficient Approximation of Certain and Possible Answers for Ranking and Window Queries over Uncertain Data (Extended version)
  • DOI:
    10.48550/arxiv.2302.08676
  • Published:
    2023
  • Journal:
  • Impact factor:
    0
  • Authors:
    Su Feng;Boris Glavic;Oliver Kennedy
  • Corresponding author:
    Oliver Kennedy
Efficient Stream Provenance via Operator Instrumentation
  • DOI:
  • Published:
    2014
  • Journal:
  • Impact factor:
    0
  • Authors:
    Boris Glavic;K. S. Esmaili;Peter M. Fischer;Nesime Tatbul
  • Corresponding author:
    Nesime Tatbul
Interoperability for Provenance-aware Databases using PROV and JSON
Solving Why Not Questions for Aggregate Constraints through Query Repair
  • DOI:
  • Published:
  • Journal:
  • Impact factor:
    0
  • Authors:
    Shatha Algarni;Boris Glavic;Seok;Adriane Chapman
  • Corresponding author:
    Adriane Chapman
SCIPIS: Scalable and concurrent persistent indexing and search in high-end computing systems
  • DOI:
  • Published:
    2024
  • Journal:
  • Impact factor:
    0
  • Authors:
    Alexandru Iulian Orhean;Anna Giannakou;Lavanya Ramakrishnan;K. Chard;Boris Glavic;I. Raicu
  • Corresponding author:
    I. Raicu

Other grants by Boris Glavic

III : Medium: Collaborative Research: From Open Data to Open Data Curation
  • Award number:
    2420691
  • Fiscal year:
    2024
  • Award amount:
    $466,600
  • Award type:
    Standard Grant
III : Medium: Collaborative Research: From Open Data to Open Data Curation
  • Award number:
    2107107
  • Fiscal year:
    2021
  • Award amount:
    $466,600
  • Award type:
    Standard Grant

Similar overseas grants

III : Medium: Collaborative Research: From Open Data to Open Data Curation
  • Award number:
    2420691
  • Fiscal year:
    2024
  • Award amount:
    $466,600
  • Award type:
    Standard Grant
Collaborative Research: III: Medium: Designing AI Systems with Steerable Long-Term Dynamics
  • Award number:
    2312865
  • Fiscal year:
    2023
  • Award amount:
    $466,600
  • Award type:
    Standard Grant
Collaborative Research: III: MEDIUM: Responsible Design and Validation of Algorithmic Rankers
  • Award number:
    2312932
  • Fiscal year:
    2023
  • Award amount:
    $466,600
  • Award type:
    Standard Grant
III: Medium: Collaborative Research: Integrating Large-Scale Machine Learning and Edge Computing for Collaborative Autonomous Vehicles
  • Award number:
    2348169
  • Fiscal year:
    2023
  • Award amount:
    $466,600
  • Award type:
    Continuing Grant
Collaborative Research: III: Medium: Algorithms for scalable inference and phylodynamic analysis of tumor haplotypes using low-coverage single cell sequencing data
  • Award number:
    2415562
  • Fiscal year:
    2023
  • Award amount:
    $466,600
  • Award type:
    Standard Grant
Collaborative Research: III: Medium: New Machine Learning Empowered Nanoinformatics System for Advancing Nanomaterial Design
  • Award number:
    2347592
  • Fiscal year:
    2023
  • Award amount:
    $466,600
  • Award type:
    Standard Grant
Collaborative Research: III: Medium: Knowledge discovery from highly heterogeneous, sparse and private data in biomedical informatics
  • Award number:
    2312862
  • Fiscal year:
    2023
  • Award amount:
    $466,600
  • Award type:
    Standard Grant
Collaborative Research: III: MEDIUM: Responsible Design and Validation of Algorithmic Rankers
  • Award number:
    2312930
  • Fiscal year:
    2023
  • Award amount:
    $466,600
  • Award type:
    Standard Grant
Collaborative Research: III: Medium: VirtualLab: Integrating Deep Graph Learning and Causal Inference for Multi-Agent Dynamical Systems
  • Award number:
    2312501
  • Fiscal year:
    2023
  • Award amount:
    $466,600
  • Award type:
    Standard Grant
Collaborative Research: III: Medium: Graph Neural Networks for Heterophilous Data: Advancing the Theory, Models, and Applications
  • Award number:
    2406648
  • Fiscal year:
    2023
  • Award amount:
    $466,600
  • Award type:
    Standard Grant