III: Medium: Quantifying the Unknown Unknowns for Data Integration

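Basic Information

  • Award number:
    1562657
  • Principal investigator:
  • Amount:
    $984,800
  • Host institution:
  • Host institution country:
    United States
  • Project type:
    Continuing Grant
  • Fiscal year:
    2016
  • Funding country:
    United States
  • Period:
    2016-06-01 to 2020-08-31
  • Project status:
    Completed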

Project Abstract

As the amount and variety of data available online explodes, it is common practice for data scientists to acquire and integrate disparate data sources to achieve higher-quality results. But even with a perfectly cleaned and merged data set, two fundamental questions remain: (1) is the integrated data set complete, and (2) what is the impact of any unknown (i.e., unobserved) data on query results? This project will develop and analyze techniques to estimate the impact of the unknown data (a.k.a. unknown unknowns) on analytical queries. This will help users better understand answers in the presence of incomplete information across fields ranging from business and the military to medical applications.

This project will develop and exploit the following paradoxical statistical phenomenon: the ability to see certain data items more than once (across multiple data sets) enables one to estimate parameters of data items that have never been seen at all. This project will therefore develop new statistical techniques that take advantage of overlapping data sets, and software backed by both theory and experiments. This will enable users with overlapping incomplete data sets to actively "see the unseen," and in many cases perform as though they had access to missing information not represented in any of their data sources. The project will also focus on data validation and on how to use multiple unreliable data sources to correct each other. Further, as the proposed analysis is nuanced and novel, the project will also explore how best to convey valuable insights to the user via interactive visualizations of the predictions. For further information see the project web site at: http://unknown-unknowns.cs.brown.edu
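
The phenomenon described in the abstract, where items seen more than once inform an estimate of items never seen, can be illustrated with the classical bias-corrected Chao1 species-richness estimator. This is only a minimal sketch and is not necessarily the technique the project developed. The function name `chao1_estimate`, the toy `sources` data, and the simplifying assumption that the pooled sources behave like one random sample are all hypothetical choices made for illustration.

```python
from collections import Counter


def chao1_estimate(sources):
    """Estimate the total number of distinct items (seen plus unseen).

    Assumption for illustration: all sources are pooled and treated as a
    single random sample, and the bias-corrected Chao1 formula is applied.
    """
    # How many times each distinct item was observed across all sources.
    counts = Counter(item for source in sources for item in source)
    observed = len(counts)

    # f1: items observed exactly once; f2: items observed exactly twice.
    f1 = sum(1 for c in counts.values() if c == 1)
    f2 = sum(1 for c in counts.values() if c == 2)

    # Bias-corrected form, which stays defined even when f2 == 0.
    unseen = f1 * (f1 - 1) / (2 * (f2 + 1))
    return observed + unseen


# Hypothetical overlapping data sources.
sources = [
    ["a", "b", "c", "d"],
    ["a", "b", "e"],
    ["a", "c", "f"],
]
estimate = chao1_estimate(sources)
print(f"observed distinct items: 6, estimated total: {estimate}")
```

In this toy input, six distinct items are observed, three of them once and two of them twice, so the estimator adds one unseen item. Many singletons relative to doubletons suggest that much of the population is still unobserved, while few singletons suggest the integrated data set is close to complete.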

Project Outcomes

Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)

Other publications by Tim Kraska

Building Database Applications in the Cloud
  • DOI:
    10.3929/ethz-a-006007449
  • Year published:
    2010
  • Journal:
  • Impact factor:
    0
  • Authors:
    Tim Kraska
  • Corresponding author:
    Tim Kraska
Towards a Benchmark for the Cloud
  • DOI:
  • Year published:
    2018
  • Journal:
  • Impact factor:
    0
  • Authors:
    Carsten Binnig;Donald Kossmann;Tim Kraska;Simon Losing
  • Corresponding author:
    Simon Losing
Self-Organizing Data Containers
Safe Visual Data Exploration
  • DOI:
  • Year published:
    2017
  • Journal:
  • Impact factor:
    0
  • Authors:
    Zheguang Zhao;Emanuel Zgraggen;L. Stefani;Carsten Binnig;E. Upfal;Tim Kraska
  • Corresponding author:
    Tim Kraska
Supplementary Materials for Niseko: a Large-Scale Meta-Learning Dataset
  • DOI:
  • Year published:
    2019
  • Journal:
  • Impact factor:
    0
  • Authors:
    Zeyuan Shang;Emanuel Zgraggen;P. Eichmann;Tim Kraska
  • Corresponding author:
    Tim Kraska

Other grants held by Tim Kraska

III: Medium: Quantifying the Unknown Unknowns for Data Integration
  • Award number:
    2033792
  • Fiscal year:
    2020
  • Award amount:
    $984,800
  • Project type:
    Continuing Grant
BD Spokes: SPOKE: NORTHEAST: Collaborative: A Licensing Model and Ecosystem for Data Sharing
  • Award number:
    1947440
  • Fiscal year:
    2019
  • Award amount:
    $984,800
  • Project type:
    Standard Grant
III: Medium: Learning-based Synthesis of Data Processing Engines
  • Award number:
    1900933
  • Fiscal year:
    2019
  • Award amount:
    $984,800
  • Project type:
    Continuing Grant
BD Spokes: SPOKE: NORTHEAST: Collaborative: A Licensing Model and Ecosystem for Data Sharing
  • Award number:
    1636698
  • Fiscal year:
    2016
  • Award amount:
    $984,800
  • Project type:
    Standard Grant
CAREER: Query Compilation Techniques for Complex Analytics on Enterprise Clusters
  • Award number:
    1453171
  • Fiscal year:
    2015
  • Award amount:
    $984,800
  • Project type:
    Continuing Grant

Similar overseas grants

III: Medium: Quantifying the Unknown Unknowns for Data Integration
  • Award number:
    2033792
  • Fiscal year:
    2020
  • Award amount:
    $984,800
  • Project type:
    Continuing Grant
SHF: Medium: Quantifying and Designing Around Architectural Risk
  • Award number:
    1763699
  • Fiscal year:
    2018
  • Award amount:
    $984,800
  • Project type:
    Continuing Grant
Quantifying the impact of stellar feedback on the interstellar medium
  • Award number:
    533571-2018
  • Fiscal year:
    2018
  • Award amount:
    $984,800
  • Project type:
    Canadian Graduate Scholarships Foreign Study Supplements
Quantifying residential access and exposure to urban greenspace using medium and high-resolution satellite imagery: a case study of Metro Vancouver, British Columbia
  • Award number:
    528800-2018
  • Fiscal year:
    2018
  • Award amount:
    $984,800
  • Project type:
    Alexander Graham Bell Canada Graduate Scholarships - Master's
III: Medium: Collaborative Research: Bayesian Modeling and Inference for Quantifying Terrestrial Ecosystem Functions
  • Award number:
    1563950
  • Fiscal year:
    2016
  • Award amount:
    $984,800
  • Project type:
    Continuing Grant
III: Medium: Collaborative Research: Bayesian Modeling and Inference for Quantifying Terrestrial Ecosystem Functions
  • Award number:
    1562303
  • Fiscal year:
    2016
  • Award amount:
    $984,800
  • Project type:
    Continuing Grant
Quantifying the structure of the molecular interstellar medium with the G-virial method
  • Award number:
    263085682
  • Fiscal year:
    2014
  • Award amount:
    $984,800
  • Project type:
    Priority Programmes
Study on Quantifying Visual Quality and Feeling of Semitransparent Medium
  • Award number:
    24656147
  • Fiscal year:
    2012
  • Award amount:
    $984,800
  • Project type:
    Grant-in-Aid for Challenging Exploratory Research
RI: Medium: Quantifying and utilizing confidence in machine learning
  • Award number:
    1162581
  • Fiscal year:
    2012
  • Award amount:
    $984,800
  • Project type:
    Standard Grant
RI: Medium: Quantifying Causality in Distributed Spatial Temporal Brain Networks
  • Award number:
    0964197
  • Fiscal year:
    2010
  • Award amount:
    $984,800
  • Project type:
    Standard Grant