III: Medium: Quantifying the Unknown Unknowns for Data Integration

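Basic information

  • Award number:
    1562657
  • Principal investigator:
  • Amount:
    $984,800
  • Host institution:
  • Host institution country:
    United States
  • Project type:
    Continuing Grant
  • Fiscal year:
    2016
  • Funding country:
    United States
  • Period:
    2016-06-01 to 2020-08-31
  • Project status:
    Completed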

Project abstract

As the amount and variety of data available online explodes, data scientists commonly acquire and integrate disparate data sources to achieve higher-quality results. But even with a perfectly cleaned and merged data set, two fundamental questions remain: (1) is the integrated data set complete, and (2) what is the impact of any unknown (i.e., unobserved) data on query results? This project will develop and analyze techniques to estimate the impact of the unknown data (a.k.a. unknown unknowns) on analytical queries. This will help to better understand answers in the presence of incomplete information across fields ranging from business and the military to medical applications.

This project will develop and exploit the following paradoxical statistical phenomenon: the ability to see certain data items more than once (across multiple data sets) enables one to estimate parameters of data items that have never been seen at all. The project will therefore develop new statistical techniques that take advantage of overlapping data sets, together with software backed by both theory and experiments. This will enable users with overlapping, incomplete data sets to actively "see the unseen" and, in many cases, to perform as though they had access to missing information not represented in any of their data sources. The project will also focus on data validation and on how to use multiple unreliable data sources to correct each other. Further, as the proposed analysis is nuanced and novel, the project will explore how best to convey valuable insights to the user via interactive visualizations of the predictions. For further information, see the project web site at: http://unknown-unknowns.cs.brown.edu
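The phenomenon described in the abstract, where repeated sightings across overlapping data sets say something about items never seen, can be illustrated with a small sketch. The abstract does not name a specific estimator, so the code below swaps in the classical Chao1 lower-bound estimator from species-richness estimation, applied to pooled sightings; it is an assumption for illustration, not the project's method. The function name `chao1_unseen` and the toy company lists are hypothetical.

```python
from collections import Counter
from typing import Hashable, Iterable


def chao1_unseen(sources: Iterable[Iterable[Hashable]]) -> float:
    """Estimate how many distinct items appear in none of the sources.

    Assumption: each appearance of an item in a source counts as one
    sighting, and the Chao1 estimator is applied to the pooled sightings.
    """
    # Count sightings per distinct item (at most one per source).
    sightings = Counter(item for source in sources for item in set(source))
    # f1: items sighted exactly once; f2: items sighted exactly twice.
    f1 = sum(1 for c in sightings.values() if c == 1)
    f2 = sum(1 for c in sightings.values() if c == 2)
    if f2 == 0:
        # Bias-corrected form, used when no item was sighted exactly twice.
        return f1 * (f1 - 1) / 2
    return f1 * f1 / (2 * f2)


# Hypothetical toy data: three overlapping, incomplete lists of company names.
sources = [
    ["acme", "globex", "initech", "umbrella"],
    ["acme", "globex", "hooli"],
    ["acme", "initech", "stark"],
]
observed = len({item for source in sources for item in source})
unseen = chao1_unseen(sources)
print(f"observed={observed}, estimated unseen={unseen:.2f}")
# prints "observed=6, estimated unseen=2.25"
```

Here three items are sighted once and two are sighted twice, so the estimate is 3² / (2 · 2) = 2.25 unseen items. Many singletons relative to doubletons suggest that the sources are still far from complete; heavy overlap suggests the opposite.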

Project outputs

Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)

Other publications by Tim Kraska

Towards a Benchmark for the Cloud
  • DOI:
  • Published:
    2018
  • Journal:
  • Impact factor:
    0
  • Authors:
    Carsten Binnig;Donald Kossmann;Tim Kraska;Simon Losing
  • Corresponding author:
    Simon Losing
Building Database Applications in the Cloud
  • DOI:
    10.3929/ethz-a-006007449
  • Published:
    2010
  • Journal:
  • Impact factor:
    0
  • Authors:
    Tim Kraska
  • Corresponding author:
    Tim Kraska
Safe Visual Data Exploration
  • DOI:
  • Published:
    2017
  • Journal:
  • Impact factor:
    0
  • Authors:
    Zheguang Zhao;Emanuel Zgraggen;L. Stefani;Carsten Binnig;E. Upfal;Tim Kraska
  • Corresponding author:
    Tim Kraska
Self-Organizing Data Containers
Making the Case for Query-by-Voice with EchoQuery

Other grants held by Tim Kraska

III: Medium: Quantifying the Unknown Unknowns for Data Integration
  • Award number:
    2033792
  • Fiscal year:
    2020
  • Award amount:
    $984,800
  • Project type:
    Continuing Grant
BD Spokes: SPOKE: NORTHEAST: Collaborative: A Licensing Model and Ecosystem for Data Sharing
  • Award number:
    1947440
  • Fiscal year:
    2019
  • Award amount:
    $984,800
  • Project type:
    Standard Grant
III: Medium: Learning-based Synthesis of Data Processing Engines
  • Award number:
    1900933
  • Fiscal year:
    2019
  • Award amount:
    $984,800
  • Project type:
    Continuing Grant
BD Spokes: SPOKE: NORTHEAST: Collaborative: A Licensing Model and Ecosystem for Data Sharing
  • Award number:
    1636698
  • Fiscal year:
    2016
  • Award amount:
    $984,800
  • Project type:
    Standard Grant
CAREER: Query Compilation Techniques for Complex Analytics on Enterprise Clusters
  • Award number:
    1453171
  • Fiscal year:
    2015
  • Award amount:
    $984,800
  • Project type:
    Continuing Grant

Similar overseas grants

III: Medium: Quantifying the Unknown Unknowns for Data Integration
  • Award number:
    2033792
  • Fiscal year:
    2020
  • Award amount:
    $984,800
  • Project type:
    Continuing Grant
SHF: Medium: Quantifying and Designing Around Architectural Risk
  • Award number:
    1763699
  • Fiscal year:
    2018
  • Award amount:
    $984,800
  • Project type:
    Continuing Grant
Quantifying the impact of stellar feedback on the interstellar medium
  • Award number:
    533571-2018
  • Fiscal year:
    2018
  • Award amount:
    $984,800
  • Project type:
    Canadian Graduate Scholarships Foreign Study Supplements
Quantifying residential access and exposure to urban greenspace using medium and high-resolution satellite imagery: a case study of Metro Vancouver, British Columbia
  • Award number:
    528800-2018
  • Fiscal year:
    2018
  • Award amount:
    $984,800
  • Project type:
    Alexander Graham Bell Canada Graduate Scholarships - Master's
III: Medium: Collaborative Research: Bayesian Modeling and Inference for Quantifying Terrestrial Ecosystem Functions
  • Award number:
    1563950
  • Fiscal year:
    2016
  • Award amount:
    $984,800
  • Project type:
    Continuing Grant
III: Medium: Collaborative Research: Bayesian Modeling and Inference for Quantifying Terrestrial Ecosystem Functions
  • Award number:
    1562303
  • Fiscal year:
    2016
  • Award amount:
    $984,800
  • Project type:
    Continuing Grant
Quantifying the structure of the molecular interstellar medium with the G-virial method
  • Award number:
    263085682
  • Fiscal year:
    2014
  • Award amount:
    $984,800
  • Project type:
    Priority Programmes
Study on Quantifying Visual Quality and Feeling of Semitransparent Medium
  • Award number:
    24656147
  • Fiscal year:
    2012
  • Award amount:
    $984,800
  • Project type:
    Grant-in-Aid for Challenging Exploratory Research
RI: Medium: Quantifying and utilizing confidence in machine learning
  • Award number:
    1162581
  • Fiscal year:
    2012
  • Award amount:
    $984,800
  • Project type:
    Standard Grant
RI: Medium: Quantifying Causality in Distributed Spatial Temporal Brain Networks
  • Award number:
    0964197
  • Fiscal year:
    2010
  • Award amount:
    $984,800
  • Project type:
    Standard Grant