III: Medium: Table-as-Query: Unifying Data Discovery and Alignment
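Basic information
- Award number: 1956096
- Principal investigator:
- Amount: $1 million
- Host institution:
- Host institution country: United States
- Award type: Continuing Grant
- Fiscal year: 2020
- Funding country: United States
- Period: 2020-08-01 to 2024-07-31
- Status: Completed
- Source:
- Keywords:
Project abstract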
Fueled by advances in information extraction and societal trends that value institutional openness and transparency, structured data are being produced and shared at an overwhelming speed. Open data sharing is central to supporting institutional transparency, but transparency is not achieved if shared data cannot be found and effectively aligned with other data being studied by data scientists, journalists, and others. This project will fundamentally contribute to the new science of open data sharing. The requirements for data discovery and integration over heterogeneous table repositories containing structured data are fundamentally different than they are for federated data integration (where for example, all data within an enterprise is integrated) or data exchange (where data is exchanged among a small set of autonomous peers, for example, between two institutions). This project will lay the theoretical foundations of data discovery (identification, alignment, and integration of tables) within table repositories. It will contribute both to developing the right conceptual framework for studying this problem and to designing systems that solve the table discovery and alignment problems at scale.
Today, solutions for data discovery over massive table repositories are in their infancy. Some solutions are highly tied to a specific domain. For example, solutions for finding relevant tables in mass collaboration data (often called web tables) may assume tables are designed for human consumption with rich, human-readable attribute names or metadata, and are relatively small (being designed for display on web pages). Furthermore, solutions often assume that the data scientists know a lot about what data is available and exactly how they want to integrate it with known data. These solutions let a user find tables that join with a specified attribute or union with a query table. But they are inadequate if the best way to extend a query table is to actually join it on several attributes with two other tables and then union the extended result with an existing wider table. This project will develop a more holistic approach to table discovery that both discovers a set of alignable tables as well as the best way to integrate (or align) the new data with a query table. In this new paradigm called "table-as-query", the user does not need to know a priori on which attributes various tables in a repository are best aligned. This project promotes a research agenda under which discovery finds not a single table, but a set of tables that can be combined (aligned) with the query table. The solutions will include integration choices within the table discovery process, looking for a set of tables that can best be aligned with a query table and also finding what the best alignment is. Importantly, the project will not rely on the unique name assumption, which states that different values refer to different and unique entities. Real data contains synonyms (two values that refer to the same entity) and homographs (one value that refers to more than one entity). This project will define new foundations and mathematical principles for studying table alignment and discovery. The search space is massive, so the project will also develop approximate, scalable solutions that can quickly (at interactive speeds) find a good set of tables and good alignments over massive table repositories with millions of tables.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
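The alignment scenario in the abstract (join a query table on several attributes with two other tables, then union the result with a wider table) can be made concrete with a small sketch. This is not the project's method: the tables, column names, and values below are invented for illustration, and the helpers `join` and `union` are hypothetical stand-ins written with plain Python lists of dicts.

```python
# Illustrative sketch only: tables are lists of dicts; all names and data are made up.

def join(left, right, on):
    """Left-outer join of two row lists on the attribute names in `on`."""
    index = {}
    for r in right:
        index.setdefault(tuple(r[a] for a in on), []).append(r)
    out = []
    for l in left:
        # Unmatched left rows are kept unchanged (one placeholder match of None).
        for r in index.get(tuple(l[a] for a in on), [None]):
            row = dict(l)
            if r is not None:
                row.update(r)
            out.append(row)
    return out

def union(a, b):
    """Outer union: keep every column, fill gaps with None, drop duplicate rows."""
    cols = []
    for row in a + b:
        for c in row:
            if c not in cols:
                cols.append(c)
    seen, out = set(), []
    for row in a + b:
        full = {c: row.get(c) for c in cols}
        key = tuple(full[c] for c in cols)
        if key not in seen:
            seen.add(key)
            out.append(full)
    return out

# Query table: "Springfield" alone is ambiguous (one value, two entities).
query = [{"city": "Springfield", "state": "IL"},
         {"city": "Springfield", "state": "MA"}]
# Two repository tables that each add one attribute.
parks = [{"city": "Springfield", "state": "IL", "parks": 12},
         {"city": "Springfield", "state": "MA", "parks": 7}]
libraries = [{"city": "Springfield", "state": "IL", "libraries": 3},
             {"city": "Springfield", "state": "MA", "libraries": 5}]
# An existing wider table that already has all four columns.
wider = [{"city": "Salem", "state": "OR", "parks": 9, "libraries": 2}]

# Join on several attributes with two tables, then union with the wider table.
keys = ("city", "state")
extended = join(join(query, parks, on=keys), libraries, on=keys)
result = union(extended, wider)

for row in result:
    print(row)
```

Joining on `city` alone would pair each query row with both `parks` rows, which is the kind of ambiguity homographs introduce; joining on both `city` and `state` keeps one match per row. In the setting the abstract describes, the user would not supply `keys` or pick `parks`, `libraries`, and `wider` by hand; finding them is the discovery problem.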
Project outcomes
Journal articles (14)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Tractable Orders for Direct Access to Ranked Answers of Conjunctive Queries
- DOI: 10.1145/3578517
- Published: 2023
- Journal:
- Impact factor: 1.8
- Authors: Carmeli, Nofar; Tziavelis, Nikolaos; Gatterbauer, Wolfgang; Kimelfeld, Benny; Riedewald, Mirek
- Corresponding author: Riedewald, Mirek
DomainNet: Homograph Detection for Data Lake Disambiguation
- DOI: 10.5441/002/edbt.2021.03
- Published: 2021-03
- Journal:
- Impact factor: 0
- Authors: Aristotelis Leventidis; Laura Di Rocco; Wolfgang Gatterbauer; Renée J. Miller; Mirek Riedewald
- Corresponding authors: Aristotelis Leventidis; Laura Di Rocco; Wolfgang Gatterbauer; Renée J. Miller; Mirek Riedewald
SANTOS: Relationship-based Semantic Table Union Search
- DOI: 10.1145/3588689
- Published: 2023
- Journal:
- Impact factor: 0
- Authors: Khatiwada, Aamod; Fan, Grace; Shraga, Roee; Chen, Zixuan; Gatterbauer, Wolfgang; Miller, Renée J.; Riedewald, Mirek
- Corresponding author: Riedewald, Mirek
DIALITE: Discover, Align and Integrate Open Data Tables
- DOI: 10.1145/3555041.3589732
- Published: 2023
- Journal:
- Impact factor: 0
- Authors: Khatiwada, Aamod; Shraga, Roee; Miller, Renée J.
- Corresponding author: Miller, Renée J.
Efficient Computation of Quantiles over Joins
- DOI: 10.1145/3584372.3588670
- Published: 2023
- Journal:
- Impact factor: 0
- Authors: Tziavelis, Nikolaos; Carmeli, Nofar; Gatterbauer, Wolfgang; Kimelfeld, Benny; Riedewald, Mirek
- Corresponding author: Riedewald, Mirek
Other publications by Renee Miller
Estimation of Cardiac Valve Annuli Motion with Deep Learning
- DOI: 10.1007/978-3-030-68107-4_15
- Published: 2020
- Journal:
- Impact factor: 1.9
- Authors: E. Kerfoot; C. E. King; T. Ismail; D. Nordsletten; Renee Miller
- Corresponding author: Renee Miller
Identification of Transversely Isotropic Properties from Magnetic Resonance Elastography Using the Optimised Virtual Fields Method
- DOI: 10.1007/978-3-319-59448-4_40
- Published: 2017
- Journal:
- Impact factor: 3.9
- Authors: Renee Miller; A. Kolipaka; M. Nash; A. Young
- Corresponding author: A. Young
Functional Characterization of Antibodies Neutralizing Soluble Factors In Vitro and In Vivo
- DOI:
- Published: 2010
- Journal:
- Impact factor: 0
- Authors: G. Veldman; Z. Kaymakcalan; Renee Miller; L. Kalghatgi; J. Salfeld
- Corresponding author: J. Salfeld
A computational study of post-infarct mechanical effects of injected biomaterial into ischaemic myocardium
- DOI:
- Published: 2012
- Journal:
- Impact factor: 0
- Authors: Renee Miller
- Corresponding author: Renee Miller
Innovative Application of Cerebral rSO2 Monitoring During Shunt Tap in Pediatric Ventricular Malfunctioning Shunts
- DOI:
- Published: 2015
- Journal:
- Impact factor: 1.4
- Authors: T. Abramo; Chuan Zhou; C. Estrada; M. Meredith; Renee Miller; M. Pearson; N. Tulipan; Abby M. Williams
- Corresponding author: Abby M. Williams
Other grants by Renee Miller
III: Small: Semantic Version Management in Data Lakes
- Award number: 2325632
- Fiscal year: 2023
- Amount: $1 million
- Award type: Standard Grant
III: Medium: Collaborative Research: From Open Data to Open Data Curation
- Award number: 2107248
- Fiscal year: 2021
- Amount: $1 million
- Award type: Standard Grant
CAREER: Managing Schematic Heterogeneity in Database Management Systems
- Award number: 9702974
- Fiscal year: 1997
- Amount: $1 million
- Award type: Continuing Grant
Similar overseas grants
RII Track-4:@NASA: Bluer and Hotter: From Ultraviolet to X-ray Diagnostics of the Circumgalactic Medium
- Award number: 2327438
- Fiscal year: 2024
- Amount: $1 million
- Award type: Standard Grant
Collaborative Research: Topological Defects and Dynamic Motion of Symmetry-breaking Tadpole Particles in Liquid Crystal Medium
- Award number: 2344489
- Fiscal year: 2024
- Amount: $1 million
- Award type: Standard Grant
Collaborative Research: AF: Medium: The Communication Cost of Distributed Computation
- Award number: 2402836
- Fiscal year: 2024
- Amount: $1 million
- Award type: Continuing Grant
Collaborative Research: AF: Medium: Foundations of Oblivious Reconfigurable Networks
- Award number: 2402851
- Fiscal year: 2024
- Amount: $1 million
- Award type: Continuing Grant
Collaborative Research: CIF: Medium: Snapshot Computational Imaging with Metaoptics
- Award number: 2403122
- Fiscal year: 2024
- Amount: $1 million
- Award type: Standard Grant
Collaborative Research: SHF: Medium: Differentiable Hardware Synthesis
- Award number: 2403134
- Fiscal year: 2024
- Amount: $1 million
- Award type: Standard Grant
Collaborative Research: CyberTraining: Implementation: Medium: Training Users, Developers, and Instructors at the Chemistry/Physics/Materials Science Interface
- Award number: 2321102
- Fiscal year: 2024
- Amount: $1 million
- Award type: Standard Grant
Collaborative Research: CyberTraining: Implementation: Medium: Transforming the Molecular Science Research Workforce through Integration of Programming in University Curricula
- Award number: 2321045
- Fiscal year: 2024
- Amount: $1 million
- Award type: Standard Grant
Collaborative Research: CyberTraining: Implementation: Medium: Training Users, Developers, and Instructors at the Chemistry/Physics/Materials Science Interface
- Award number: 2321103
- Fiscal year: 2024
- Amount: $1 million
- Award type: Standard Grant
Collaborative Research: CPS: Medium: Automating Complex Therapeutic Loops with Conflicts in Medical Cyber-Physical Systems
- Award number: 2322534
- Fiscal year: 2024
- Amount: $1 million
- Award type: Standard Grant