Structured Data Integration with Referring Expressions and Types
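Basic Information
- Grant number: RGPIN-2022-03742
- Principal investigator:
- Amount: $21,100
- Host institution:
- Host institution country: Canada
- Category: Discovery Grants Program - Individual
- Fiscal year: 2022
- Funding country: Canada
- Period: 2022-01-01 to 2023-12-31
- Status: Completed
- Source:
- Keywords:
Abstract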
Any enterprise will have access to a large number of structured data sources that are managed by its operational information systems or that can be accessed through external sites on the web. The sources will provide interfaces that range from low-level file systems storing such artifacts as CSV files, through intermediate-level key-value systems managing very large volumes of operational data, to high-level relational database systems and NewSQL systems providing an SQL interface to their own structured data. The enterprise will also face many situations in which its IT personnel, data engineers and data scientists must write code that accesses many of these data sources at runtime, either to satisfy emerging integration requirements for its existing information systems or to satisfy the requirements of new information systems currently under development. Such situations are instances of the general problem of structured data integration, the focus of this research program. Structured data integration is the problem of finding and executing query plans that manifest efficient ways to interact with data sources to obtain their structured data, and to combine the results of this interaction so that, as efficiently as possible, they answer user queries written by developers. The problem becomes particularly challenging when user queries are expressed in a high-level non-procedural language, such as SQL, over an integrating schema. Such a schema, sometimes called a mediated schema or ontology, is tailored entirely to the domain of the enterprise, and is therefore devoid of any knowledge of the data sources themselves. The increase in developer productivity enabled by data integration technology that provides such a developer interface is now indisputable.
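To make the notion of a query plan concrete, the following Python sketch hand-codes one plan for a query over a tiny integrating schema. Everything in it is a hypothetical illustration, not the method developed in this program: the relation `Employee(name, dept)`, the two in-memory sources (a CSV file that supports only full scans, and a key-value store assumed to support only point lookups by key), and all function names. Because the key-value source is assumed not to be scannable, the plan scans the CSV source and probes the key-value store once per row, i.e. an index nested-loop join.

```python
import csv
import io

# Hypothetical source 1: a CSV file (low-level file-system source).
# Assumed capability: full scans only.
EMP_CSV = """emp_no,name
101,Ada
102,Grace
103,Edsger
"""

# Hypothetical source 2: a key-value store mapping emp_no -> department.
# Assumed capability: point lookups by key only (no scans).
DEPT_KV = {"101": "R&D", "102": "Sales", "103": "R&D"}


def scan_csv(text):
    """Full scan of the CSV source, yielding one dict per data row."""
    yield from csv.DictReader(io.StringIO(text))


def kv_get(store, key):
    """Point lookup against the key-value source; None if the key is absent."""
    return store.get(key)


def plan_names_in_dept(dept):
    """Hand-written plan for the query over the integrating schema
         SELECT name FROM Employee WHERE dept = :dept
    where Employee(name, dept) is assumed to be the join of both sources on emp_no.
    The plan scans the CSV source and probes the key-value store once per row."""
    result = []
    for row in scan_csv(EMP_CSV):
        if kv_get(DEPT_KV, row["emp_no"]) == dept:
            result.append(row["name"])
    return result


print(plan_names_in_dept("R&D"))  # ['Ada', 'Edsger']
```

Deriving such a plan automatically from the SQL query, the source schemas and the source capabilities, instead of writing it by hand, is the compilation task this program targets.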
Two subproblems in data integration are the focus of this research: (1) the selection of suitable languages for capturing so-called physical designs (physical designs augment logical designs with the source schemas of data residing at various data sources, with mapping rules relating the logical design to source schemas, and with knowledge of the capabilities of the data sources themselves), and (2) the selection of suitable internal languages for expressing user queries and query plans, together with the problem of finding query plans given physical designs and user queries. The research program continues my ongoing work on each of these subproblems, and therefore focuses in particular on the issue of identification in structured data integration: how data sources have encoded references to entities of interest to the enterprise, and how the enterprise itself has chosen its means of reference. The ultimate goal is the development of practical tools for compiling SQL queries over an integrating schema into efficient query plans that access structured data sources, and this work is already well underway.
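The identification issue can be illustrated with an equally small, hypothetical sketch: two sources refer to the same employees in different ways (an HR source by `emp_no`, a badge system by e-mail address), the enterprise is assumed to have chosen `emp_no` as its means of reference, and a mapping rule translates the badge source's referring expression into `emp_no` before the join. The names, data and mapping are invented for illustration; the sketch is a simple global-as-view style definition, not the representation of physical designs proposed in this research.

```python
from typing import Dict, List, Tuple

# Hypothetical enterprise choice: employees are identified by emp_no.
# Source A (HR) already refers to employees by emp_no.
HR_ROWS: List[Tuple[int, str]] = [(101, "Ada"), (102, "Grace")]

# Source B (badge system) refers to the same employees by e-mail address.
BADGE_ROWS: List[Tuple[str, str]] = [
    ("ada@example.com", "DC"),
    ("grace@example.com", "MC"),
]

# Hypothetical mapping rule relating source B's referring expression (e-mail)
# to the enterprise's chosen reference (emp_no).
EMAIL_TO_EMP_NO: Dict[str, int] = {
    "ada@example.com": 101,
    "grace@example.com": 102,
}


def employee_view() -> List[Tuple[int, str, str]]:
    """Materialize the integrating-schema relation Employee(emp_no, name, building)
    by translating every badge e-mail into emp_no (assumes each e-mail is mapped)
    and joining with the HR source on emp_no."""
    building_by_emp: Dict[int, str] = {
        EMAIL_TO_EMP_NO[email]: building for email, building in BADGE_ROWS
    }
    # Keep only HR employees that also have a badge record (inner join).
    return [
        (emp_no, name, building_by_emp[emp_no])
        for emp_no, name in HR_ROWS
        if emp_no in building_by_emp
    ]


print(employee_view())  # [(101, 'Ada', 'DC'), (102, 'Grace', 'MC')]
```

Keeping the mapping rule separate from the query means a developer can query `Employee` by `emp_no` without knowing that one source identifies employees by e-mail.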
Project Outputs
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other publications by Weddell, Grant
Other grants held by Weddell, Grant
Data Integration: from NoSQL to Main Memory Data Sources
- Grant number: RGPIN-2016-03884
- Fiscal year: 2021
- Amount: $21,100
- Category: Discovery Grants Program - Individual
Data Integration: from NoSQL to Main Memory Data Sources
- Grant number: RGPIN-2016-03884
- Fiscal year: 2020
- Amount: $21,100
- Category: Discovery Grants Program - Individual
Data Integration: from NoSQL to Main Memory Data Sources
- Grant number: RGPIN-2016-03884
- Fiscal year: 2019
- Amount: $21,100
- Category: Discovery Grants Program - Individual
Data Integration: from NoSQL to Main Memory Data Sources
- Grant number: RGPIN-2016-03884
- Fiscal year: 2018
- Amount: $21,100
- Category: Discovery Grants Program - Individual
Data Integration: from NoSQL to Main Memory Data Sources
- Grant number: RGPIN-2016-03884
- Fiscal year: 2017
- Amount: $21,100
- Category: Discovery Grants Program - Individual
Data Integration: from NoSQL to Main Memory Data Sources
- Grant number: RGPIN-2016-03884
- Fiscal year: 2016
- Amount: $21,100
- Category: Discovery Grants Program - Individual
Integrating and querying information sources in the context of large-scale ontological knowledge
- Grant number: 36823-2011
- Fiscal year: 2015
- Amount: $21,100
- Category: Discovery Grants Program - Individual
Integrating and querying information sources in the context of large-scale ontological knowledge
- Grant number: 36823-2011
- Fiscal year: 2014
- Amount: $21,100
- Category: Discovery Grants Program - Individual
Integrating and querying information sources in the context of large-scale ontological knowledge
- Grant number: 36823-2011
- Fiscal year: 2013
- Amount: $21,100
- Category: Discovery Grants Program - Individual
Integrating and querying information sources in the context of large-scale ontological knowledge
- Grant number: 36823-2011
- Fiscal year: 2012
- Amount: $21,100
- Category: Discovery Grants Program - Individual
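Similar NSFC grants
Data-driven Recommendation System Construction of an Online Medical Platform Based on the Fusion of Information
- Grant number:
- Approval year: 2024
- Amount:
- Category: Research Fund for International Young Scientists
Scalable Learning and Optimization: High-dimensional Models and Online Decision-Making Strategies for Big Data Analysis
- Grant number:
- Approval year: 2024
- Amount:
- Category: Collaborative Innovation Research Team
Development of a Linear Stochastic Model for Wind Field Reconstruction from Limited Measurement Data
- Grant number:
- Approval year: 2020
- Amount: 400,000 CNY
- Category:
Key Technologies for Semantic Interoperability of Web Services Based on Linked Open Data
- Grant number: 61373035
- Approval year: 2013
- Amount: 770,000 CNY
- Category: General Program
Molecular Interaction Reconstruction of Rheumatoid Arthritis Therapies Using Clinical Data
- Grant number: 31070748
- Approval year: 2010
- Amount: 340,000 CNY
- Category: General Program
Functional Data Analysis Methods for High-Dimensional Data
- Grant number: 11001084
- Approval year: 2010
- Amount: 160,000 CNY
- Category: Young Scientists Fund
Role of the Chromosome Replication Negative Regulator datA in the Cell Cycle
- Grant number: 31060015
- Approval year: 2010
- Amount: 250,000 CNY
- Category: Regional Science Fund
Computational Methods for Analyzing Toponome Data
- Grant number: 60601030
- Approval year: 2006
- Amount: 170,000 CNY
- Category: Young Scientists Fund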
Similar overseas grants
Seamless integration of Financial data into ESG data
- Grant number: 10099890
- Fiscal year: 2024
- Amount: $21,100
- Category: Collaborative R&D
Collaborative Research: Constraining next generation Cascadia earthquake and tsunami hazard scenarios through integration of high-resolution field data and geophysical models
- Grant number: 2325311
- Fiscal year: 2024
- Amount: $21,100
- Category: Standard Grant
CAREER: Next-Generation Methods for Statistical Integration of High-Dimensional Disparate Data Sources
- Grant number: 2422478
- Fiscal year: 2024
- Amount: $21,100
- Category: Continuing Grant
Next Generation Tools For Genome-Centric Multimodal Data Integration In Personalised Cardiovascular Medicine
- Grant number: 10104323
- Fiscal year: 2024
- Amount: $21,100
- Category: EU-Funded
Optimising Data Integration for Sustainable Deployment of Zero Emission Vehicles in UK
- Grant number: 10114156
- Fiscal year: 2024
- Amount: $21,100
- Category: SME Support
NEXT GENERATION TOOLS FOR GENOME-CENTRIC MULTIMODAL DATA INTEGRATION IN PERSONALISED CARDIOVASCULAR MEDICINE
- Grant number: 10098097
- Fiscal year: 2024
- Amount: $21,100
- Category: EU-Funded
Collaborative Research: Constraining next generation Cascadia earthquake and tsunami hazard scenarios through integration of high-resolution field data and geophysical models
- Grant number: 2325312
- Fiscal year: 2024
- Amount: $21,100
- Category: Standard Grant
CC* INTEGRATION-SMALL: ADIABATIC MICROSERVICE LEVEL LOAD BALANCED FORWARDING ON PISA SWITCH FOR ACCELERATING URGENT PROCESSES IN SCIENCE DATA CENTER NETWORKS
- Grant number: 2346729
- Fiscal year: 2024
- Amount: $21,100
- Category: Standard Grant
Collaborative Research: Constraining next generation Cascadia earthquake and tsunami hazard scenarios through integration of high-resolution field data and geophysical models
- Grant number: 2325310
- Fiscal year: 2024
- Amount: $21,100
- Category: Standard Grant
CAREER: New data integration approaches for efficient and robust meta-estimation, model fusion and transfer learning
- Grant number: 2337943
- Fiscal year: 2024
- Amount: $21,100
- Category: Continuing Grant