Design and implementation of big complex semantic data management system

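Basic Information

  • Grant number:
    RGPIN-2014-05796
  • Principal investigator:
  • Amount:
    $14,600
  • Host institution:
  • Host institution country:
    Canada
  • Project category:
    Discovery Grants Program - Individual
  • Fiscal year:
    2015
  • Funding country:
    Canada
  • Duration:
    2015-01-01 to 2016-12-31
  • Project status:
    Completed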

Project Abstract

In the age of big data, storing, managing, retrieving, and analyzing large-scale complex data in a timely and efficient manner is a major challenge. The primary goal of the proposed research is to design and implement a cost-effective and highly scalable database management system for the storage, querying, and analytics of big complex semantic data. It does so by combining and extending ideas from parallel database systems, the MapReduce computing platform, and our previous work on the complex semantic database model Information Networking Model (INM) and INM-DBMS.

A parallel database system is a database management system (DBMS) implemented on a multiprocessor system with high-degree connectivity. It features data modeling using well-defined schemas, declarative query languages with high levels of abstraction, sophisticated query optimizers, and a rich runtime environment that supports efficient execution strategies.

The MapReduce computing paradigm, started by Google and popularized by the open-source Hadoop, is a cost-effective approach to distributed data storage and processing on large clusters of low-cost commodity machines connected by a high-bandwidth network. It has gained a lot of attention in recent years from industry and research.

INM-DBMS is a complex semantic database management system that features hierarchical and composite binary and higher-degree relationships and their combinations, built-in semantics for consistency and integrity constraints for various relationships, and rich deductive and active rules. It has a concise but expressive language consisting of three parts: the information definition language (IDL), the information manipulation language (IML), and the information query language (IQL). IDL and IML provide powerful constructs to express the rich semantics and integrity constraints associated with various relationships. The declarative query language IQL incorporates many useful features found in database, XML, and logic programming languages, such as logical variables, implicit existential and non-existential quantification, explicit universal quantification, negation as failure, and tree expressions. It can explore the natural networking structure of objects to extract and construct meaningful results in a concise and natural way.

We will first extend the INM data model to accommodate schema-free and semi-structured data, and then implement a big semantic database management system based on it. The system will consist of two layers to achieve high concurrency: the manipulation layer is in charge of definitions, manipulations, and queries, while the data layer takes care of data storage. In the manipulation layer, objects are evenly allocated to storage nodes by a hash function, and query tasks are analysed and translated into parallel tasks via an INM MapReduce library interface to achieve partition balancing and partitioned parallelism. In the data layer, the structured semantic data is partitioned and stored across nodes to ensure dynamic scalability. In the methodology, four kinds of nodes are distinguished to execute different jobs. The nodes are independent of each other and decoupled from physical machines, which makes the system highly extensible in physical resource allocation and highly fault-tolerant.

The resulting big complex semantic DBMS can contribute significantly to the effective management, efficient retrieval, and timely analytics of heterogeneous, semi-structured, and unstructured massive data. It can be used in many applications, such as semantic search engines, complex social network services with explosively growing data, large-scale data analysis and data mining, and knowledge graph construction and knowledge discovery.
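
The two mechanisms the abstract names for the manipulation layer, hash-based allocation of objects to storage nodes and the splitting of a query into parallel tasks, can be illustrated with a small sketch. This is not the INM MapReduce library interface, which the abstract does not specify. The function names (`node_for`, `partition`, `map_count`, `count_by_class`), the SHA-256 hashing scheme, the thread pool, and the sample data are all assumptions made for illustration.

```python
import hashlib
from collections import Counter, defaultdict
from concurrent.futures import ThreadPoolExecutor


def node_for(object_id: str, num_nodes: int) -> int:
    """Map an object id to a storage node with a stable hash (assumed scheme)."""
    digest = hashlib.sha256(object_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_nodes


def partition(objects, num_nodes):
    """Group (object_id, class_name) pairs by the node the hash assigns them to."""
    nodes = defaultdict(list)
    for object_id, class_name in objects:
        nodes[node_for(object_id, num_nodes)].append((object_id, class_name))
    return nodes


def map_count(partition_rows):
    """Map step: count objects per class inside a single partition."""
    return Counter(class_name for _, class_name in partition_rows)


def count_by_class(nodes):
    """Run the map step on every partition in parallel, then reduce by summing."""
    with ThreadPoolExecutor() as pool:
        partial_counts = list(pool.map(map_count, nodes.values()))
    total = Counter()
    for partial in partial_counts:
        total.update(partial)
    return total


# Hypothetical sample data: 3000 objects, one third of them of class "Company".
objects = [(f"obj-{i}", "Person" if i % 3 else "Company") for i in range(3000)]
nodes = partition(objects, num_nodes=4)
totals = count_by_class(nodes)
```

A cryptographic hash is used here only because Python's built-in `hash` for strings varies between runs. A real system that needs to add or remove nodes dynamically would more likely use a scheme such as consistent hashing, since a plain modulo reassigns most objects when the node count changes.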

Project Outcomes

Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)

Other publications by Liu, Mengchi

A Survey on NoSQL Stores
  • DOI:
    10.1145/3158661
  • Published:
    2018-06-01
  • Journal:
  • Impact factor:
    16.6
  • Authors:
    Davoudian, Ali;Chen, Liu;Liu, Mengchi
  • Corresponding author:
    Liu, Mengchi
Efficient Algorithms for High Utility Itemset Mining Without Candidate Generation
Towards intelligent E-learning systems.
3SEPIAS: A Semi-Structured Search Engine for Personal Information in dAtaspace System
  • DOI:
    10.1016/j.ins.2012.06.013
  • Published:
    2013
  • Journal:
  • Impact factor:
    8.1
  • Authors:
    Zhong, Ming;Liu, Mengchi;He, Yanxiang
  • Corresponding author:
    He, Yanxiang

Other grants held by Liu, Mengchi

Design and implementation of big complex semantic data management system
  • Grant number:
    RGPIN-2014-05796
  • Fiscal year:
    2018
  • Funding amount:
    $14,600
  • Project category:
    Discovery Grants Program - Individual
Design and implementation of big complex semantic data management system
  • Grant number:
    RGPIN-2014-05796
  • Fiscal year:
    2017
  • Funding amount:
    $14,600
  • Project category:
    Discovery Grants Program - Individual
Design and implementation of big complex semantic data management system
  • Grant number:
    RGPIN-2014-05796
  • Fiscal year:
    2016
  • Funding amount:
    $14,600
  • Project category:
    Discovery Grants Program - Individual
Design and implementation of big complex semantic data management system
  • Grant number:
    RGPIN-2014-05796
  • Fiscal year:
    2014
  • Funding amount:
    $14,600
  • Project category:
    Discovery Grants Program - Individual
XML-based intelligent information systems
  • Grant number:
    193552-2004
  • Fiscal year:
    2008
  • Funding amount:
    $14,600
  • Project category:
    Discovery Grants Program - Individual
XML-based intelligent information systems
  • Grant number:
    193552-2004
  • Fiscal year:
    2006
  • Funding amount:
    $14,600
  • Project category:
    Discovery Grants Program - Individual
XML-based intelligent information systems
  • Grant number:
    193552-2004
  • Fiscal year:
    2005
  • Funding amount:
    $14,600
  • Project category:
    Discovery Grants Program - Individual
Server for intelligent information systems
  • Grant number:
    300166-2004
  • Fiscal year:
    2004
  • Funding amount:
    $14,600
  • Project category:
    Research Tools and Instruments - Category 1 (<$150,000)
XML-based intelligent information systems
  • Grant number:
    193552-2004
  • Fiscal year:
    2004
  • Funding amount:
    $14,600
  • Project category:
    Discovery Grants Program - Individual
Foundations and implementations of intelligent information systems
  • Grant number:
    193552-2000
  • Fiscal year:
    2003
  • Funding amount:
    $14,600
  • Project category:
    Discovery Grants Program - Individual

Similar overseas grants

MEGASKILLS [MEthodology of Psycho-pedagogical, Big Data and Commercial Video GAmes procedures for the European SKILLS Agenda Implementation]
  • Grant number:
    10069843
  • Fiscal year:
    2023
  • Funding amount:
    $14,600
  • Project category:
    EU-Funded
Ethics Core (FABRIC)
  • Grant number:
    10662376
  • Fiscal year:
    2023
  • Funding amount:
    $14,600
  • Project category:
SUD-t Map: A Big Data Digital Platform to Identify and Characterize SUD Treatment Opportunities to Address Health Disparities
  • Grant number:
    10594828
  • Fiscal year:
    2023
  • Funding amount:
    $14,600
  • Project category:
Construction of an Efficient and Robust Ophthalmic Big Data and AI System through Implementation of Federated Learning
  • Grant number:
    23K17434
  • Fiscal year:
    2023
  • Funding amount:
    $14,600
  • Project category:
    Grant-in-Aid for Challenging Research (Pioneering)
Resource Development Core
  • Grant number:
    10746571
  • Fiscal year:
    2023
  • Funding amount:
    $14,600
  • Project category:
Exploratory Research Project - ADAPT
  • Grant number:
    10577122
  • Fiscal year:
    2023
  • Funding amount:
    $14,600
  • Project category:
Development of personalized healthy food incentives to improve diet and cardiovascular risk
  • Grant number:
    10663538
  • Fiscal year:
    2023
  • Funding amount:
    $14,600
  • Project category:
A national study on the effects of air pollution and temperature on children's neurodevelopmental outcomes
  • Grant number:
    10585592
  • Fiscal year:
    2023
  • Funding amount:
    $14,600
  • Project category:
US Ten Day Seminar on the Epidemiology and Prevention of Cardiovascular Diseases and Stroke
  • Grant number:
    10754206
  • Fiscal year:
    2023
  • Funding amount:
    $14,600
  • Project category:
West Africa Center of Excellence for Data Science Research Education
  • Grant number:
    10713853
  • Fiscal year:
    2023
  • Funding amount:
    $14,600
  • Project category: