
Knowledge Extraction via Learning Processes and Data Models with Imprecision


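Basic information

Grant number:
RGPIN-2017-06245
Principal investigator:
Reformat, Marek
Amount:
Approx. $14,600
Host institution:
University of Alberta
Host institution country:
Canada
Program:
Discovery Grants Program - Individual
Fiscal year:
2018
Funding country:
Canada
Start and end dates:
2018-01-01 to 2019-12-31
Project status:
Completed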

Source:
https://www.nserc-crsng.gc.ca/ase-oro/Details-Detailles_eng.asp?id=663123
Keywords:
Knowledge Extraction via Learning Processes

Project abstract

The web represents an immense repository of information. The number of sources of structured and, even more often, unstructured data is growing every day. There is no doubt that our dependency on data increases continuously. However, the increased amount of data, although recognized as a positive and beneficial fact, creates an even bigger issue related to our ability to fully utilize that data. That situation increases the pressure to develop automatic and more efficient approaches suitable for advanced processing of data, leading to the creation of logic structures that 'lift' raw data to a knowledge-like level.

Fortunately, we are at the onset of significant and far-reaching changes in the way data are represented and stored on the web. The concept of knowledge graphs is becoming a new way of expressing pieces of data and the relations between them. The Resource Description Framework, one of the most fundamental aspects of the Semantic Web, is recognized as the most suitable data format for representing knowledge graphs.

The proposed research project puts special emphasis on the processes of constructing, updating and utilizing knowledge graphs built from data and information obtained on the web and extracted from documents. A key innovation of this project is a fusion of web technologies, fuzzy-based techniques, and concepts of category theory and topos to fully explore data, taking advantage of the Resource Description Framework's intrinsic interconnectivity and setting up a basis for knowledge synthesis processes.

These activities will lead to establishing coherent rudiments of knowledge creation processes.
In a nutshell, the proposed methodology focuses on forming knowledge-rich structures in the following steps:

1) extracting information from documents and representing it in the form of so-called information knowledge graphs that contain specific pieces of information;
2) clustering and generalizing those graphs, leading to the construction of conceptual knowledge graphs;
3) maintaining both types of graphs via incremental updates using aggregation and data assimilation techniques that take into account imprecision and confidence levels in different pieces of data; and
4) constructing logic structures in the form of the internal logic of a topos based on conceptual knowledge graphs, and linking those structures with information graphs for validation and cognitive purposes.

It is expected that the project will lead to significant contributions to methodologies for building a new generation of systems that support users in collecting data from the web and processing it towards the creation of knowledge. This will lead to the development of knowledge systems capable of validating the correctness of information extracted from data and of synthesizing new concepts based on it.

Overall, the project encompasses state-of-the-art research and training of highly qualified personnel (HQP) in an IT area that is important for Canada.
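
As an illustration only, steps 1 and 3 of the methodology (storing extracted statements as graph triples and updating them incrementally while tracking confidence) might be sketched as follows. The abstract does not say which aggregation operator the project uses; the probabilistic sum below is a standard fuzzy t-conorm chosen here as an assumption, and the names `ConfidenceGraph`, `assimilate`, `above`, and the `ex:` identifiers are hypothetical.

```python
from typing import Dict, Set, Tuple

# An RDF-style statement: (subject, predicate, object).
Triple = Tuple[str, str, str]


def probabilistic_sum(a: float, b: float) -> float:
    """Fuzzy t-conorm a + b - a*b (assumed operator, not from the abstract)."""
    return a + b - a * b


class ConfidenceGraph:
    """Hypothetical information graph mapping each triple to a confidence in [0, 1]."""

    def __init__(self) -> None:
        self.triples: Dict[Triple, float] = {}

    def assimilate(self, triple: Triple, confidence: float) -> float:
        """Incrementally merge one new observation of a triple."""
        if not 0.0 <= confidence <= 1.0:
            raise ValueError("confidence must lie in [0, 1]")
        previous = self.triples.get(triple, 0.0)  # unseen triples start at 0
        updated = probabilistic_sum(previous, confidence)
        self.triples[triple] = updated
        return updated

    def above(self, threshold: float) -> Set[Triple]:
        """Return the triples whose aggregated confidence reaches the threshold."""
        return {t for t, c in self.triples.items() if c >= threshold}


graph = ConfidenceGraph()
fact = ("ex:Edmonton", "ex:locatedIn", "ex:Alberta")
graph.assimilate(fact, 0.6)  # first document mentioning the fact
graph.assimilate(fact, 0.5)  # a second, independent mention raises confidence
weak = ("ex:Edmonton", "ex:locatedIn", "ex:Ontario")
graph.assimilate(weak, 0.2)  # a poorly supported statement stays low
trusted = graph.above(0.7)   # keeps only the well-supported triple
```

With this operator, repeated mentions can only raise a triple's confidence, so modelling contradictory evidence would need a different aggregation scheme.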
