Negative Knowledge at Web Scale

Basic information

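Grant number:
453095897
Principal investigator:
Dr. Simon Razniewski
Amount:
--
Host institution:
Zentralbereich Forschung und Entwicklung
Host institution country:
Germany
Project category:
Research Grants
Fiscal year:
2021
Funding country:
Germany
Project period:
2020-12-31 to 2023-12-31
Project status:
Completed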

Source:
https://gepris.dfg.de/gepris/projekt/453095897?language=en
Keywords:
Negative Knowledge Web Scale

Project abstract

Structured knowledge is crucial in a range of applications such as question answering, dialogue, and recommender systems. The required knowledge is usually stored in knowledge bases (KBs), and recent years have seen rising interest in KB construction, querying, and maintenance. Some KBs focus on lexical information, others on geospatial knowledge, activities, or common sense. Most prominently, KBs capture encyclopedic knowledge, with notable projects being Wikidata, DBpedia, and the Google Knowledge Graph. These KBs store positive statements such as “Saarbrücken is the capital of the Saarland” and are a key asset for many knowledge-intensive AI applications.

A major limitation of all these KBs is their inability to deal with negative information. At present, all major knowledge bases contain only positive information, whereas a statement such as “Tom Cruise did not win an Oscar” can only be deduced by inferences that require substantial assumptions. As KBs generally contain only a subset of what is true, users often have to guess whether information missing from a KB is false or whether its truth is merely unknown to the KB. Not being able to formally distinguish whether a statement is false or unknown poses challenges in a variety of applications. In medicine, for instance, it is important to distinguish between knowing that no biochemical reaction occurs between two substances and not knowing whether such a reaction exists at all. In corporate integrity, it is important to know whether a person was never employed by a certain competitor, while in anti-corruption investigations, the absence of family relations needs to be ascertained. In the domain of (fake) news, there is an important distinction between rumors whose truth is unknown (such as “Malaysia Airlines Flight 370 was hijacked”) and those established to be false (“Obama was born in Kenya”).

While negative information has received great attention in logic and database theory, it is still absent from current web-scale knowledge bases. For instance, Wikidata, DBpedia, and YAGO all contain only positive information and at best allow limited inferences about negation via schema constraints. Similarly, text extraction and statistical inference have so far tackled only positive information. In this project we aim to overcome the current restriction of knowledge bases to positive information through research that encompasses three components: (i) statistical inference techniques for generating negative information, (ii) web-validation and joint consolidation techniques for resolving contradictions and inconsistencies, and (iii) ranking techniques that allow negative information to be retrieved when it is relevant in specific use cases.
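
The distinction between false and unknown statements described in the abstract can be illustrated with a minimal Python sketch. It is not the project's method: the class `TinyKB`, the enum `Truth`, and the predicate names such as `capital_of` and `won_award` are hypothetical names chosen for illustration. The sketch only shows how explicitly stored negative statements let a lookup answer "false" instead of collapsing everything that is absent into "unknown"; inference, consolidation, and ranking are not modeled.

```python
from enum import Enum


class Truth(Enum):
    TRUE = "true"
    FALSE = "false"
    UNKNOWN = "unknown"


class TinyKB:
    """Toy knowledge base that stores positive and negative triples separately."""

    def __init__(self):
        self.positive = set()  # statements asserted to hold
        self.negative = set()  # statements asserted NOT to hold

    def assert_true(self, subject, predicate, obj):
        self.positive.add((subject, predicate, obj))

    def assert_false(self, subject, predicate, obj):
        self.negative.add((subject, predicate, obj))

    def lookup(self, subject, predicate, obj):
        """Return TRUE, FALSE, or UNKNOWN for a single triple."""
        triple = (subject, predicate, obj)
        if triple in self.positive:
            return Truth.TRUE
        if triple in self.negative:
            return Truth.FALSE
        # Absent from both sets: the KB has no information either way.
        return Truth.UNKNOWN


kb = TinyKB()
kb.assert_true("Saarbrücken", "capital_of", "Saarland")
kb.assert_false("Tom Cruise", "won_award", "Oscar")

for triple in [
    ("Saarbrücken", "capital_of", "Saarland"),
    ("Tom Cruise", "won_award", "Oscar"),
    ("Malaysia Airlines Flight 370", "was", "hijacked"),
]:
    print(triple, "->", kb.lookup(*triple).value)
```

A KB restricted to positive statements would have to report the second and third triples identically, which is the limitation the abstract describes.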
