III: Small: Revisiting Experimental Evaluation Protocols for Link Prediction in Knowledge Graphs

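Basic Information

Award Number:
2346959
Principal Investigator:
Carlos Rivero
Amount:
Approx. $393,500
Institution:
Rochester Institute of Tech
Institution Country:
United States
Award Type:
Standard Grant
Fiscal Year:
2024
Funding Country:
United States
Period:
2024-06-01 to 2027-05-31
Project Status:
Not yet concluded

Source: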
https://www.nsf.gov/awardsearch/showAward?AWD_ID=2346959&HistoricalAwards=false
Keywords:
III Small Revisiting Experimental Evaluation

Project Abstract

This project aims to advance the understanding of link prediction in knowledge graphs. Knowledge graphs connect information through links. For example, a person is linked to a movie because they acted in the movie. This allows that person to be linked to other actors in the movie or to link the movie to the actor's other movies. These links create a knowledge graph. Many services like search engines increasingly rely on knowledge graphs, resulting in millions of users interacting with these graphs daily. Knowledge graphs are typically incomplete; that is, there are many missing links between entities that are in fact related. These missing links hinder graphs' effectiveness. For example, a search engine with missing links cannot completely or accurately answer a user question. Link prediction algorithms aim to make knowledge graphs more complete and therefore more accurate. Current evaluation protocols ignore the nature of the links predicted by a link prediction algorithm (interpretability), and rely on datasets crafted using random selection and arbitrary thresholds (bias). The current understanding of benefits and drawbacks of link prediction algorithms is thus quite limited. Without such understanding, the field of link prediction in knowledge graphs cannot properly advance as there is no clear direction on how to do so.

Link prediction algorithms commonly rely on machine learning, so they train link prediction models to complete knowledge graphs. There are three main issues hindering our understanding of what link prediction models can accomplish: 1) The lack of methods to interpret a set of link predictions rather than individual predictions, and to quantify model interpretability; 2) The use of random selection and arbitrary thresholds to evaluate link prediction that introduce biases; and 3) The lack of homogeneous comparisons that hinder replicability due to variations in the experimental evaluation protocol.

This project proposes the following advances: 1) New methods to compute global interpretations of the link predictions that a model deems correct; 2) New methods to detect and interpret the link prediction rules a model has learned, such as "if actor A acts in movie M, then A is in M's cast;" 3) New interpretation metrics and analyses considering various strategies to generate incorrect knowledge and its expected plausibility; 4) New definitions of anomalies, a.k.a. data redundancy, to understand biases in benchmarking datasets while taking link prediction rules into account; 5) New methods to partition datasets into splits that preserve graph features with statistical guarantees; 6) New statistically sound methods to select subgraphs from real-world knowledge graphs for use as benchmarking datasets; 7) An open-source link prediction framework to reduce barriers when replicating results; and 8) A link prediction evaluation module available through Google Colab, and publicly available link prediction models to promote open comparisons.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
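
The rule quoted in item 2 of the abstract ("if actor A acts in movie M, then A is in M's cast") can be made concrete with a toy sketch. This is only an illustration, not the project's method: the entity names, the relation names `acts_in` and `has_cast_member`, and the helper `rule_stats` are all hypothetical. The sketch stores a small knowledge graph as (head, relation, tail) triples, applies the rule, and reports a simple confidence-style score (the fraction of the rule's predictions already present in the graph) along with the links the rule would add.

```python
# Toy knowledge graph as (head, relation, tail) triples.
# All entity and relation names are made up for illustration.
triples = {
    ("alice", "acts_in", "movie_1"),
    ("bob", "acts_in", "movie_1"),
    ("alice", "acts_in", "movie_2"),
    ("movie_1", "has_cast_member", "alice"),
    ("movie_2", "has_cast_member", "alice"),
    # ("movie_1", "has_cast_member", "bob") is deliberately missing.
}


def rule_stats(graph, body_rel, head_rel):
    """Apply the rule (x, body_rel, y) => (y, head_rel, x) to the graph.

    Returns the fraction of predicted triples already in the graph and
    the sorted list of predicted triples that are absent from it.
    """
    # Every triple matching the rule body yields one predicted triple.
    predicted = {(t, head_rel, h) for h, r, t in graph if r == body_rel}
    supported = predicted & graph   # predictions the graph already contains
    missing = predicted - graph     # candidate links the rule would add
    confidence = len(supported) / len(predicted) if predicted else 0.0
    return confidence, sorted(missing)


confidence, missing = rule_stats(triples, "acts_in", "has_cast_member")
print(f"confidence = {confidence:.2f}")
print("candidate missing links:", missing)
```

In this toy graph the rule recovers the one absent cast link. It also shows why such rules matter for benchmarking: if a test split contains only links that a rule like this can derive directly from the training split, a model may score well without learning anything beyond the rule.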