权益分类	功能权益	普通用户	{{item.name}}会员
{{category.name}}	{{benefitItem.name}}

Encyclopedic Lexical Representations for Natural Language Processing

自然语言处理的百科全书式词汇表示

基本信息

批准号：
EP/V025961/1
负责人：
Steven Schockaert
金额：
$ 76.1万
依托单位：
CARDIFF UNIVERSITY
依托单位国家：
英国
项目类别：
Research Grant
财政年份：
2021
资助国家：
英国
起止时间：
2021 至无数据
项目状态：
未结题

来源：
https://gtr.ukri.org/projects?ref=EP%2FV025961%2F1
关键词：
Encyclopedic Lexical Representations Natural Language

项目摘要

The field of Natural Language Processing (NLP) has made unprecedented progress over the last decade, fuelled by the introduction of increasingly powerful neural network models. These models have an impressive ability to discover patterns in training examples, and to transfer these patterns to previously unseen test cases. Despite their strong performance in many NLP tasks, however, the extent to which they "understand" language is still remarkably limited. The key underlying problem is that language understanding requires a vast amount of world knowledge, which current NLP systems are largely lacking. In this project, we focus on conceptual knowledge, and more in particular on: (i) capturing what properties are associated with a given concept (e.g. lions are dangerous, boats can float); (ii) characterising how different concepts are related (e.g. brooms are used for cleaning, bees produce honey).Our proposed approach relies on the fact that Wikipedia contains a wealth of such knowledge. A key problem, however, is that important properties and relationships are often not explicitly mentioned in text, especially if they follow straightforwardly from other information, for a human reader (e.g. if X is an animal that can fly then X probably has wings). Apart from learning to extract knowledge expressed in text, we thus also have to learn how to reason about conceptual knowledge.A central question is how conceptual knowledge should be represented. Current NLP systems heavily rely on vector representations. Each concept is then represented by a single vector. It is now well-understood how such representations can be learned, and they are straightforward to incorporate into neural network architectures. However, they also have important theoretical limitations in terms of what knowledge they can capture, and they only allow for shallow and heuristic forms of reasoning. In contrast, in symbolic AI, conceptual knowledge is typically represented using facts and rules. This enables powerful forms of reasoning, but symbolic representations are harder to learn and to use in neural networks. Moreover, symbolic representations are also limited because they cannot capture aspects of knowledge that are matters of degree (e.g. similarity and typicality), which is especially restrictive when modelling commonsense knowledge.The solution we propose relies on a novel hybrid representation framework, which combines the main advantages of vector representations with those of symbolic methods. In particular, we will explicitly represent properties and relationships, as in symbolic frameworks, but these properties and relations will be encoded as vectors. Each concept will thus be associated with several property vectors, while pairs of related concepts will be associated with one or more relation vectors. Our vectors will thus intuitively play the same role that facts play in symbolic frameworks, with associated neural network models then playing the role of rules.The main output from this project will consist in a comprehensive resource, in which conceptual knowledge is encoded in this hybrid way. We expect that our resource will play an important role in NLP, given the importance of conceptual knowledge for language understanding and its highly complementary nature to existing resources. To demonstrate its usefulness, we will focus on two challenging applications: reading comprehension and topic/trend modelling. We will also develop three case studies. In one case study, we will learn representations of companies, by using our resource to summarise the activities of companies in a semantically meaningful way. In another case study, we will use our resource to identify news stories that are relevant to a given theme. Finally, we will use our methods to learn semantically coherent descriptions of emerging trends in patents.

自然语言处理（NLP）领域在过去十年中取得了前所未有的进步，这得益于引入越来越强大的神经网络模型。这些模型具有令人印象深刻的能力，可以在训练示例中发现模式，并将这些模式转移到以前未见过的测试用例中。尽管他们在许多NLP任务中表现出色，然而，他们“理解”语言的程度仍然非常有限。关键的潜在问题是，语言理解需要大量的世界知识，而目前的NLP系统在很大程度上缺乏这些知识。在这个项目中，我们专注于概念性知识，特别是：(i)捕获与给定概念相关的属性（例如狮子是危险的，船可以漂浮）；（ii）描述不同概念之间的关系（例如，扫帚用于清洁，蜜蜂生产蜂蜜）。我们提出的方法依赖于维基百科包含丰富的此类知识这一事实。然而，一个关键问题是，对于人类读者来说，重要的属性和关系通常没有在文本中明确提及，特别是当它们直接跟随其他信息时（例如，如果X是一种会飞的动物，那么X可能有翅膀）。因此，除了学习提取文本表达的知识外，我们还必须学习如何对概念知识进行推理。一个核心问题是概念知识应该如何表示。目前的NLP系统严重依赖向量表示。然后，每个概念由单个向量表示。现在人们已经很好地理解了这种表征是如何被学习的，并且它们很容易被整合到神经网络架构中。然而，它们在能够捕获的知识方面也有重要的理论限制，并且它们只允许浅层和启发式的推理形式。相反，在符号人工智能中，概念知识通常使用事实和规则来表示。这使得强大的推理形式成为可能，但符号表示更难学习，也更难在神经网络中使用。此外，符号表示也有局限性，因为它们不能捕捉知识的程度问题（例如相似性和典型性），这在对常识性知识建模时尤其受限制。我们提出的解决方案依赖于一个新的混合表示框架，它结合了向量表示和符号方法的主要优点。特别是，我们将显式地表示属性和关系，就像在符号框架中一样，但是这些属性和关系将被编码为向量。因此，每个概念将与几个属性向量相关联，而相关概念对将与一个或多个关系向量相关联。因此，我们的向量将直观地扮演与事实在符号框架中扮演的角色相同的角色，而相关的神经网络模型则扮演规则的角色。这个项目的主要产出将包括一个综合资源，其中概念知识以这种混合方式编码。鉴于概念知识对语言理解的重要性及其与现有资源的高度互补性，我们期望我们的资源将在NLP中发挥重要作用。为了证明它的有用性，我们将重点关注两个具有挑战性的应用：阅读理解和主题/趋势建模。我们还将开展三个案例研究。在一个案例研究中，我们将通过使用我们的资源以语义上有意义的方式总结公司的活动来学习公司的表示。在另一个案例研究中，我们将使用我们的资源来识别与给定主题相关的新闻故事。最后，我们将使用我们的方法来学习对专利新趋势的语义连贯描述。

项目成果

期刊论文数量（10）

专著数量（0）

科研奖励数量（0）

会议论文数量（0）

专利数量（0）

Distilling Semantic Concept Embeddings from Contrastively Fine-Tuned Language Models

DOI：
10.1145/3539618.3591667
发表时间：
2023-05
期刊：
Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval
影响因子：
0
作者：
N. Li;Hanane Kteich;Zied Bouraoui;Steven Schockaert
通讯作者：
N. Li;Hanane Kteich;Zied Bouraoui;Steven Schockaert

Cabbage Sweeter than Cake? Analysing the Potential of Large Language Models for Learning Conceptual Spaces

卷心菜比蛋糕甜？

DOI：
10.18653/v1/2023.emnlp-main.725
发表时间：
2023
期刊：
影响因子：
0
作者：
Chatterjee U
通讯作者：
Chatterjee U

Embeddings as epistemic states: Limitations on the use of pooling operators for accumulating knowledge

作为认知状态的嵌入：使用池算子来积累知识的限制

DOI：
10.1016/j.ijar.2023.108981
发表时间：
2023
期刊：
International Journal of Approximate Reasoning
影响因子：
3.9
作者：
Schockaert S
通讯作者：
Schockaert S

Solving Hard Analogy Questions with Relation Embedding Chains

使用关系嵌入链解决困难类比问题

DOI：
10.18653/v1/2023.emnlp-main.382
发表时间：
2023
期刊：
影响因子：
0
作者：
Kumar N
通讯作者：
Kumar N

Ultra-Fine Entity Typing with Prior Knowledge about Labels: A Simple Clustering Based Strategy

具有标签先验知识的超精细实体类型：基于简单聚类的策略

DOI：
10.18653/v1/2023.findings-emnlp.786
发表时间：
2023
期刊：
影响因子：
0
作者：
Li N
通讯作者：
Li N

DOI：
{{ item.doi }}
发表时间：
{{ item.publish_year }}
期刊：
{{ item.journal_name }}
影响因子：
{{ item.factor }}
作者：
{{ item.authors }}
通讯作者：
{{ item.author }}

数据更新时间：{{ journalArticles.updateTime }}

作者：
{{ item.author }}

数据更新时间：{{ monograph.updateTime }}

作者：
{{ item.author }}

数据更新时间：{{ sciAawards.updateTime }}

作者：
{{ item.author }}

数据更新时间：{{ conferencePapers.updateTime }}

作者：
{{ item.author }}

数据更新时间：{{ patent.updateTime }}

Steven Schockaert其他文献

Using social media to find places of interest: a case study

使用社交媒体寻找感兴趣的地方：案例研究

DOI：
10.1145/2442952.2442954
发表时间：
2012
期刊：
影响因子：
0
作者：
Steven Van Canneyt;O. Laere;Steven Schockaert;B. Dhoedt
通讯作者：
B. Dhoedt

Cardiff University at SemEval-2020 Task 6: Fine-tuning BERT for Domain-Specific Definition Classification

卡迪夫大学 SemEval-2020 任务 6：针对特定领域的定义分类微调 BERT

DOI：
发表时间：
2020
期刊：
International Workshop on Semantic Evaluation
影响因子：
0
作者：
Shelan S. Jeawak;Luis Espinosa Anke;Steven Schockaert
通讯作者：
Steven Schockaert

Possible and Necessary Answer Sets of Possibilistic Answer Set Programs

可能性答案集程序的可能和必要答案集

DOI：
10.1109/ictai.2012.117
发表时间：
2012
期刊：
2012 IEEE 24th International Conference on Tools with Artificial Intelligence
影响因子：
0
作者：
Kim Bauters;Steven Schockaert;M. D. Cock;D. Vermeir
通讯作者：
D. Vermeir