Unsupervised Background Knowledge for Language Understanding

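Basic information

Grant number: MR/T042001/1
Principal investigator: Jose Camacho Collados
Amount: $1,431,200
Host institution: CARDIFF UNIVERSITY
Host institution country: United Kingdom
Project category: Fellowship
Fiscal year: 2021
Funding country: United Kingdom
Duration: 2021 to (no data)
Project status: Not concluded

Source: https://gtr.ukri.org/projects?ref=MR%2FT042001%2F1
Keywords: Unsupervised Background Knowledge Language Understanding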

Project abstract

Significant progress has been made in Artificial Intelligence (AI) in recent years, and this has raised high expectations of what the technology can offer us in the future. However, many challenges must still be addressed before this promise becomes reality, and one of them is Natural Language Processing (NLP). If a computer is ever to understand humans naturally and to demonstrate the level of intelligence we would normally expect, the problem of Language Understanding must be solved. Making computers understand natural language is a non-trivial task. Current approaches to language understanding rely on end-to-end supervised learning, exemplified in recent years by deep learning techniques. Typically, a corpus of relevant text is collected and then used to train the computer to perform a specific task. However, this approach has several potential problems; for example, the words used to train a computer often carry implicit meanings and can be ambiguous. Consider the following two sentences:
(1) We found many birds during our visit to the zoo: eagles, parrots, cranes...
(2) The crane was hurt and could barely move.
From these training examples alone, a computer cannot learn that there are in fact two kinds of crane (the bird and the machine), or that only one of them (the bird) can get hurt. It is widely recognised that handling word ambiguity, and more broadly understanding what words mean, is a significant challenge in NLP. For instance, Google Translate, widely considered the state of the art in machine translation, fails to translate these two sentences correctly even into closely related languages such as Spanish. More generally, current techniques are hard to generalise across tasks and domains, especially in applications that require language understanding.
The proposed research aims to develop theories and novel solutions that bridge this gap by combining lexical resources with unsupervised techniques for analysing text corpora, thereby learning the much-needed background knowledge that is not explicitly available. Our goal is then to integrate this background knowledge seamlessly into real-world applications for more accurate language understanding. We will apply these techniques across different languages, making them directly applicable to important multilingual NLP tasks, including tasks in lower-resourced languages such as Welsh, and to domains with direct societal impact such as social media and health care.
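
The crane example can be made concrete with a small sketch of how a lexical resource supplies background knowledge that the training sentences themselves lack. It implements the simplified Lesk algorithm, a classic gloss-overlap baseline for word sense disambiguation; it is not the method proposed in this project. The two sense glosses, the stopword list, the crude plural stripping, the function names and the third (building-site) sentence are hypothetical, written only for this illustration.

```python
import re

# Hypothetical two-sense inventory for the ambiguous word "crane".
# The glosses are hand-written for this illustration; a real system would
# take them from a lexical resource such as WordNet.
SENSES = {
    "crane (bird)": "a large wading bird with long legs and neck; "
                    "an animal that can be hurt or injured",
    "crane (machine)": "a machine with a long arm used to lift heavy loads "
                       "at building sites",
}

# Assumed minimal stopword list, just enough for the examples below.
STOPWORDS = {"a", "an", "and", "the", "with", "that", "can", "be", "or",
             "to", "at", "we", "our", "was", "could"}


def normalise(text: str) -> set[str]:
    """Lower-case, tokenise, drop stopwords and crudely strip a plural -s."""
    tokens = set()
    for word in re.findall(r"[a-z]+", text.lower()):
        if word in STOPWORDS:
            continue
        if len(word) > 3 and word.endswith("s"):
            word = word[:-1]  # "birds" -> "bird", "cranes" -> "crane"
        tokens.add(word)
    return tokens


def simplified_lesk(target: str, sentence: str) -> str:
    """Pick the sense whose gloss shares the most words with the sentence.

    Ties go to the sense listed first in SENSES (max() keeps the first maximum).
    """
    context = normalise(sentence) - {target}
    overlaps = {sense: len(context & normalise(gloss))
                for sense, gloss in SENSES.items()}
    return max(overlaps, key=overlaps.get)


if __name__ == "__main__":
    examples = [
        "We found many birds during our visit to the zoo: eagles, parrots, cranes...",
        "The crane was hurt and could barely move.",
        # Hypothetical third sentence showing the machine sense.
        "The crane lifted the steel beams at the building site.",
    ]
    for sentence in examples:
        print(f"{simplified_lesk('crane', sentence):16} <- {sentence}")
```

With these glosses, sentences (1) and (2) resolve to the bird sense (overlapping on "bird" and "hurt" respectively) and the building-site sentence resolves to the machine sense. Sentence (2) is disambiguated only because the hand-written bird gloss mentions that the animal can be hurt, which is precisely the kind of knowledge that is not explicit in the training text; a practical version would likely need proper lemmatisation and some weighting of overlaps.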

Project outputs

Journal articles (10)

Monographs (0)

Research awards (0)

Conference papers (0)

Patents (0)

Robust Hate Speech Detection in Social Media: A Cross-Dataset Empirical Evaluation

DOI: 10.18653/v1/2023.woah-1.25
Published: 2023
Journal:
Impact factor: 0
Authors: Antypas D
Corresponding author: Antypas D

Generative Language Models for Paragraph-Level Question Generation

DOI: 10.48550/arxiv.2210.03992
Published: 2022-10
Journal:
Impact factor: 0
Authors: Asahi Ushio;Fernando Alva-Manchego;José Camacho-Collados
Corresponding author: Asahi Ushio;Fernando Alva-Manchego;José Camacho-Collados

Politics, Sentiment and Virality: A Large-Scale Multilingual Twitter Analysis in Greece, Spain and United Kingdom

DOI: 10.2139/ssrn.4166108
Published: 2022
Journal: SSRN Electronic Journal
Impact factor: 0
Authors: Dimosthenis Antypas;A. Preece;José Camacho-Collados
Corresponding author: Dimosthenis Antypas;A. Preece;José Camacho-Collados

Assessing the Limits of the Distributional Hypothesis in Semantic Spaces: Trait-based Relational Knowledge and the Impact of Co-occurrences

DOI: 10.18653/v1/2022.starsem-1.15
Published: 2022
Journal:
Impact factor: 0
Authors: Anderson M
Corresponding author: Anderson M

Guiding Generative Language Models for Data Augmentation in Few-Shot Text Classification

DOI:
Published: 2021-11
Journal: ArXiv
Impact factor: 0
Authors: A. Edwards;Asahi Ushio;José Camacho-Collados;Hélène de Ribaupierre;A. Preece
Corresponding author: A. Edwards;Asahi Ushio;José Camacho-Collados;Hélène de Ribaupierre;A. Preece

Other publications by Jose Camacho Collados

AI for Analyzing Mental Health Disorders Among Social Media Users: Quarter-Century Narrative Review of Progress and Challenges

DOI: 10.2196/59225
Published: 2024-01-01
Journal: JOURNAL OF MEDICAL INTERNET RESEARCH
Impact factor: 6.000
Authors: David Owen;Amy J Lynham;Sophie E Smart;Antonio F Pardiñas;Jose Camacho Collados
Corresponding author: Jose Camacho Collados

Federated Learning for Exploiting Annotators’ Disagreements in Natural Language Processing

DOI: 10.1162/tacl_a_00664
Published: 2024
Journal: Transactions of the Association for Computational Linguistics
Impact factor: 10.9
Authors: Nuria Rodríguez;Eugenio Martínez Cámara;Jose Camacho Collados;M. V. Luzón;Francisco Herrera
Corresponding author: Francisco Herrera