From trivial representations to learning concepts in AI by exploiting unique data

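Basic information

Grant number:
EP/X017680/1
Principal investigator:
Sotirios Tsaftaris
Amount:
$257,800
Host institution:
University of Edinburgh
Host institution country:
United Kingdom
Project category:
Research Grant
Fiscal year:
2023
Funding country:
United Kingdom
Duration:
2023 to (no data)
Project status:
Not concluded

Source:
https://gtr.ukri.org/projects?ref=EP%2FX017680%2F1
Keywords:
trivial representations learning concepts AI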

Project abstract

The prospect of an AI-based revolution and its socio-economic benefits is tantalising. We want to live in a world where AI learns effectively, with high performance and minimal risks. Such a world is extremely exciting. We tend to believe that AI learns higher-level concepts from data, but this is not what happens. Particularly in data such as images, AI extracts rather trivial (low-level) notions from the data even when provided with millions of examples. We often hear that providing more data with high diversity should help improve the information that AI can extract. Amassing data in this way, though, has privacy and cost implications. Indeed, considerable cost also comes from the need to pre-process and sanitise data (i.e. remove unwanted information). More critically, though, in several key applications (e.g. healthcare) some events (e.g. disease) can be rare or truly unique. Collecting more and more data will not change the relative frequency of such rare data. It appears that current AI is not data efficient: it poorly leverages the goldmine of information present in unique and rare data.

This project aims to answer a key research question: **Why does AI struggle with concepts, and what is the role of unique data?** We suspect there are several reasons why AI struggles with concepts:

A) The mechanisms we use to extract information from data (known as representation learning) rely on very simple assumptions that do not reflect how real data exist in the world. For example, we know that data have correlations, yet we currently make the simplified assumption of no correlation at all. We propose to introduce stronger assumptions of causal relationships in the concepts we want to extract. This should in turn help us extract better information.

B) To learn any model, we have to use optimisation processes to find the parameters of the model. We find a weakness in these processes: data that are unique and rare do not get much attention, or if they do get some, it happens by chance. This leads to considerable inconsistency in the extraction of information. In addition, sometimes wrong information is extracted, either because we found suboptimal representations or because we latched onto some data that escaped the sanitisation process, since no such process can be guaranteed to be perfect. We want to understand why such inconsistency exists, and we propose to devise methods that ensure that, when we train models, we can consistently extract information even from rare data.

There is a tight connection between B and A. Without new methods that better optimise learning functions we cannot extract representations reliably from rare data, and hence we cannot impose the causal relationships we need. There is an additional element of this work that helps answer the second part of the question. Rare and unique data may actually reveal unique causal relationships. This is a very tantalising prospect that the work we propose aims to investigate.

There are considerable and broad rewards of the work we propose. We put herein the underpinnings for an AI that, because it is data efficient, should not require blind amassing of data with all the privacy fears this engenders for the general public. Because it learns high-level concepts, it will be more adept at empowering decision tools that can support how decisions have been reached. And because we introduce strong causal priors in extracting these concepts, we reduce the risk of learning trivial data associations. Overall, a major goal of the AI research community is to create AI that can generalise to new unseen data beyond what was available during training time. We hope that our AI will bring us closer to this goal, thus further paving the way to broader deployment of AI in the real world.

Project outputs

Journal articles (3)

Monographs (0)

Research awards (0)

Conference papers (0)

Patents (0)

Unveiling Fairness Biases in Deep Learning-Based Brain MRI Reconstruction

DOI:
10.48550/arxiv.2309.14392
Publication date:
2023
Journal:
Impact factor:
0
Authors:
Du Y
Corresponding author:
Du Y

Deep Generative Models - Third MICCAI Workshop, DGM4MICCAI 2023, Held in Conjunction with MICCAI 2023, Vancouver, BC, Canada, October 8, 2023, Proceedings

DOI:
10.1007/978-3-031-53767-7_1
Publication date:
2024
Journal:
Impact factor:
0
Authors:
Fernandez V
Corresponding author:
Fernandez V

Other publications by Sotirios Tsaftaris

Towards joint segmentation and registration of the myocardium in CP-BOLD MRI at rest

DOI:
10.1186/1532-429x-18-s1-w34
Publication date:
2016-01-27
Journal:
Research article
Impact factor:
Authors:
Ilkay Oksuz; Rohan Dharmakumar; Sotirios Tsaftaris
Corresponding author:
Sotirios Tsaftaris

Towards reliable myocardial blood-oxygen-level-dependent (BOLD) CMR using late effects of regadenoson with simultaneous 13N-ammonia PET validation in a whole-body hybrid PET/MR system

DOI:
10.1186/1532-429x-18-s1-o19
Publication date:
2016-01-27
Journal:
Short communication
Impact factor:
Authors:
Hsin-Jung Yang; Damini Dey; Jane M Sykes; John Butler; Behzad Sharif; Debiao Li; Sotirios Tsaftaris; Piotr Slomka; Frank S Prato; Rohan Dharmakumar
Corresponding author:
Rohan Dharmakumar

BOLD contrast: A challenge for cardiac image analysis

DOI:
10.1186/1532-429x-18-s1-w27
Publication date:
2016-01-27
Journal:
Research article
Impact factor:
Authors:
Ilkay Oksuz; Marco Bevilacqua; Anirban Mukhopadhyay; Rohan Dharmakumar; Sotirios Tsaftaris
Corresponding author:
Sotirios Tsaftaris

A virtual power plant for coordinating batteries and EVs of distributed zero-energy houses considering the distribution system constraints

DOI:
10.1016/j.est.2024.114905
Publication date:
2025-01-15
Journal:
Research article
Impact factor:
Authors:
Mohammed Qais; Desen Kirli; Edward Moroshko; Aristides Kiprakis; Sotirios Tsaftaris
Corresponding author:
Sotirios Tsaftaris