From trivial representations to learning concepts in AI by exploiting unique data

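Basic information

  • Grant number:
    EP/X017680/1
  • Principal investigator:
    Sotirios Tsaftaris
  • Amount:
    $257,800
  • Host institution country:
    United Kingdom
  • Project category:
    Research Grant
  • Fiscal year:
    2023
  • Funding country:
    United Kingdom
  • Duration:
    2023 to (end date not available)
  • Project status:
    Ongoing (not yet concluded)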

Project abstract

The prospect of an AI-based revolution and its socio-economic benefits is tantalising. We want to live in a world where AI learns effectively, with high performance and minimal risk. Such a world is extremely exciting. We tend to believe that AI learns higher-level concepts from data, but this is not what happens. Particularly in data such as images, AI extracts rather trivial (low-level) notions even when provided with millions of examples. We often hear that providing more, and more diverse, data should improve the information that AI can extract. Amassing data, however, has privacy and cost implications. Indeed, considerable cost also comes from the need to pre-process and sanitise data (i.e. remove unwanted information). More critically, in several key applications (e.g. healthcare) some events (e.g. diseases) can be rare or truly unique. Collecting more and more data will not change the relative frequency of such rare data. It appears that current AI is not data efficient: it poorly leverages the goldmine of information present in unique and rare data.

This project aims to answer a key research question: **Why does AI struggle with concepts, and what is the role of unique data?**

We suspect there are several reasons why AI struggles with concepts:

A) The mechanisms we use to extract information from data (known as representation learning) rely on very simple assumptions that do not reflect how real data exist in the world. For example, we know that data have correlations, yet we currently make the simplifying assumption that there is no correlation at all. We propose to introduce stronger assumptions of causal relationships among the concepts we want to extract. This should in turn help us extract better information.

B) To learn any model, we have to use optimisation processes to find the parameters of the model. We find a weakness in these processes: unique and rare data do not get much attention, and when they do get some, it happens by chance. This leads to considerable inconsistency in the extraction of information. In addition, wrong information is sometimes extracted, either because we found suboptimal representations or because we latched onto data that escaped the sanitisation process, since a perfect sanitisation process can never be guaranteed. We want to understand why such inconsistency exists and propose to devise methods that ensure that, when we train models, we can consistently extract information even from rare data.

There is a tight connection between B and A. Without new methods that better optimise learning functions, we cannot reliably extract representations from rare data, and hence we cannot impose the causal relationships we need. An additional element of this work helps answer the second part of the question: rare and unique data may actually reveal unique causal relationships. This is a very tantalising prospect that the proposed work aims to investigate.

The rewards of the proposed work are considerable and broad. Here we lay the groundwork for an AI that, because it is data efficient, should not require the blind amassing of data, with all the privacy fears this engenders among the general public. Because it learns high-level concepts, it will be better able to power decision tools that can justify how decisions have been reached. And because we introduce strong causal priors when extracting these concepts, we reduce the risk of learning trivial data associations. Overall, a major goal of the AI research community is to create AI that can generalise to new, unseen data beyond what was available at training time. We hope that our AI will bring us closer to this goal, further paving the way for broader deployment of AI in the real world.
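Point B, and the remark that more data does not change the relative frequency of rare events, can be made concrete with a back-of-the-envelope sketch. This is not the project's method: it simply assumes plain empirical risk minimisation (the training loss is an average over all N examples) and minibatches drawn uniformly at random, and the function names are hypothetical. Under those assumptions a single unique example carries weight 1/N in the objective, lands in a given minibatch of size B with probability B/N, and a rare event with rate p still makes up a fraction close to p of the collected data however large N becomes.

```python
import random


def weight_in_mean_loss(n_total: int) -> float:
    """Weight one example carries in the average (empirical-risk) loss over n_total examples.

    Assumption: the objective is a plain, unweighted mean of per-example losses.
    """
    return 1.0 / n_total


def prob_in_minibatch(n_total: int, batch_size: int) -> float:
    """Chance that one specific example is in a minibatch of batch_size
    drawn uniformly without replacement from n_total examples."""
    return batch_size / n_total


def observed_rare_fraction(n_total: int, rare_rate: float, seed: int = 0) -> float:
    """Simulate collecting n_total samples, each independently 'rare' with
    probability rare_rate, and return the observed fraction of rare samples."""
    rng = random.Random(seed)  # seeded so the simulation is reproducible
    n_rare = sum(1 for _ in range(n_total) if rng.random() < rare_rate)
    return n_rare / n_total


if __name__ == "__main__":
    # Grow the dataset 100x: a unique example's weight and its chance of being
    # sampled both shrink, while the rare fraction stays near rare_rate.
    for n in (10_000, 1_000_000):
        print(
            f"N={n:>9}: weight={weight_in_mean_loss(n):.1e}, "
            f"P(in batch of 64)={prob_in_minibatch(n, 64):.1e}, "
            f"rare fraction={observed_rare_fraction(n, 0.001):.4f}"
        )
```

Under these simplifying assumptions, growing the dataset dilutes a unique example's influence on the optimisation rather than amplifying it, which is consistent with the imbalance described in point B.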

Project outputs

Journal articles (3)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Unveiling Fairness Biases in Deep Learning-Based Brain MRI Reconstruction
  • DOI:
    10.48550/arxiv.2309.14392
  • Published:
    2023
  • Impact factor:
    0
  • Authors:
    Du Y
  • Corresponding author:
    Du Y
Deep Generative Models - Third MICCAI Workshop, DGM4MICCAI 2023, Held in Conjunction with MICCAI 2023, Vancouver, BC, Canada, October 8, 2023, Proceedings
  • DOI:
    10.1007/978-3-031-53767-7_1
  • Published:
    2024
  • Impact factor:
    0
  • Authors:
    Fernandez V
  • Corresponding author:
    Fernandez V

Other publications by Sotirios Tsaftaris

Towards joint segmentation and registration of the myocardium in CP-BOLD MRI at rest
  • DOI:
    10.1186/1532-429x-18-s1-w34
  • Published:
    2016-01-27
  • Authors:
    Ilkay Oksuz;Rohan Dharmakumar;Sotirios Tsaftaris
  • Corresponding author:
    Sotirios Tsaftaris
Towards reliable myocardial blood-oxygen-level-dependent (BOLD) CMR using late effects of regadenoson with simultaneous 13N-ammonia PET validation in a whole-body hybrid PET/MR system
  • DOI:
    10.1186/1532-429x-18-s1-o19
  • Published:
    2016-01-27
  • Authors:
    Hsin-Jung Yang;Damini Dey;Jane M Sykes;John Butler;Behzad Sharif;Debiao Li;Sotirios Tsaftaris;Piotr Slomka;Frank S Prato;Rohan Dharmakumar
  • Corresponding author:
    Rohan Dharmakumar
BOLD contrast: A challenge for cardiac image analysis
  • DOI:
    10.1186/1532-429x-18-s1-w27
  • Published:
    2016-01-27
  • Authors:
    Ilkay Oksuz;Marco Bevilacqua;Anirban Mukhopadhyay;Rohan Dharmakumar;Sotirios Tsaftaris
  • Corresponding author:
    Sotirios Tsaftaris
A virtual power plant for coordinating batteries and EVs of distributed zero-energy houses considering the distribution system constraints
  • DOI:
    10.1016/j.est.2024.114905
  • Published:
    2025-01-15
  • Authors:
    Mohammed Qais;Desen Kirli;Edward Moroshko;Aristides Kiprakis;Sotirios Tsaftaris
  • Corresponding author:
    Sotirios Tsaftaris
Dictionary learning for unsupervised identification of ischemic territories in CP-BOLD Cardiac MRI at rest
  • DOI:
    10.1186/1532-429x-17-s1-q13
  • Published:
    2015-02-03
  • Authors:
    Marco Bevilacqua;Cristian Rusu;Rohan Dharmakumar;Sotirios Tsaftaris
  • Corresponding author:
    Sotirios Tsaftaris

Other grants held by Sotirios Tsaftaris

CHAI - EPSRC AI Hub for Causality in Healthcare AI with Real Data
  • Grant number:
    EP/Y028856/1
  • Fiscal year:
    2024
  • Funding amount:
    $257,800
  • Project category:
    Research Grant
CardiacA.I.: Machine learning for the analysis of multimodal cardiac MR images used in the diagnosis of coronary heart disease
  • Grant number:
    EP/P022928/1
  • Fiscal year:
    2017
  • Funding amount:
    $257,800
  • Project category:
    Research Grant

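Similar National Natural Science Foundation of China (NSFC) grants

Representation learning and applications for heterogeneous information networks under few-shot conditions
  • Grant number:
    62306322
  • Approval year:
    2023
  • Funding amount:
    CNY 300,000
  • Project category:
    Young Scientists Fund
Precise and trustworthy drug recommendation assisted by graph representation learning
  • Grant number:
    62306014
  • Approval year:
    2023
  • Funding amount:
    CNY 300,000
  • Project category:
    Young Scientists Fund
An intelligent sports coach based on motion representation, generative models and reinforcement learning from human feedback
  • Grant number:
    62373183
  • Approval year:
    2023
  • Funding amount:
    CNY 500,000
  • Project category:
    General Program
Mechanisms of BERT representation learning for therapeutic peptide identification
  • Grant number:
    62371318
  • Approval year:
    2023
  • Funding amount:
    CNY 490,000
  • Project category:
    General Program
Representation-learning-based prediction of associations between protein post-translational modifications and diseases
  • Grant number:
    62302198
  • Approval year:
    2023
  • Funding amount:
    CNY 300,000
  • Project category:
    Young Scientists Fund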

Similar overseas grants

CIF: Small: Learning Low-Dimensional Representations with Heteroscedastic Data Sources
  • Grant number:
    2331590
  • Fiscal year:
    2024
  • Funding amount:
    $257,800
  • Project category:
    Standard Grant
Career: Learning Multimodal Representations of the Physical World
  • Grant number:
    2339071
  • Fiscal year:
    2024
  • Funding amount:
    $257,800
  • Project category:
    Continuing Grant
Multiple Representations of Learning in Dynamics and Control: Exploring the Synergy of Low-Cost Portable Lab Equipment, Virtual Labs, and AI within Student Learning Activities
  • Grant number:
    2336998
  • Fiscal year:
    2024
  • Funding amount:
    $257,800
  • Project category:
    Standard Grant
CAREER: Towards Trustworthy Machine Learning via Learning Trustworthy Representations: An Information-Theoretic Framework
  • Grant number:
    2339686
  • Fiscal year:
    2024
  • Funding amount:
    $257,800
  • Project category:
    Continuing Grant
Frontocortical representations of amygdala-mediated learning under uncertainty
  • Grant number:
    10825354
  • Fiscal year:
    2024
  • Funding amount:
    $257,800