From trivial representations to learning concepts in AI by exploiting unique data
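Basic information
- Grant number: EP/X017680/1
- Principal investigator:
- Amount: $257,800
- Host institution:
- Host institution country: United Kingdom
- Project category: Research Grant
- Fiscal year: 2023
- Funding country: United Kingdom
- Start and end dates: 2023 to (no data)
- Project status: Not yet concluded
- Source:
- Keywords:
Project abstract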
The prospect of an AI-based revolution and its socio-economic benefits is tantalising. We want to live in a world where AI learns effectively, with high performance and minimal risk. Such a world is extremely exciting. We tend to believe that AI learns higher-level concepts from data, but this is not what happens. Particularly with data such as images, AI extracts rather trivial (low-level) notions even when provided with millions of examples. We often hear that providing more data with high diversity should improve the information that AI can extract. Amassing data in this way, however, has privacy and cost implications. Considerable cost also comes from the need to pre-process and sanitise data (i.e. remove unwanted information). More critically, in several key applications (e.g. healthcare) some events (e.g. disease) can be rare or truly unique. Collecting more and more data will not change the relative frequency of such rare data. It appears that current AI is not data efficient: it poorly leverages the goldmine of information present in unique and rare data.

This project aims to answer a key research question: **Why does AI struggle with concepts, and what is the role of unique data?**

We suspect there are several reasons why AI struggles with concepts:

A) The mechanisms we use to extract information from data (known as representation learning) rely on very simple assumptions that do not reflect how real data exist in the world. For example, we know that data have correlations, yet we currently make the simplifying assumption of no correlation at all. We propose to introduce stronger assumptions of causal relationships among the concepts we want to extract. This should in turn help us extract better information.

B) To learn any model, we have to use optimisation processes to find the parameters of the model. We find a weakness in these processes: data that are unique and rare do not get much attention, or if they do get some, it happens by chance. This leads to considerable inconsistency in the extraction of information. In addition, wrong information is sometimes extracted, either because we found suboptimal representations or because we latched onto some data that escaped the sanitisation process, since no such process can be guaranteed to be perfect. We want to understand why such inconsistency exists, and we propose to devise methods that ensure that, when we train models, we can consistently extract information even from rare data.

There is a tight connection between B and A. Without new methods that better optimise learning functions, we cannot extract representations reliably from rare data, and hence we cannot impose the causal relationships we need. There is an additional element of this work that helps answer the second part of the question: rare and unique data may actually reveal unique causal relationships. This is a very tantalising prospect that the proposed work aims to investigate.

The rewards of the proposed work are considerable and broad. We lay here the underpinnings for an AI that, because it is data efficient, should not require the blind amassing of data, with all the privacy fears this engenders among the general public. Because it learns high-level concepts, it will be better placed to empower decision tools that can support how decisions have been reached. And because we introduce strong causal priors in extracting these concepts, we reduce the risk of learning trivial data associations. Overall, a major goal of the AI research community is to create AI that can generalise to new, unseen data beyond what was available at training time. We hope that our AI will bring us closer to this goal, further paving the way to broader deployment of AI in the real world.
Project outputs
Journal articles (3)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Unveiling Fairness Biases in Deep Learning-Based Brain MRI Reconstruction
- DOI: 10.48550/arxiv.2309.14392
- Published: 2023
- Journal:
- Impact factor: 0
- Authors: Du Y
- Corresponding author: Du Y
Deep Generative Models - Third MICCAI Workshop, DGM4MICCAI 2023, Held in Conjunction with MICCAI 2023, Vancouver, BC, Canada, October 8, 2023, Proceedings
- DOI: 10.1007/978-3-031-53767-7_1
- Published: 2024
- Journal:
- Impact factor: 0
- Authors: Fernandez V
- Corresponding author: Fernandez V
Other publications by Sotirios Tsaftaris
Towards joint segmentation and registration of the myocardium in CP-BOLD MRI at rest
- DOI: 10.1186/1532-429x-18-s1-w34
- Published: 2016-01-27
- Journal:
- Impact factor:
- Authors: Ilkay Oksuz; Rohan Dharmakumar; Sotirios Tsaftaris
- Corresponding author: Sotirios Tsaftaris
Towards reliable myocardial blood-oxygen-level-dependent (BOLD) CMR using late effects of regadenoson with simultaneous 13N-ammonia PET validation in a whole-body hybrid PET/MR system
- DOI: 10.1186/1532-429x-18-s1-o19
- Published: 2016-01-27
- Journal:
- Impact factor:
- Authors: Hsin-Jung Yang; Damini Dey; Jane M Sykes; John Butler; Behzad Sharif; Debiao Li; Sotirios Tsaftaris; Piotr Slomka; Frank S Prato; Rohan Dharmakumar
- Corresponding author: Rohan Dharmakumar
BOLD contrast: A challenge for cardiac image analysis
- DOI: 10.1186/1532-429x-18-s1-w27
- Published: 2016-01-27
- Journal:
- Impact factor:
- Authors: Ilkay Oksuz; Marco Bevilacqua; Anirban Mukhopadhyay; Rohan Dharmakumar; Sotirios Tsaftaris
- Corresponding author: Sotirios Tsaftaris
A virtual power plant for coordinating batteries and EVs of distributed zero-energy houses considering the distribution system constraints
- DOI: 10.1016/j.est.2024.114905
- Published: 2025-01-15
- Journal:
- Impact factor:
- Authors: Mohammed Qais; Desen Kirli; Edward Moroshko; Aristides Kiprakis; Sotirios Tsaftaris
- Corresponding author: Sotirios Tsaftaris
Dictionary learning for unsupervised identification of ischemic territories in CP-BOLD Cardiac MRI at rest
- DOI: 10.1186/1532-429x-17-s1-q13
- Published: 2015-02-03
- Journal:
- Impact factor:
- Authors: Marco Bevilacqua; Cristian Rusu; Rohan Dharmakumar; Sotirios Tsaftaris
- Corresponding author: Sotirios Tsaftaris
Other grants held by Sotirios Tsaftaris
CHAI - EPSRC AI Hub for Causality in Healthcare AI with Real Data
- Grant number: EP/Y028856/1
- Fiscal year: 2024
- Funding amount: $257,800
- Project category: Research Grant
CardiacA.I.: Machine learning for the analysis of multimodal cardiac MR images used in the diagnosis of coronary heart disease
- Grant number: EP/P022928/1
- Fiscal year: 2017
- Funding amount: $257,800
- Project category: Research Grant
Similar overseas grants
CIF: Small: Learning Low-Dimensional Representations with Heteroscedastic Data Sources
- Grant number: 2331590
- Fiscal year: 2024
- Funding amount: $257,800
- Project category: Standard Grant
Career: Learning Multimodal Representations of the Physical World
- Grant number: 2339071
- Fiscal year: 2024
- Funding amount: $257,800
- Project category: Continuing Grant
Multiple Representations of Learning in Dynamics and Control: Exploring the Synergy of Low-Cost Portable Lab Equipment, Virtual Labs, and AI within Student Learning Activities
- Grant number: 2336998
- Fiscal year: 2024
- Funding amount: $257,800
- Project category: Standard Grant
CAREER: Towards Trustworthy Machine Learning via Learning Trustworthy Representations: An Information-Theoretic Framework
- Grant number: 2339686
- Fiscal year: 2024
- Funding amount: $257,800
- Project category: Continuing Grant
Frontocortical representations of amygdala-mediated learning under uncertainty
- Grant number: 10825354
- Fiscal year: 2024
- Funding amount: $257,800
- Project category:
Collaborative Research: ECCS: Small: Personalized RF Sensing: Learning Optimal Representations of Human Activities and Ethogram on the Fly
- Grant number: 2233503
- Fiscal year: 2023
- Funding amount: $257,800
- Project category: Standard Grant
Dissecting the functional organization of local hippocampal circuits underlying spatial representations
- Grant number: 10590363
- Fiscal year: 2023
- Funding amount: $257,800
- Project category:
NSF-BSF: Learning the concept of Dynamic Equilibrium across disciplines with SystEms Augmented Mechanistic Representations
- Grant number: 2240216
- Fiscal year: 2023
- Funding amount: $257,800
- Project category: Standard Grant
Task Representations in Ventral Tegmental Area Dopamine Neurons across Shifts in Behavioral Strategy and Reward Expectation
- Grant number: 10679825
- Fiscal year: 2023
- Funding amount: $257,800
- Project category:
The geometry of neural representations reflecting abstraction in humans
- Grant number: 10682315
- Fiscal year: 2023
- Funding amount: $257,800
- Project category: