Enhanced AI Perception through Unified Joint Embedding of Multimodal Sensory Data

Basic Information

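Grant number:
2874479
Principal investigator:
Amount:
--
Host institution:
University of Oxford
Institution country:
United Kingdom
Project type:
Studentship
Financial year:
2023
Funding country:
United Kingdom
Duration:
2023 to (no data)
Project status:
Not concluded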

Source:
https://gtr.ukri.org/projects?ref=studentship-2874479
Keywords:
Enhanced AI Perception through Unified

Project Abstract

Artificial Intelligence (AI), a field that aims to create machines able to think and learn like humans, and its subset Deep Learning, which uses multi-layered neural networks loosely inspired by the human brain, rely heavily on large collections of labeled data to train models. This dependency often hinders their application in dynamic, real-world scenarios. Humans, in contrast, natively process and interweave multiple senses: hearing the symphony of urban sounds, feeling the fine textures of objects, and judging distances visually. This natural ability to blend our senses and interpret our surroundings may be a missing link in AI's evolution. It raises the central research question: can the integration of multisensory data close the gap between human cognition and machine learning, so that machines learn more from natural sensory experience and less from extensive labeled data?

The objectives of this research are as follows. First, it aims to develop multimodal computational models capable of extracting structure from diverse sensory inputs. To address the scarcity of labeled multimodal datasets, we leverage naturally occurring paired data to separate modality-specific information and then integrate it, drawing inspiration from human learning, which often begins with audio-visual cues (e.g., we match the sound of a specific bird to its photo). Second, it seeks to extend AI's perceptual range beyond the conventional modalities of vision and audio by incorporating novel sensory modalities, including thermal data, tactile signals, and spatial depth. Third, it aims to establish a holistic perception system: the multimodal computational model will be trained to process a wide array of sensory modalities concurrently, covering text and visual cues, auditory signals, spatial depth, thermal sensations, IMU readings, tactile signals, etc.

The novelty of the research methodology is as follows. First, this research uses contrastive learning, a technique that trains models to recognize which data points from different modalities belong together and which do not. As a result, the models can correlate patterns and build connections across multiple modalities. Second, this research explores and associates novel modality pairs, such as visual-depth and visual-touch, which have not been extensively studied. Moreover, it aims to move beyond traditional dual-modality embeddings and develop a unified embedding space from diverse, naturally co-occurring modalities. In this context, embeddings are mathematical representations that condense complex data into a form machines can interpret. By leveraging state-of-the-art Large Language Models (LLMs) and Vision-Language Models (VLMs), we aim to identify features specific to each rare modality, match the pairs, and establish connections between them. This is facilitated by the zero-shot capabilities of LLMs/VLMs, which enable models to interpret and perform tasks they were not explicitly trained on. Consequently, the model can better encode information from various sensory modalities into a unified, joint embedding space.

This project falls within the EPSRC's Artificial Intelligence Technologies research area. We aspire to improve AI's perception by exploring joint embeddings of multimodal sensory data. Our approach aims to develop a more interconnected and richer embedding space that adapts better to a variety of downstream tasks. For example, in robotic automation, the unified joint embedding derived from this research has the potential to significantly improve robots' perception and operational efficiency across different scenarios.
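
The abstract names contrastive learning over naturally co-occurring pairs but does not specify a loss function. As an illustration only, the sketch below assumes the symmetric InfoNCE objective popularized by CLIP-style image-text models, applied to every pair of modalities in a batch so that all of them are pulled into one shared embedding space. The function names (`info_nce`, `symmetric_info_nce`, `multimodal_loss`), the temperature of 0.07, and the toy 2-D vectors standing in for encoder outputs are all hypothetical, not taken from the project.

```python
import itertools
import math
from typing import Dict, List

Vector = List[float]


def normalize(v: Vector) -> Vector:
    """Scale a vector to unit L2 norm so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def info_nce(anchors: List[Vector], positives: List[Vector],
             temperature: float = 0.07) -> float:
    """One-directional InfoNCE: row i of `anchors` should match row i of
    `positives`; every other row in the batch serves as a negative."""
    a = [normalize(v) for v in anchors]
    p = [normalize(v) for v in positives]
    total = 0.0
    for i, ai in enumerate(a):
        logits = [dot(ai, pj) / temperature for pj in p]
        # Cross-entropy with target i, via a numerically stable log-sum-exp.
        m = max(logits)
        log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
        total += log_sum_exp - logits[i]
    return total / len(a)


def symmetric_info_nce(x: List[Vector], y: List[Vector],
                       temperature: float = 0.07) -> float:
    """Average the x->y and y->x directions, as in CLIP-style training."""
    return 0.5 * (info_nce(x, y, temperature) + info_nce(y, x, temperature))


def multimodal_loss(batch: Dict[str, List[Vector]],
                    temperature: float = 0.07) -> float:
    """Hypothetical multi-modality objective: sum the pairwise loss over every
    pair of modalities present in the batch, pulling all of them into one
    shared embedding space."""
    names = sorted(batch)
    return sum(
        symmetric_info_nce(batch[m1], batch[m2], temperature)
        for m1, m2 in itertools.combinations(names, 2)
    )


if __name__ == "__main__":
    # Toy 2-D embeddings: two samples, each observed through three modalities.
    batch = {
        "vision": [[1.0, 0.1], [0.1, 1.0]],
        "depth": [[0.9, 0.2], [0.2, 0.9]],
        "touch": [[1.0, 0.0], [0.0, 1.0]],
    }
    aligned_loss = multimodal_loss(batch)
    # Reversing the touch rows breaks the pairing, so the loss should rise.
    shuffled_loss = multimodal_loss(dict(batch, touch=batch["touch"][::-1]))
    print(f"aligned={aligned_loss:.4f} shuffled={shuffled_loss:.4f}")
```

Summing over all modality pairs is only one option; another common design binds each new modality to a single anchor modality such as vision (the approach taken by ImageBind), which requires paired data only with that anchor.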

Project Outputs

Journal articles (0)

Monographs (0)

Research awards (0)

Conference papers (0)

Patents (0)

Other Publications

Hitoshi Yoshiji et al.: "Mechanism by which TIMP-1 promotes fibrosis in transgenic mice." Saishin Igaku 55, 1781-1787 (2000).

LiDAR Implementations for Autonomous Vehicle Applications (2021)

Laboratory of Biomolecular Engineering and Marine Biotechnology

Hitoshi Yoshiji et al.: "Molecular Medicine of Blood Vessels" (Illustrated Medicine & Science Series), Yodosha (Shibuya, M., ed.), 125 (2000).

Yoshiyama, M., Takeuchi, K., Kim, S., Hanatani, A., Omura, T., Toda, I., Akioka, K., Teragaki, M., Iwao, H. and Yoshikawa, J.: "Effect of manidipine hydrochloride, a calcium antagonist, on isoproterenol-induced left ventricular hypertrophy." Jpn Circ J. 62(1), 47-52 (1998).

Other Grants

An implantable biosensor microsystem for real-time measurement of circulating biomarkers

Grant number:
2901954
Financial year:
2028
Funding amount:
--
Project type:
Studentship

Exploiting the polysaccharide breakdown capacity of the human gut microbiome to develop environmentally sustainable dishwashing solutions

Grant number:
2896097
Financial year:
2027
Funding amount:
--
Project type:
Studentship

A Robot that Swims Through Granular Materials

Grant number:
2780268
Financial year:
2027
Funding amount:
--
Project type:
Studentship

Likelihood and impact of severe space weather events on the resilience of nuclear power and safeguards monitoring.

Grant number:
2908918
Financial year:
2027
Funding amount:
--
Project type:
Studentship

Proton, alpha and gamma irradiation assisted stress corrosion cracking: understanding the fuel-stainless steel interface

Grant number:
2908693
Financial year:
2027
Funding amount:
--
Project type:
Studentship

Field Assisted Sintering of Nuclear Fuel Simulants

Grant number:
2908917
Financial year:
2027
Funding amount:
--
Project type:
Studentship

Assessment of new fatigue capable titanium alloys for aerospace applications

Grant number:
2879438
Financial year:
2027
Funding amount:
--
Project type:
Studentship

Developing a 3D printed skin model using a Dextran - Collagen hydrogel to analyse the cellular and epigenetic effects of interleukin-17 inhibitors in

Grant number:
2890513
Financial year:
2027
Funding amount:
--
Project type:
Studentship

CDT year 1 so TBC in Oct 2024

Grant number:
2879865
Financial year:
2027
Funding amount:
--
Project type:
Studentship

Understanding the interplay between the gut microbiome, behavior and urbanisation in wild birds

Grant number:
2876993
Financial year:
2027
Funding amount:
--
Project type:
Studentship

Similar NSFC (National Natural Science Foundation of China) Grants

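AI-assisted drug design for the targeted structural modification of curcumin compounds, and a study of their druggability for preventing and treating liver failure

Grant number:
JCZRLH202500512
Approval year:
2025
Funding amount:
0.0 × 10,000 CNY
Project type:
Provincial/municipal project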

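Research and development of key advanced-packaging TSV technologies for AI chips

Grant number:
Approval year:
2025
Funding amount:
0.0 × 10,000 CNY
Project type:
Provincial/municipal project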

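AI-driven mining of industrial microbial synthesis components and intelligent product manufacturing

Grant number:
Approval year:
2025
Funding amount:
0.0 × 10,000 CNY
Project type:
Provincial/municipal project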

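Building an AI-enabled digital and intelligent acupuncture system for preventing and treating obesity with comorbid anxiety, based on the theory of "treating disease before it arises" (zhi wei bing)

Grant number:
Approval year:
2025
Funding amount:
0.0 × 10,000 CNY
Project type:
Provincial/municipal project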

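Research on AI-based monitoring and governance pathways for online public opinion in universities

Grant number:
Approval year:
2025
Funding amount:
0.0 × 10,000 CNY
Project type:
Provincial/municipal project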

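Development and effectiveness study of an early lifestyle-intervention system for Alzheimer's disease based on wearable devices and AI dynamic optimization

Grant number:
Approval year:
2025
Funding amount:
0.0 × 10,000 CNY
Project type:
Provincial/municipal project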

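A smart expressway management and control system in the context of Chengdu-Chongqing transport integration: big-data-driven operation, AI early warning, and digital-intelligent decision-making

Grant number:
Approval year:
2025
Funding amount:
0.0 × 10,000 CNY
Project type:
Provincial/municipal project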

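Strategic study of technology trends in AI-driven drug R&D and the choice of technology-innovation pathways for Chongqing

Grant number:
Approval year:
2025
Funding amount:
0.0 × 10,000 CNY
Project type:
Provincial/municipal project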

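AI-enabled vocational education: extracting core knowledge from instructional videos on the "Smart Vocational Education" platform

Grant number:
Approval year:
2025
Funding amount:
0.0 × 10,000 CNY
Project type:
Provincial/municipal project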

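The double-edged effect of medical AI technology on healthcare workers' risk perception from a patient-safety perspective

Grant number:
Approval year:
2025
Funding amount:
0.0 × 10,000 CNY
Project type:
Provincial/municipal project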

Similar Overseas Grants

Home helper robots: Understanding our future lives with human-like AI

Grant number:
FT230100021
Financial year:
2025
Funding amount:
--
Project type:
ARC Future Fellowships

An innovative platform using ML/AI to analyse farm data and deliver insights to improve farm performance, increasing farm profitability by 5-10%
