Multimodal, Interpretable, and Interactive Machine Learning for Multimedia
Basic Information
- Grant number: RGPIN-2020-05471
- Principal investigator:
- Amount: $24,000
- Host institution:
- Host institution country: Canada
- Program: Discovery Grants Program - Individual
- Fiscal year: 2022
- Funding country: Canada
- Duration: 2022-01-01 to 2023-12-31
- Status: Completed
- Source:
- Keywords:
Project Summary
Despite the widespread adoption of ML in many domains, some underlying issues limit the scope and ubiquity of ML adoption, especially for multimedia signal processing, where signals can take many forms. The proposed research program will attempt to solve two key issues: 1) ML for multimodal multimedia signals, and 2) increasing the interpretability and interactivity of multimedia signal processing with ML. Multimodal signals are prevalent in many application areas of ML, such as action recognition (camera, depth, inertial sensors) and computer-aided diagnosis (ultrasound, MRI, CT, PET). The rise of Deep Learning (DL) has produced impressive performance when learning from a single modality; however, challenges remain in how to combine modalities efficiently. Another issue is the black-box nature of ML, particularly DL, owing to its complexity. This is critical in sensitive domains such as medicine, where a thorough understanding of the model is necessary to earn the trust of healthcare professionals. A closely related issue is interactivity, which enables domain-knowledge integration in an ML model through user feedback. Genuinely human notions such as interest and relevance are inherently difficult for ML models to capture. Together, interpretability and interactivity can help increase trust in ML. The long-term objective of this proposal is to develop tools and techniques for multimodal, interpretable, and interactive ML that can serve as an enabling technology for a broad range of applications while training HQP for the digital media and medical imaging industries. The short-term objectives are to create:
1. A multi-level multimodal learning method with Convolutional Neural Networks that combines different levels of data abstraction and performs end-to-end training through novel moment-gated fusion layers, which preserve discriminative information while reducing the feature-space dimension, together with associated loss functions that force the capture of discriminative and correlated multimodal information;
2. Interactive and Interpretable ML (I2ML) through model-agnostic explanation and interaction, where we extend our recently proposed model-agnostic method to generate global explanations and provide a human-in-the-loop mechanism for manipulating the underlying features to capture domain expertise;
3. Applications of the frameworks, where we apply and validate the proposed multimodal and I2ML frameworks in two domains: 1) gesture recognition in AR, and 2) computer-aided diagnosis.
The expected outcome of the program is new techniques and tools for the ubiquitous adoption of ML. With a rising digital economy and Canada's emerging role as a global technology hub, the proposed program will benefit Canada immensely, opening new means of integration between ML and multimedia for a broad range of applications, and thus creating both new creative exploration and technological opportunities.
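The abstract does not specify the moment-gated fusion layers in detail. As an illustration only of the general gated-fusion idea for two modalities (the gate form, function names, and dimensions below are assumptions, not the proposal's actual design), a minimal NumPy sketch:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_fusion(feat_a, feat_b, w_gate, b_gate):
    """Fuse two modality feature vectors through a learned gate.

    The gate (values in (0, 1)) decides, per dimension, how much of
    each modality to keep; the fused vector has the same dimension d
    as either input, so fusion does not inflate the feature space.
    """
    concat = np.concatenate([feat_a, feat_b])     # shape (2d,)
    gate = sigmoid(w_gate @ concat + b_gate)      # shape (d,)
    return gate * feat_a + (1.0 - gate) * feat_b  # per-dim convex combination

# Toy example with d = 4 features per modality; random weights stand
# in for parameters that would normally be learned end-to-end.
rng = np.random.default_rng(0)
d = 4
depth_feat = rng.standard_normal(d)      # e.g. depth-camera features
inertial_feat = rng.standard_normal(d)   # e.g. inertial-sensor features
w = 0.1 * rng.standard_normal((d, 2 * d))
b = np.zeros(d)

fused = gated_fusion(depth_feat, inertial_feat, w, b)
print(fused.shape)  # (4,)
```

Because the gate is a convex combination per dimension, each fused value lies between the two modality values, which is one simple way to reduce dimension (2d inputs, d outputs) without discarding either modality outright.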
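The specific model-agnostic explanation method the authors extend is not detailed in the abstract. A standard stand-in that illustrates the model-agnostic principle is permutation feature importance: the model is treated purely as a black-box callable, so the same explanation code works for any classifier. The toy model and data below are hypothetical:

```python
import numpy as np

def permutation_importance(model_fn, X, y, n_repeats=10, seed=0):
    """Global, model-agnostic explanation via permutation importance.

    model_fn may be ANY callable mapping an (n, d) array to labels;
    the importance of feature j is the mean drop in accuracy after
    shuffling column j, so no access to model internals is needed.
    """
    rng = np.random.default_rng(seed)
    base_acc = np.mean(model_fn(X) == y)
    importances = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        drops = []
        for _ in range(n_repeats):
            Xp = X.copy()
            Xp[:, j] = rng.permutation(Xp[:, j])  # break feature j only
            drops.append(base_acc - np.mean(model_fn(Xp) == y))
        importances[j] = np.mean(drops)
    return importances

# Hypothetical black-box model: predicts the class from feature 0 only.
rng = np.random.default_rng(1)
X = rng.standard_normal((200, 3))
y = (X[:, 0] > 0).astype(int)
black_box = lambda A: (A[:, 0] > 0).astype(int)

imp = permutation_importance(black_box, X, y)
# Feature 0 carries all the signal; features 1 and 2 are irrelevant.
```

Explanations of this kind are a natural entry point for the human-in-the-loop mechanism the proposal describes, since a domain expert can inspect the per-feature scores and down-weight or remove features that conflict with domain knowledge.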
Project Outcomes
- Journal articles: 0
- Monographs: 0
- Research awards: 0
- Conference papers: 0
- Patents: 0
Other publications by Khan, Naimul
CNN-Based Multistage Gated Average Fusion (MGAF) for Human Action Recognition Using Depth and Inertial Sensors
- DOI: 10.1109/jsen.2020.3028561
- Published: 2021-02-01
- Journal:
- Impact factor: 4.3
- Authors: Ahmad, Zeeshan; Khan, Naimul
- Corresponding author: Khan, Naimul

Mobile Health-Supported Virtual Reality and Group Problem Management Plus: Protocol for a Cluster Randomized Trial Among Urban Refugee and Displaced Youth in Kampala, Uganda (Tushirikiane4MH, Supporting Each Other for Mental Health).
- DOI: 10.2196/42342
- Published: 2022-12-08
- Journal:
- Impact factor: 1.7
- Authors: Logie, Carmen H; Okumu, Moses; Kortenaar, Jean-Luc; Gittings, Lesley; Khan, Naimul; Hakiza, Robert; Kibuuka Musoke, Daniel; Nakitende, Aidah; Katisi, Brenda; Kyambadde, Peter; Khan, Torsum; Lester, Richard; Mbuagbaw, Lawrence
- Corresponding author: Mbuagbaw, Lawrence

Classification of lung pathologies in neonates using dual-tree complex wavelet transform.
- DOI: 10.1186/s12938-023-01184-x
- Published: 2023-12-04
- Journal:
- Impact factor: 3.9
- Authors: Aujla, Sagarjit; Mohamed, Adel; Tan, Ryan; Magtibay, Karl; Tan, Randy; Gao, Lei; Khan, Naimul; Umapathy, Karthikeyan
- Corresponding author: Umapathy, Karthikeyan

Inertial Sensor Data to Image Encoding for Human Action Recognition
- DOI: 10.1109/jsen.2021.3062261
- Published: 2021-05-01
- Journal:
- Impact factor: 4.3
- Authors: Ahmad, Zeeshan; Khan, Naimul
- Corresponding author: Khan, Naimul

Human Action Recognition Using Deep Multilevel Multimodal (M2) Fusion of Depth and Inertial Sensors
- DOI: 10.1109/jsen.2019.2947446
- Published: 2020-02-01
- Journal:
- Impact factor: 4.3
- Authors: Ahmad, Zeeshan; Khan, Naimul
- Corresponding author: Khan, Naimul
Other grants held by Khan, Naimul
A cloud-based Machine Learning Framework for Assessment of Stress/Engagement through Multimodal Sensors
- Grant number: 537987-2018
- Fiscal year: 2021
- Amount: $24,000
- Program: Collaborative Research and Development Grants

Multimodal, Interpretable, and Interactive Machine Learning for Multimedia
- Grant number: RGPIN-2020-05471
- Fiscal year: 2021
- Amount: $24,000
- Program: Discovery Grants Program - Individual

Multimodal, Interpretable, and Interactive Machine Learning for Multimedia
- Grant number: DGECR-2020-00438
- Fiscal year: 2020
- Amount: $24,000
- Program: Discovery Launch Supplement

Multimodal, Interpretable, and Interactive Machine Learning for Multimedia
- Grant number: RGPIN-2020-05471
- Fiscal year: 2020
- Amount: $24,000
- Program: Discovery Grants Program - Individual

COVID-19 and the Efficacy of Using Virtual Reality Scenarios to Safely Train Police in Mental Health Crisis Response
- Grant number: 554476-2020
- Fiscal year: 2020
- Amount: $24,000
- Program: Alliance Grants

Research and development of a cloud-based context-aware API for semantic scene understanding
- Grant number: 558247-2020
- Fiscal year: 2020
- Amount: $24,000
- Program: Alliance Grants

COVID-19 - An intelligent system for contact tracing, monitoring, and privacy preserving data analytics during the COVID-19 pandemic
- Grant number: 551077-2020
- Fiscal year: 2020
- Amount: $24,000
- Program: Alliance Grants

A cloud-based Machine Learning Framework for Assessment of Stress/Engagement through Multimodal Sensors
- Grant number: 537987-2018
- Fiscal year: 2020
- Amount: $24,000
- Program: Collaborative Research and Development Grants

A cloud-based Machine Learning Framework for Assessment of Stress/Engagement through Multimodal Sensors
- Grant number: 537987-2018
- Fiscal year: 2019
- Amount: $24,000
- Program: Collaborative Research and Development Grants

Intelligent scene understanding for collaborative mobile augmented reality
- Grant number: 530666-2018
- Fiscal year: 2018
- Amount: $24,000
- Program: Collaborative Research and Development Grants
Similar international grants
Development of Integrated Quantum Inspired Algorithms for Shapley Value based Fast and Interpretable Feature Subset Selection
- Grant number: 24K15089
- Fiscal year: 2024
- Amount: $24,000
- Program: Grant-in-Aid for Scientific Research (C)

Toward next-generation flexible and interpretable deep learning: A novel evolutionary wide dendritic learning
- Grant number: 23K24899
- Fiscal year: 2024
- Amount: $24,000
- Program: Grant-in-Aid for Scientific Research (B)

Deciphering electrophysiological Alzheimer's Disease biomarkers for early diagnosis using interpretable deep learning
- Grant number: 24K18602
- Fiscal year: 2024
- Amount: $24,000
- Program: Grant-in-Aid for Early-Career Scientists

CAREER: Learning Generalizable and Interpretable Embodied AI with Human Priors
- Grant number: 2339769
- Fiscal year: 2024
- Amount: $24,000
- Program: Continuing Grant

CAREER: Towards Safe and Interpretable Autonomy in Healthcare
- Grant number: 2340139
- Fiscal year: 2024
- Amount: $24,000
- Program: Standard Grant

22-BBSRC/NSF-BIO - Interpretable & Noise-robust Machine Learning for Neurophysiology
- Grant number: BB/Y008758/1
- Fiscal year: 2024
- Amount: $24,000
- Program: Research Grant

AIPPN: Interpretable AI enabled Molecular Identification Pipeline for Plant-parasitic Nematodes
- Grant number: 10077647
- Fiscal year: 2023
- Amount: $24,000
- Program: Collaborative R&D

Interpretable Machine Learning Modelling of Future Extreme Floods under Climate Change
- Grant number: 2889015
- Fiscal year: 2023
- Amount: $24,000
- Program: Studentship

Interpretable data-driven building energy analytics
- Grant number: 2891094
- Fiscal year: 2023
- Amount: $24,000
- Program: Studentship

UKRI/BBSRC-NSF/BIO: Interpretable and Noise-Robust Machine Learning for Neurophysiology
- Grant number: 2321840
- Fiscal year: 2023
- Amount: $24,000
- Program: Continuing Grant