Multimodal, Interpretable, and Interactive Machine Learning for Multimedia


Basic Information

  • Grant number:
    RGPIN-2020-05471
  • Principal investigator:
  • Amount:
    $24,000
  • Host institution:
  • Host institution country:
    Canada
  • Project category:
    Discovery Grants Program - Individual
  • Fiscal year:
    2021
  • Funding country:
    Canada
  • Duration:
    2021-01-01 to 2022-12-31
  • Project status:
    Completed

Project Abstract

Despite the widespread adoption of ML in many domains, some underlying issues limit its scope and ubiquity, especially in multimedia signal processing, where signals can take many forms. The proposed research program attempts to solve two key issues: 1) ML for multimodal multimedia signals, and 2) increasing the interpretability and interactivity of ML-based multimedia signal processing. Multimodal signals are prevalent in many application areas of ML, such as action recognition (camera, depth, inertial sensors) and computer-aided diagnosis (ultrasound, MRI, CT, PET). The rise of Deep Learning (DL) has produced impressive performance on learning from a single modality; however, combining these modalities efficiently remains a challenge. Another issue is the black-box nature of ML, particularly DL, owing to its complexity. This is critical in sensitive domains such as medicine, where a thorough understanding of the model is necessary to earn the trust of healthcare professionals. A closely related issue is interactivity, which enables domain-knowledge integration into an ML model through user feedback. Genuinely human notions such as interest and relevance are inherently difficult for ML models to capture. Together, interpretability and interactivity can help increase trust in ML. The long-term objective of this proposal is to develop tools and techniques for multimodal, interpretable, and interactive ML that can serve as an enabling technology for a broad range of applications while training HQP for the digital media and medical imaging industries. The short-term objectives are to create:

  • A multi-level multimodal learning method with Convolutional Neural Networks that combines different levels of data abstraction and performs end-to-end training through novel moment-gated fusion layers, which preserve discriminative information while reducing the feature-space dimension, together with associated loss functions that force the capture of discriminative and correlated multimodal information;
  • Interactive and Interpretable ML (I2ML) through model-agnostic explanation and interaction, where we extend our recently proposed model-agnostic method to generate global explanations and provide a human-in-the-loop mechanism for manipulating the underlying features to capture domain expertise;
  • Applications of the frameworks, where we apply and validate the proposed multimodal and I2ML frameworks in two domains: 1) gesture recognition in AR, and 2) computer-aided diagnosis.

The expected outcome of the program is new techniques and tools for the ubiquitous adoption of ML. With a rising digital economy and Canada's emerging role as a global technology hub, the proposed program will benefit Canada immensely, opening new means of integration between ML and multimedia for a broad range of applications, creating both new creative exploration and technological opportunities.
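The gated-fusion idea behind the first objective can be illustrated with a minimal numpy sketch. Everything below (the sigmoid gate, the weight shapes, the feature dimensions) is an illustrative assumption, not the proposal's actual moment-gated architecture: a gate computed from both modalities decides, per feature dimension, how much each modality contributes to the fused representation.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gated_fusion(x_a, x_b, W, b):
    """Blend two modality feature vectors with a learned sigmoid gate.

    The gate is computed from the concatenated features, so a trained
    network could weigh each modality per dimension; fusing keeps the
    output at the size of one modality's feature vector.
    """
    gate = sigmoid(np.concatenate([x_a, x_b]) @ W + b)
    return gate * x_a + (1.0 - gate) * x_b

d = 8                                # feature dimension per modality (assumed)
x_depth = rng.standard_normal(d)     # e.g. depth-camera CNN features
x_inertial = rng.standard_normal(d)  # e.g. inertial-sensor CNN features
W = rng.standard_normal((2 * d, d))  # gate weights (would be learned end-to-end)
b = np.zeros(d)

fused = gated_fusion(x_depth, x_inertial, W, b)
print(fused.shape)  # (8,)
```

Because the gate lies strictly in (0, 1), the fused vector is a per-dimension convex combination of the two modalities, which is one simple way to preserve information from both streams while halving the concatenated dimension.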
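For the second objective, one standard model-agnostic route to a global explanation is permutation importance: shuffle one input feature at a time and measure how much the model's error degrades. The toy model and data below are hypothetical stand-ins for any black-box predictor; this sketches the general perturbation-based idea, not the authors' proposed I2ML method.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy black-box model: only the first two of five features matter.
def model(X):
    return 3.0 * X[:, 0] - 2.0 * X[:, 1]

def permutation_importance(model, X, y, n_repeats=10, rng=None):
    """Global, model-agnostic feature importance.

    Importance of feature j = average increase in mean squared error
    when column j is randomly shuffled (breaking its link to the target).
    """
    if rng is None:
        rng = np.random.default_rng()
    base_err = np.mean((model(X) - y) ** 2)
    scores = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        for _ in range(n_repeats):
            Xp = X.copy()
            rng.shuffle(Xp[:, j])  # permute one column in place
            scores[j] += np.mean((model(Xp) - y) ** 2) - base_err
    return scores / n_repeats

X = rng.standard_normal((200, 5))
y = model(X)  # labels generated by the model itself, so base error is zero
imp = permutation_importance(model, X, y, rng=rng)
print(imp.argmax())  # the feature with the largest weight dominates
```

Because the method only queries the model's predictions, it works unchanged for any classifier or regressor, which is what "model-agnostic" buys: the explanation machinery never inspects the model's internals.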

Project Outcomes

Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)


Other publications by Khan, Naimul

CNN-Based Multistage Gated Average Fusion (MGAF) for Human Action Recognition Using Depth and Inertial Sensors
  • DOI:
    10.1109/jsen.2020.3028561
  • Publication date:
    2021-02-01
  • Journal:
  • Impact factor:
    4.3
  • Authors:
    Ahmad, Zeeshan;Khan, Naimul
  • Corresponding author:
    Khan, Naimul
Mobile Health-Supported Virtual Reality and Group Problem Management Plus: Protocol for a Cluster Randomized Trial Among Urban Refugee and Displaced Youth in Kampala, Uganda (Tushirikiane4MH, Supporting Each Other for Mental Health).
  • DOI:
    10.2196/42342
  • Publication date:
    2022-12-08
  • Journal:
  • Impact factor:
    1.7
  • Authors:
    Logie, Carmen H;Okumu, Moses;Kortenaar, Jean-Luc;Gittings, Lesley;Khan, Naimul;Hakiza, Robert;Kibuuka Musoke, Daniel;Nakitende, Aidah;Katisi, Brenda;Kyambadde, Peter;Khan, Torsum;Lester, Richard;Mbuagbaw, Lawrence
  • Corresponding author:
    Mbuagbaw, Lawrence
Classification of lung pathologies in neonates using dual-tree complex wavelet transform.
  • DOI:
    10.1186/s12938-023-01184-x
  • Publication date:
    2023-12-04
  • Journal:
  • Impact factor:
    3.9
  • Authors:
    Aujla, Sagarjit;Mohamed, Adel;Tan, Ryan;Magtibay, Karl;Tan, Randy;Gao, Lei;Khan, Naimul;Umapathy, Karthikeyan
  • Corresponding author:
    Umapathy, Karthikeyan
Inertial Sensor Data to Image Encoding for Human Action Recognition
  • DOI:
    10.1109/jsen.2021.3062261
  • Publication date:
    2021-05-01
  • Journal:
  • Impact factor:
    4.3
  • Authors:
    Ahmad, Zeeshan;Khan, Naimul
  • Corresponding author:
    Khan, Naimul
Human Action Recognition Using Deep Multilevel Multimodal (M2) Fusion of Depth and Inertial Sensors
  • DOI:
    10.1109/jsen.2019.2947446
  • Publication date:
    2020-02-01
  • Journal:
  • Impact factor:
    4.3
  • Authors:
    Ahmad, Zeeshan;Khan, Naimul
  • Corresponding author:
    Khan, Naimul


Other grants held by Khan, Naimul

Multimodal, Interpretable, and Interactive Machine Learning for Multimedia
  • Grant number:
    RGPIN-2020-05471
  • Fiscal year:
    2022
  • Funding amount:
    $24,000
  • Project category:
    Discovery Grants Program - Individual
A cloud-based Machine Learning Framework for Assessment of Stress/Engagement through Multimodal Sensors
  • Grant number:
    537987-2018
  • Fiscal year:
    2021
  • Funding amount:
    $24,000
  • Project category:
    Collaborative Research and Development Grants
Multimodal, Interpretable, and Interactive Machine Learning for Multimedia
  • Grant number:
    DGECR-2020-00438
  • Fiscal year:
    2020
  • Funding amount:
    $24,000
  • Project category:
    Discovery Launch Supplement
Multimodal, Interpretable, and Interactive Machine Learning for Multimedia
  • Grant number:
    RGPIN-2020-05471
  • Fiscal year:
    2020
  • Funding amount:
    $24,000
  • Project category:
    Discovery Grants Program - Individual
COVID-19 and the Efficacy of Using Virtual Reality Scenarios to Safely Train Police in Mental Health Crisis Response
  • Grant number:
    554476-2020
  • Fiscal year:
    2020
  • Funding amount:
    $24,000
  • Project category:
    Alliance Grants
Research and development of a cloud-based context-aware API for semantic scene understanding
  • Grant number:
    558247-2020
  • Fiscal year:
    2020
  • Funding amount:
    $24,000
  • Project category:
    Alliance Grants
COVID-19 - An intelligent system for contact tracing, monitoring, and privacy preserving data analytics during the COVID-19 pandemic
  • Grant number:
    551077-2020
  • Fiscal year:
    2020
  • Funding amount:
    $24,000
  • Project category:
    Alliance Grants
A cloud-based Machine Learning Framework for Assessment of Stress/Engagement through Multimodal Sensors
  • Grant number:
    537987-2018
  • Fiscal year:
    2020
  • Funding amount:
    $24,000
  • Project category:
    Collaborative Research and Development Grants
A cloud-based Machine Learning Framework for Assessment of Stress/Engagement through Multimodal Sensors
  • Grant number:
    537987-2018
  • Fiscal year:
    2019
  • Funding amount:
    $24,000
  • Project category:
    Collaborative Research and Development Grants
Intelligent scene understanding for collaborative mobile augmented reality
  • Grant number:
    530666-2018
  • Fiscal year:
    2018
  • Funding amount:
    $24,000
  • Project category:
    Collaborative Research and Development Grants

Similar Overseas Grants

Development of Integrated Quantum Inspired Algorithms for Shapley Value based Fast and Interpretable Feature Subset Selection
  • Grant number:
    24K15089
  • Fiscal year:
    2024
  • Funding amount:
    $24,000
  • Project category:
    Grant-in-Aid for Scientific Research (C)
Toward next-generation flexible and interpretable deep learning: A novel evolutionary wide dendritic learning
  • Grant number:
    23K24899
  • Fiscal year:
    2024
  • Funding amount:
    $24,000
  • Project category:
    Grant-in-Aid for Scientific Research (B)
Deciphering electrophysiological Alzheimer's Disease biomarkers for early diagnosis using interpretable deep learning
  • Grant number:
    24K18602
  • Fiscal year:
    2024
  • Funding amount:
    $24,000
  • Project category:
    Grant-in-Aid for Early-Career Scientists
CAREER: Learning Generalizable and Interpretable Embodied AI with Human Priors
  • Grant number:
    2339769
  • Fiscal year:
    2024
  • Funding amount:
    $24,000
  • Project category:
    Continuing Grant
CAREER: Towards Safe and Interpretable Autonomy in Healthcare
  • Grant number:
    2340139
  • Fiscal year:
    2024
  • Funding amount:
    $24,000
  • Project category:
    Standard Grant
22-BBSRC/NSF-BIO - Interpretable & Noise-robust Machine Learning for Neurophysiology
  • Grant number:
    BB/Y008758/1
  • Fiscal year:
    2024
  • Funding amount:
    $24,000
  • Project category:
    Research Grant
AIPPN: Interpretable AI enabled Molecular Identification Pipeline for Plant-parasitic Nematodes
  • Grant number:
    10077647
  • Fiscal year:
    2023
  • Funding amount:
    $24,000
  • Project category:
    Collaborative R&D
Interpretable Machine Learning Modelling of Future Extreme Floods under Climate Change
  • Grant number:
    2889015
  • Fiscal year:
    2023
  • Funding amount:
    $24,000
  • Project category:
    Studentship
Interpretable data-driven building energy analytics
  • Grant number:
    2891094
  • Fiscal year:
    2023
  • Funding amount:
    $24,000
  • Project category:
    Studentship
UKRI/BBSRC-NSF/BIO: Interpretable and Noise-Robust Machine Learning for Neurophysiology
  • Grant number:
    2321840
  • Fiscal year:
    2023
  • Funding amount:
    $24,000
  • Project category:
    Continuing Grant