Making Visualization Scalable (MAVIS) for explaining machine learning classification models
Basic Information
- Grant Number: EP/X029689/1
- Principal Investigator:
- Funding Amount: $727,200
- Host Institution:
- Host Institution Country: United Kingdom
- Project Type: Research Grant
- Fiscal Year: 2023
- Funding Country: United Kingdom
- Start/End: 2023 to (no data)
- Project Status: Ongoing
- Source:
- Keywords:
Project Summary
People leverage the power of interactive visualization to make sense of, and gain insights from, large/complex datasets. In machine learning (ML), developers need to understand how their models work. Stakeholders who make decisions with, or are affected by, such models also need that understanding (albeit in broader terms), for legislative and good business practice reasons. That is the essence of "Explainable AI" (XAI), an approach that allows the reasoning of ML and other types of AI model to be explained.

Visualizations work by showing people graphical patterns from which characteristics "pop out" or can be found by inspection ("visual search"). Although some types of visualization are agnostic to scale (e.g., outliers pop out in a box plot irrespective of the number of values), many visualizations break down (e.g., due to overplotting, using colour to encode categorical variables that have dozens of levels, or forcing people to zoom/scroll extensively just to see all the data). The Making Visualization Scalable (MAVIS) project will address these deficiencies by bringing together a multidisciplinary team of experts in visualization, user evaluation, visual communication, machine learning and statistics, with a strong track record in fundamental and applied visualization research. The team are all part of the Leeds Institute for Data Analytics (LIDA), where researchers from six faculties work collaboratively to address specialist challenges. The project's aim is to develop and evaluate methods for visually communicating and interacting with data that are effective for the large/complex data that is commonplace in XAI, focussing specifically on the explainability of ML classification models (deep neural decision trees, gradient boosting, random forests, etc.).

The project is divided into four work packages (WPs). WP1 will identify the fine-grained tasks that developers and stakeholders perform to explain ML models, via a literature review and real-world scenarios. We will also create datasets suitable for investigating those tasks, and publish the annotated/documented datasets and our software as a resource for other researchers and practitioners. The heart of the project is WP2 (static visualizations) and WP3 (interactive visualizations), where we will answer two questions that are central to the funding call: "how to improve visualizations?" and "how should people interact with data and visualizations?" Our driving hypothesis in WP2 is, counterintuitively, that as data gets more complex, visualizations should be made simpler. To investigate that hypothesis, we will: (a) quantify (response time; error rate) and characterise the scales of data at which visual encodings (colour, etc.) break down and impede people from gaining insights, (b) address those breakdowns by developing and evaluating new encoding-simplification methods based on visual mappings and view transformations, and (c) compare and evaluate widely used and hybrid chart types. WP3 has a similar structure, starting by investigating how people interact during the WP1 tasks, to identify barriers and inefficiencies. From that analysis, we will identify requirements for new interaction designs, and develop and evaluate corresponding solutions.

By following this rigorous approach, we are confident that our new visualization designs will transform the effectiveness with which people can work (as we have previously shown in genomics, petrophysics and other applications). WP4 grounds our fundamental research in real-world scenarios from transport, health and business. We will perform two phases of field evaluations to corroborate the benefits of our best visual communication (WP2) and interaction designs (WP3), and answer the question "how can improving visualizations and interactions improve human-centred decision making?" when people need to understand, diagnose, or explain ML classification models.
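To make the breakdowns described above concrete, the following is a minimal, illustrative Python sketch (not a MAVIS deliverable or method): it trains a random forest classifier of the kind the project targets, shows how a naive per-point colour encoding overplots at a couple of hundred thousand records, and contrasts it with one simple encoding simplification (aggregating records into hexagonal bins). The synthetic dataset, feature names and chart choices are assumptions made purely for the example.

import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# A large synthetic classification dataset, standing in for the large/complex
# data that is commonplace in XAI (the project creates real datasets in WP1).
X, y = make_classification(n_samples=200_000, n_features=10, n_informative=5,
                           n_classes=4, random_state=0)

# An ML classification model of the kind the project targets.
model = RandomForestClassifier(n_estimators=20, random_state=0).fit(X, y)
pred = model.predict(X)

fig, (ax_raw, ax_agg) = plt.subplots(1, 2, figsize=(10, 4))

# Naive encoding: one point per record, colour = predicted class.
# At this scale the chart overplots badly and class structure is hard to read.
ax_raw.scatter(X[:, 0], X[:, 1], c=pred, s=1, cmap="tab10", alpha=0.3)
ax_raw.set_title("Raw scatter, coloured by predicted class (overplotted)")

# One simple "encoding simplification": aggregate records into hexagonal bins
# and encode density instead of drawing individual points.
ax_agg.hexbin(X[:, 0], X[:, 1], gridsize=40, cmap="viridis")
ax_agg.set_title("Aggregated view (hexbin density)")

for ax in (ax_raw, ax_agg):
    ax.set_xlabel("feature 0")
    ax.set_ylabel("feature 1")

plt.tight_layout()
plt.show()

Per-class small multiples, binned majority class, or density contours are other simplifications of the same kind; WP2 proposes to quantify (by response time and error rate) the data scales at which such simplifications are actually needed, rather than assuming them.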
Project Outcomes
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other publications by Roy Ruddle
Other grants held by Roy Ruddle
QuantiCode: Intelligent infrastructure for quantitative, coded longitudinal data
- Grant Number: EP/N013980/1
- Fiscal Year: 2016
- Funding Amount: $727,200
- Project Type: Research Grant
Similar Overseas Grants
AitF: Collaborative Research: Fast, Accurate, and Practical: Adaptive Sublinear Algorithms for Scalable Visualization
- Grant Number: 2006206
- Fiscal Year: 2019
- Funding Amount: $727,200
- Project Type: Standard Grant
AitF: Collaborative Research: Fast, Accurate, and Practical: Adaptive Sublinear Algorithms for Scalable Visualization
- Grant Number: 1940759
- Fiscal Year: 2019
- Funding Amount: $727,200
- Project Type: Standard Grant
Task-based Visualization Methods for Scalable Analysis of Large Data Sets
- Grant Number: 398122172
- Fiscal Year: 2018
- Funding Amount: $727,200
- Project Type: Research Grants
Scalable Software for Distributed Processing and Visualization of Multi-Site MEG/EEG Datasets
- Grant Number: 10175064
- Fiscal Year: 2018
- Funding Amount: $727,200
- Project Type:
Scalable Software for Distributed Processing and Visualization of Multi-Site MEG/EEG Datasets
- Grant Number: 9750274
- Fiscal Year: 2018
- Funding Amount: $727,200
- Project Type:
Scalable and Sensor-Agnostic Software for Distributed Processing and Visualization of Multi-Site MEG/EEG Datasets
- Grant Number: 10442915
- Fiscal Year: 2018
- Funding Amount: $727,200
- Project Type:
AitF: Collaborative Research: Fast, Accurate, and Practical: Adaptive Sublinear Algorithms for Scalable Visualization
- Grant Number: 1733796
- Fiscal Year: 2017
- Funding Amount: $727,200
- Project Type: Standard Grant
AitF: Collaborative Research: Fast, Accurate, and Practical: Adaptive Sublinear Algorithms for Scalable Visualization
- Grant Number: 1733878
- Fiscal Year: 2017
- Funding Amount: $727,200
- Project Type: Standard Grant
CRII: III: Scalable and Interactive Dependency Visualization to Accelerate Parallel Program Analysis
- Grant Number: 1656958
- Fiscal Year: 2017
- Funding Amount: $727,200
- Project Type: Standard Grant
AitF: Collaborative Research: Fast, Accurate, and Practical: Adaptive Sublinear Algorithms for Scalable Visualization
- Grant Number: 1733808
- Fiscal Year: 2017
- Funding Amount: $727,200
- Project Type: Standard Grant