CNS Core: Small: Transparently Scaling Graph Neural Network Training to Large-Scale Models and Graphs

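Basic Information

Award number:
2224054
Principal investigator:
Marco Serafini
Amount:
Approx. $532,200
Host institution:
University of Massachusetts Amherst
Host institution country:
United States
Award type:
Standard Grant
Fiscal year:
2022
Funding country:
United States
Project period:
2022-10-01 to 2025-09-30
Project status:
Not yet concluded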

Source:
https://www.nsf.gov/awardsearch/showAward?AWD_ID=2224054&HistoricalAwards=false
Keywords:
CNS Core Small Transparently Scaling

Project Abstract

Large-scale graphs with billions of edges are ubiquitous in many industry, science, and engineering fields such as recommendation systems, social graph analysis, knowledge bases, materials science, and biology. In particular, Graph Neural Networks (GNNs), an emerging class of machine learning (ML) models, are increasingly adopted due to their superior performance in many tasks. Unfortunately, the progress towards training GNNs on large-scale real-world graphs is undermined by the lack of adequate systems support for ML practitioners. This project will develop fundamental research on algorithms, systems, and infrastructures to meet the pressing and growing need for GNN training systems that can scale to both large graph datasets and large expressive GNN models transparently to users. First, this project will develop split parallelism, a novel parallel training paradigm designed to support arbitrarily large-scale graphs and GNN models by scaling out to distributed and multi-GPU (graphics processing unit) systems. Split parallelism is tailored to the specific bottlenecks of GNNs and introduces a set of techniques to transparently split the training computation across GPUs. Second, this project will develop systems for scalable graph sampling, which can be a major performance bottleneck in GNN training. It will develop a novel fragment-based in-GPU sampling approach that transparently splits samples into multiple fragments to maximize data access locality and scalability.
Supporting large-scale graphs and GNN models will unleash innovation in a wide range of domains by making it easier for ML practitioners to develop large and expressive models without having to work around the scalability limitations of current GNN training systems. The project will develop novel approaches for parallel training and sampling and will introduce innovations in algorithms, infrastructure, and system design for the areas of general machine learning and graph analytics.
This project will stress technology transfer to integrate the findings into popular open-source GNN training tools such as the Deep Graph Library (DGL). The PIs will also support colleagues at their department working on question answering using knowledge graphs. The project will improve the training of both graduate and undergraduate students, emphasizing demographic diversity.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
