Graph Neural Network Inference on Multi-FPGA Clusters
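Basic information
- Grant number: 2894270
- Principal investigator:
- Amount: --
- Host institution:
- Host institution country: United Kingdom
- Project type: Studentship
- Fiscal year: 2023
- Funding country: United Kingdom
- Period: 2023 to (no data)
- Project status: Ongoing
- Source:
- Keywords:
Project abstract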
Neural networks have been widely deployed to achieve state-of-the-art performance in tasks across various domains, such as image classification, machine translation, and text generation. Such models are typically executed on Graphics Processing Units (GPUs), which are widely commercially available and offer large performance improvements over general-purpose computers due to their deeply parallelized architecture.
With increasing complexity in cutting-edge models, GPUs have shown performance limitations caused by expensive data management mechanisms. In particular, low-latency applications such as high-energy physics or autonomous vehicles show the need for custom hardware to achieve sub-microsecond computation. Field-Programmable Gate Arrays (FPGAs) are a class of integrated circuit well suited to meeting these requirements due to their reconfigurable fabric, and have been shown to achieve up to 10x latency and throughput improvements over GPU counterparts, with orders of magnitude lower power consumption. Additionally, this reconfigurability gives FPGAs the flexibility to perform fine-grained optimizations in the network implementation.
In recent times, Graph Neural Networks (GNNs) have attracted great attention due to their classification performance on non-Euclidean data, such as in social networks, drug discovery and recommendation systems. FPGA acceleration proves particularly beneficial for GNNs given their irregular memory access patterns, which result from the sparse structure of graphs. These unique compute requirements have been addressed by several FPGA accelerators in the literature. Despite the benefits of inference on reconfigurable logic, even high-end FPGAs are limited by on-chip resource availability. This challenge can be addressed by FPGA clusters, which connect multiple devices through high-speed interconnects and offer the ability to scale inference performance approximately linearly with the number of devices in the network. This approach has been explored in the literature to accelerate Convolutional Neural Networks (CNNs) through dedicated layer partitioning approaches.
Although this method has proved effective for CNN acceleration, GNNs offer an unexplored problem setting. GNNs are inherently shallower than CNNs, since the number of layers corresponds to the number of neighbourhood hops through which features propagate. As such, my research aims to demonstrate that GNN inference on FPGA clusters benefits most from partitioning in the graph dimension rather than the layer dimension.
Several graph partitioning approaches have been proposed in the literature; a naïve approach involves splitting the adjacency matrix into regular node intervals. Alternatively, dynamic sliding-window based approaches take the graph data into account, leading to denser partitions and higher spatial locality. In real-time applications, the latency of this pre-processing step needs to be traded off against the added throughput in node feature transformations per layer. With any given partitioning scheme, a distributed node transformation engine requires careful consideration of data coherency, a classic problem in computer architecture. Distributing feature updates across several devices, each with dedicated memory components, creates the need for "residual" connections between devices so that messages involving nodes held on other devices can be computed. Various hardware optimisations could then be explored to limit the overhead of inter-device communication.
In conclusion, as the demand for efficient hardware acceleration grows beyond traditional GPUs, FPGAs present a compelling solution. However, scalability challenges in high-end FPGAs prompt the exploration of FPGA clusters. For GNNs, the proposal to shift from layer to graph partitioning in FPGA clusters shows promise, but refining partitioning strategies and addressing data coherency are critical for unlocking the full potential.
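The naïve node-interval partitioning mentioned in the abstract, and the cross-device feature exchange it implies, can be illustrated with a small software model. This is only a sketch under stated assumptions, not the project's design: the function names (`interval_partition`, `remote_sources`, `partitioned_aggregate`), the edge-list representation, scalar node features, and plain sum aggregation are all hypothetical choices made for illustration.

```python
from typing import Dict, List, Set, Tuple

Edge = Tuple[int, int]  # (source, destination): a message flows source -> destination


def interval_partition(num_nodes: int, num_devices: int) -> List[range]:
    """Split node IDs 0..num_nodes-1 into contiguous, near-equal intervals."""
    base, extra = divmod(num_nodes, num_devices)
    parts, start = [], 0
    for d in range(num_devices):
        size = base + (1 if d < extra else 0)  # first `extra` devices get one more node
        parts.append(range(start, start + size))
        start += size
    return parts


def remote_sources(edges: List[Edge], parts: List[range]) -> Dict[int, Set[int]]:
    """For each device, collect the nodes owned elsewhere whose features it needs."""
    owner = {n: d for d, part in enumerate(parts) for n in part}
    needed: Dict[int, Set[int]] = {d: set() for d in range(len(parts))}
    for src, dst in edges:
        if owner[src] != owner[dst]:  # edge crosses a partition boundary
            needed[owner[dst]].add(src)
    return needed


def aggregate(edges: List[Edge], features: List[float]) -> List[float]:
    """Single-device reference: each node sums the features of its in-neighbours."""
    out = [0.0] * len(features)
    for src, dst in edges:
        out[dst] += features[src]
    return out


def partitioned_aggregate(
    edges: List[Edge], features: List[float], parts: List[range]
) -> List[float]:
    """Same aggregation, computed per device from owned plus received features."""
    needed = remote_sources(edges, parts)
    out = [0.0] * len(features)
    for d, part in enumerate(parts):
        visible = {n: features[n] for n in part}             # device-local memory
        visible.update({n: features[n] for n in needed[d]})  # received from other devices
        for src, dst in edges:
            if dst in part:  # a device only updates the nodes it owns
                out[dst] += visible[src]
    return out


# Small demo: a directed 6-node ring split across 2 hypothetical devices.
ring = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0)]
feats = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
devices = interval_partition(6, 2)
print(remote_sources(ring, devices))
print(partitioned_aggregate(ring, feats, devices) == aggregate(ring, feats))
```

In this model, the size of each device's remote set is a rough proxy for the inter-device traffic a partitioning scheme incurs per layer. A locality-aware scheme, such as the sliding-window approaches mentioned above, would aim to shrink these sets at the cost of extra pre-processing.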
Project outputs
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Similar NSFC grants
Research on diversified high-fidelity techniques for Neural Process models
- Grant number: 62306326
- Approval year: 2023
- Amount: CNY 300,000
- Project type: Young Scientists Fund
Similar overseas grants
Heterogeneous Graph Neural Network based Federated Mobile Crowdsensing
- Grant number: 23K24829
- Fiscal year: 2024
- Amount: --
- Project type: Grant-in-Aid for Scientific Research (B)
CSR: Small: Processing-in-Memory enabled Manycore Systems to Accelerate Graph Neural Network-based Data Analytics
- Grant number: 2308530
- Fiscal year: 2023
- Amount: --
- Project type: Standard Grant
Deepening Graph Neural Network Technology
- Grant number: 23H03451
- Fiscal year: 2023
- Amount: --
- Project type: Grant-in-Aid for Scientific Research (B)
Construction of a deep graph neural network that prevents over-smoothing
- Grant number: 23K11241
- Fiscal year: 2023
- Amount: --
- Project type: Grant-in-Aid for Scientific Research (C)
CRII: III: Self-Supervised Graph Neural Network Meta-Learning for Cancer Multi-Omics and Driver Discovery
- Grant number: 2245805
- Fiscal year: 2023
- Amount: --
- Project type: Standard Grant
Characterization of network structure in random graph models and neural data
- Grant number: 574654-2022
- Fiscal year: 2022
- Amount: --
- Project type: University Undergraduate Student Research Awards
CNS Core: Small: Transparently Scaling Graph Neural Network Training to Large-Scale Models and Graphs
- Grant number: 2224054
- Fiscal year: 2022
- Amount: --
- Project type: Standard Grant
Heterogeneous Graph Neural Network based Federated Mobile Crowdsensing
- Grant number: 22H03573
- Fiscal year: 2022
- Amount: --
- Project type: Grant-in-Aid for Scientific Research (B)
Graph Neural Network model for prediction of treatment response in DLBCL
- Grant number: 563543-2021
- Fiscal year: 2021
- Amount: --
- Project type: University Undergraduate Student Research Awards
Neural Network based Graph Learning: Model Evolution and Real-World Application
- Grant number: 21K12042
- Fiscal year: 2021
- Amount: --
- Project type: Grant-in-Aid for Scientific Research (C)