Status: Published
Redundancy-Free High-Performance Dynamic GNN Training with Hierarchical Pipeline Parallelism
Xia, Yaqi1; Zhang, Zheng1; Wang, Hulin1; Yang, Donglin2; Zhou, Xiaobo3; Cheng, Dazhao1
2023-08
Conference Name: HPDC '23: Proceedings of the 32nd International Symposium on High-Performance Parallel and Distributed Computing
Pages: 17-13
Conference Date: June 16-23, 2023
Conference Place: Orlando, FL
Country: USA
Abstract

Temporal Graph Neural Networks (TGNNs) extend the success of Graph Neural Networks to dynamic graphs. Distributed TGNN training must efficiently handle temporal dependencies, which often lead to excessive cross-device communication carrying significant redundant data. Existing systems, however, cannot remove this redundancy in data reuse and transfer, and they suffer severe communication overhead in distributed settings. This paper presents Sven, an algorithm-system co-designed TGNN training library for end-to-end performance optimization on multi-node, multi-GPU systems. Exploiting the dependency patterns of TGNN models and the characteristics of dynamic graph datasets, we design redundancy-free data organization and load-balancing partitioning strategies that mitigate redundant data communication and evenly partition dynamic graphs at the vertex level. Furthermore, we develop a hierarchical pipeline mechanism integrating data prefetching, micro-batch pipelining, and asynchronous pipelining to mitigate the communication overhead. As the first scaling study of memory-based TGNN training, experiments conducted on an HPC cluster of 64 GPUs show that Sven achieves a 1.7x-3.3x speedup over state-of-the-art approaches and up to a 5.26x improvement in communication efficiency.
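The hierarchical pipeline described in the abstract overlaps communication with computation: while the GPU processes one micro-batch, the features and memory states needed by the next micro-batch are already being fetched. The standalone Python sketch below illustrates only this generic producer-consumer overlap pattern; it is not Sven's actual API, and every name in it (prefetch_features, train_micro_batch, pipelined_epoch) is hypothetical.

```python
# Minimal sketch of micro-batch pipelining with data prefetching, as the
# abstract describes it at a high level. All names are hypothetical; this
# is NOT Sven's API. A bounded queue lets the prefetch thread run ahead of
# the compute loop by a fixed pipeline depth.
import queue
import threading
import time

def prefetch_features(batch_id: int) -> dict:
    """Stand-in for fetching remote vertex features / memory states."""
    time.sleep(0.05)  # simulate cross-device communication latency
    return {"batch": batch_id, "features": [batch_id] * 4}

def train_micro_batch(data: dict) -> None:
    """Stand-in for the forward/backward pass on one micro-batch."""
    time.sleep(0.05)  # simulate GPU compute time

def pipelined_epoch(num_micro_batches: int, depth: int = 2) -> None:
    ready: "queue.Queue[dict]" = queue.Queue(maxsize=depth)

    def producer() -> None:
        # Prefetch runs ahead of compute, overlapping with it; the bounded
        # queue caps how many micro-batches are in flight at once.
        for i in range(num_micro_batches):
            ready.put(prefetch_features(i))

    t = threading.Thread(target=producer)
    t.start()
    for _ in range(num_micro_batches):
        # Compute on the current micro-batch while the next fetch is in flight.
        train_micro_batch(ready.get())
    t.join()

pipelined_epoch(8)
```

In a real multi-GPU system the producer stage would issue asynchronous device-to-device or network transfers rather than run a Python thread, but the scheduling idea, which is hiding transfer latency behind compute via a bounded-depth pipeline, is the same.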

DOI: 10.1145/3588195.3592990
Language: English
Scopus ID: 2-s2.0-85169535776
Document Type: Conference paper
Collection: The State Key Laboratory of Internet of Things for Smart City (University of Macau)
Affiliations:
1. Wuhan University
2. Nvidia Corp
3. University of Macau
Recommended Citation
GB/T 7714: Xia, Yaqi, Zhang, Zheng, Wang, Hulin, et al. Redundancy-Free High-Performance Dynamic GNN Training with Hierarchical Pipeline Parallelism[C], 2023: 17-13.
APA: Xia, Y., Zhang, Z., Wang, H., Yang, D., Zhou, X., & Cheng, D. (2023). Redundancy-Free High-Performance Dynamic GNN Training with Hierarchical Pipeline Parallelism, 17-13.
Files in This Item:
There are no files associated with this item.
Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.