Toyotaro Suzumura
ICPE 2014
In this paper we propose a highly optimized parallel and distributed breadth-first search (BFS) on GPUs for the Graph500 benchmark. We evaluate our implementation on the TSUBAME2.0 supercomputer, where it achieves 317 GTEPS (billion traversed edges per second) at scale 35 (a large graph with 34.4 billion vertices and 550 billion edges) using 1366 nodes and 4096 GPUs. With this score, TSUBAME2.0 ranked fourth on the Graph500 list announced in June 2012. Our performance analysis shows that inter-node communication is the limiting factor for the GPU implementation. We also propose SIMD Variable-Length Quantity (VLQ) encoding to compress communication data on the GPU. © 2013 IEEE.
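To make the compression idea concrete, the following is a minimal sketch of plain, byte-wise variable-length quantity encoding in Python. It is not the paper's SIMD GPU scheme, whose layout the abstract does not describe. The sketch assumes a little-endian base-128 layout (as in LEB128), and the function names `vlq_encode` and `vlq_decode_stream` are hypothetical. It only illustrates why small integers, such as vertex identifiers, can take fewer bytes than a fixed 64-bit word.

```python
from typing import List


def vlq_encode(value: int) -> bytes:
    """Encode a non-negative integer as little-endian base-128 bytes.

    Each byte carries 7 payload bits; the high bit is set on every byte
    except the last one. (Assumed layout, not taken from the paper.)
    """
    if value < 0:
        raise ValueError("VLQ encoding here supports non-negative integers only")
    out = bytearray()
    while True:
        byte = value & 0x7F          # lowest 7 bits
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)         # final byte, high bit clear
            return bytes(out)


def vlq_decode_stream(data: bytes) -> List[int]:
    """Decode a concatenation of VLQ-encoded integers."""
    values: List[int] = []
    current = 0
    shift = 0
    for b in data:
        current |= (b & 0x7F) << shift
        if b & 0x80:
            shift += 7               # continuation bit set: keep accumulating
        else:
            values.append(current)   # value complete
            current = 0
            shift = 0
    if shift:
        raise ValueError("truncated VLQ stream")
    return values


# Hypothetical vertex IDs; the largest fits in 35 bits, as at scale 35.
vertex_ids = [3, 300, 2**35 - 1]
encoded = b"".join(vlq_encode(v) for v in vertex_ids)
decoded = vlq_decode_stream(encoded)
```

In this sketch the three identifiers occupy 8 bytes in total (1 + 2 + 5), compared with 24 bytes as fixed 64-bit integers. The actual saving in any real workload depends on the distribution of the values being sent.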
Nguyen Thien Bao, Toyotaro Suzumura
WWW 2013
Toyotaro Suzumura
HPGP 2016
Anuradha Karunarathna, Dinika Senarath, et al.
CLOUD 2020