G. Almasi, G. Almasi, et al.
Digest of Technical Papers - IEEE International Solid-State Circuits Conference
In this paper we propose an efficient algorithm for implementing the 3-D NAS FFT benchmark. The algorithm overlaps communication with computation. On parallel machines that support this overlap, it can outperform the non-overlapping version of the same algorithm by a factor close to two.
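The "factor close to two" can be made plausible with a small cost model. The sketch below is an illustration only, not the paper's algorithm or its measurements: it assumes the work is split into equal chunks, that each chunk costs a fixed compute time and a fixed communication time, and that the machine can send one chunk while computing the next. The function names and parameters (`serial_time`, `overlapped_time`, `speedup`, `n_chunks`, `t_comp`, `t_comm`) are hypothetical.

```python
def serial_time(n_chunks, t_comp, t_comm):
    """No overlap: every chunk is computed, then sent, one after another."""
    return n_chunks * (t_comp + t_comm)


def overlapped_time(n_chunks, t_comp, t_comm):
    """Two-stage pipeline (assumed model): chunk k is sent while chunk k+1 is computed."""
    compute_done = 0.0  # time at which the processor finishes the current chunk
    send_done = 0.0     # time at which the network finishes the current send
    for _ in range(n_chunks):
        compute_done += t_comp
        # A send starts once its chunk is computed and the link is free.
        send_done = max(send_done, compute_done) + t_comm
    return send_done


def speedup(n_chunks, t_comp, t_comm):
    """Ratio of non-overlapped to overlapped time under the assumed model."""
    return serial_time(n_chunks, t_comp, t_comm) / overlapped_time(n_chunks, t_comp, t_comm)


if __name__ == "__main__":
    for t_comm in (0.25, 1.0, 4.0):
        print(f"t_comp=1.0 t_comm={t_comm}: speedup {speedup(64, 1.0, t_comm):.2f}")
```

Under this model the gain approaches two only when communication and computation cost about the same per chunk and there are many chunks. If one of the two dominates, the overlapped run is bounded by the slower stage and the gain shrinks toward one.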
Dror G. Feitelson
SHPCC 1994
V.K. Naik
SHPCC 1994
R.C. Agarwal, F.G. Gustavson
ACM/IEEE SC 1989