Fast PGAS implementation of distributed graph algorithms
Guojing Cong, George Almasi, et al.
SC 2010
Today's multi-core era places significant demands on an optimizing compiler, which must parallelize programs, exploit the memory hierarchy, and leverage the ever-increasing SIMD capabilities of modern processors. Existing model-based heuristics for performance optimization used in compilers are limited in their ability to identify profitable parallelism/locality trade-offs and usually lead to sub-optimal performance. To address this problem, we distinguish optimizations for which effective model-based heuristics and profitability estimates exist from optimizations that require empirical search to achieve good performance in a portable fashion. We have developed a completely automatic framework that focuses the empirical search on the set of valid possibilities for fusion/code motion and relies on model-based mechanisms to perform tiling, vectorization and parallelization on the transformed program. We demonstrate the effectiveness of this approach through strong performance improvements on a single target and through performance portability across different target architectures. © 2010 IEEE.
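The split between empirically searched and model-driven decisions can be illustrated with a toy example. The sketch below is hypothetical and is not the framework described above: it assumes a single pair of loops with two legal fusion choices (`unfused` and `fused`, both illustrative names), rejects any candidate that changes the result, and then times the remaining candidates on the actual machine to pick the fastest. Tiling, vectorization and parallelization, which the abstract assigns to model-based mechanisms, are not attempted here.

```python
# Hypothetical illustration of empirical search over fusion choices;
# not the authors' framework. All names here are illustrative.
import timeit
from typing import Callable, Dict, List

Variant = Callable[[List[float]], List[float]]


def unfused(a: List[float]) -> List[float]:
    # Two separate loops: b = 2*a, then c = b + 1 (b is materialized).
    b = [2.0 * x for x in a]
    c = [y + 1.0 for y in b]
    return c


def fused(a: List[float]) -> List[float]:
    # One fused loop computing the same values; the intermediate b is never stored.
    return [2.0 * x + 1.0 for x in a]


def pick_fastest(variants: Dict[str, Variant], data: List[float], repeats: int = 5) -> str:
    # Restrict the search to valid candidates: each must reproduce the reference result.
    reference = next(iter(variants.values()))(data)
    for name, fn in variants.items():
        if fn(data) != reference:
            raise ValueError(f"variant {name!r} changes the result")
    # Time every valid candidate and keep the best observed time per variant.
    timings = {
        name: min(timeit.repeat(lambda f=fn: f(data), number=10, repeat=repeats))
        for name, fn in variants.items()
    }
    return min(timings, key=timings.get)


if __name__ == "__main__":
    data = [float(i) for i in range(10_000)]
    print(pick_fastest({"unfused": unfused, "fused": fused}, data))
```

Because the choice is made by measurement rather than predicted by a cost model, rerunning the search on different hardware may select a different variant, which is the sense in which empirical search supports performance portability.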
Mark Giampapa, Thomas Gooding, et al.
SC 2010
Rajesh Bordawekar, Uday Bondhugula, et al.
PACT 2010
Virat Agarwal, Fabrizio Petrini, et al.
SC 2010