Josep Lluis Berral, David Buchaca, et al.
CLOUD 2021
The energy efficiency of GPU architectures has emerged as an essential aspect of computer system design. In this article, we explore the energy benefits of reducing the GPU chip's voltage to the safe limit, i.e., the $V_{\min}$ point, using predictive software techniques. We perform this study on several commercial off-the-shelf GPU cards. We find that a voltage guardband of about 20% exists on these GPUs, which span two architectural generations; eliminating it entirely can yield up to 25% energy savings on one of the studied cards. Our measurements reveal program-dependent $V_{\min}$ behavior across the studied applications, and the exact magnitude of the improvement depends on each program's available guardband. We make fundamental observations about this program-dependent $V_{\min}$ behavior. We determine experimentally that voltage noise has a more substantial impact on $V_{\min}$ than process and temperature variation, and that activity during kernel execution causes large voltage droops. From these findings, we show how to use a kernel's microarchitectural performance counters to predict its $V_{\min}$ value accurately. The average and maximum prediction errors are 0.5% and 3%, respectively. Accurate $V_{\min}$ prediction opens up new possibilities for a cross-layer dynamic guardbanding scheme for GPUs, in which software predicts and manages the voltage guardband while a hardware safety-net mechanism ensures functional correctness.
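As a rough illustration of why removing voltage guardband saves energy, the sketch below applies the standard first-order CMOS relation in which dynamic energy scales with the square of the supply voltage. This is an assumption for illustration, not the measurement method of the article, and the function name `dynamic_energy_saving` is hypothetical. The model ignores static power and any other component that does not scale with the square of the voltage, so it gives an idealized figure (about 36% for a 20% voltage reduction) rather than a prediction of the measured savings of up to 25%.

```python
def dynamic_energy_saving(voltage_reduction: float) -> float:
    """Fractional dynamic-energy saving for a given fractional voltage reduction.

    Assumes the first-order CMOS model E_dyn proportional to V^2 at fixed
    frequency and fixed work. Static power is deliberately not modeled.
    """
    if not 0.0 <= voltage_reduction < 1.0:
        raise ValueError("voltage_reduction must be in [0, 1)")
    scaled_voltage = 1.0 - voltage_reduction  # V_new / V_nominal
    return 1.0 - scaled_voltage ** 2          # 1 - (V_new / V_nominal)^2


# Removing a 20% guardband under this idealized model.
saving = dynamic_energy_saving(0.20)
print(f"Idealized dynamic-energy saving: {saving:.0%}")
```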
Hazar Yueksel, Ramon Bertran, et al.
MLSys 2020
Jovan Stojkovic, Chloe Alverti, et al.
HPCA 2025
Mert Toslali, Srinivasan Parthasarathy, et al.
HotCloud 2020