Dynamic Guardband Features of the IBM z17 System

Abstract

Computer systems strive for higher performance, better energy efficiency, reliability, fault tolerance, and sustainability. Dynamically optimizing guardbands can help achieve all of these goals at minimal design and chip-area cost by leveraging on-chip sensors together with targeted investments in test and firmware. Many chips instead use a fixed voltage guardband at each supported frequency to ensure correct operation in the field against every threatening source of variation. These sources include VDD power-supply droops caused by sudden workload changes, temperature excursions, and device aging. Previously, advances in robust error recovery and in power-supply droop mitigation have each been used independently to reduce the required guardbands and save power. In this work, we describe how the IBM z17 system dynamically optimizes guardbands by using robust droop mitigation and robust error recovery synchronously, in tandem, to deliver significant system power savings.