Conference paper

Spyre: An inference-optimized scalable AI accelerator for enterprise workloads

Abstract

Spyre is a scalable, power-efficient AI accelerator product for enterprise workloads. Featuring 32 AI cores, mixed-precision support, and LPDDR5 memory, it fits in a single-slot PCIe form factor and scales over a standard PCIe fabric. Optimized for inference workloads, Spyre achieves 2-to-3× better power/performance than GPUs on encoder-class models and scales to 4 or more devices for large generative models.