Workshop paper

DeepTools: A Full-Stack Machine Learning Compiler for the IBM Spyre Accelerator

Abstract

The rapid growth of machine learning applications has placed unprecedented pressure on conventional computing platforms, exposing the limitations of general-purpose processors in delivering the performance, efficiency, and scalability required by modern AI workloads. These demands have driven a surge of domain-specialized hardware designs and programming paradigms that rethink how computation, memory, and parallelism are organized. This shift has catalyzed innovation across both industry and academia, producing a broad ecosystem of AI accelerators, including systems such as IBM's Spyre, NVIDIA's DLA, Meta's MTIA, Google's TPU, AMD/Xilinx's Versal ACAP, and SambaNova, alongside academic efforts like MAERI, Morph, and ExTensor. In this extended abstract, we present an overview of the IBM Spyre accelerator and its compiler stack, which efficiently maps AI models onto the hardware for high-performance execution.