Pavel Klavík, A. Cristiano I. Malossi, et al.
Philos. Trans. R. Soc. A
The rapid growth of machine learning applications has placed unprecedented pressure on conventional computing platforms, exposing the limitations of general-purpose processors in delivering the performance, efficiency, and scalability required by modern AI workloads. Addressing these demands has driven a surge of domain-specialized hardware designs and programming paradigms that rethink how computation, memory, and parallelism are organized. This shift has catalyzed innovation across both industry and academia, giving rise to a broad ecosystem of AI accelerators, including systems such as IBM’s Spyre, NVIDIA’s DLA, Meta’s MTIA, Google’s TPU, AMD/Xilinx’s Versal ACAP, and SambaNova, alongside academic efforts like MAERI, Morph, and ExTensor. In this extended abstract, we present an overview of the IBM Spyre accelerator and its compiler stack for efficiently mapping AI models to high-performance execution.