Juan Miguel De Haro, Rubén Cano, et al.
IPDPS 2022
Generative AI models such as large language models (LLMs) have emerged as the state of the art in various machine learning applications, including vision, speech recognition, code generation, and machine translation. These large transformer-based models surpass traditional machine learning methods, albeit at the cost of hundreds of ExaOps of computation. Hardware specialization and acceleration play a crucial role in improving the operational efficiency of these models, which in turn necessitates synergistic cross-layer design across algorithms, hardware, and software.
In this talk, I will focus on the challenges and opportunities these models introduce, such as the trade-off between compute and memory bandwidth. Recent advances in AI algorithms and reduced-precision/quantization techniques have improved compute efficiency while maintaining accuracy. System architectures need to be tailored to the communication patterns of LLMs, which the software stack can then exploit to keep data flowing to the compute engines and achieve high sustained compute utilization. The talk will present this holistic approach as adopted in the design of the recently announced IBM Spyre accelerator.
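To make the reduced-precision point concrete, here is a minimal sketch of symmetric per-tensor int8 post-training quantization in NumPy. This is an illustration under my own assumptions, not the Spyre design or IBM's actual quantization scheme; the helper names are hypothetical. It shows how dropping from fp32 to int8 cuts the bytes moved per weight by 4x, which directly eases the memory-bandwidth side of the trade-off, while keeping the reconstructed values close to the originals:

```python
import numpy as np

def quantize_int8(x: np.ndarray):
    """Symmetric per-tensor int8 quantization: map floats into [-127, 127]."""
    scale = np.abs(x).max() / 127.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover an approximate float tensor from int8 codes and the scale."""
    return q.astype(np.float32) * scale

# A toy weight tensor standing in for one LLM layer's weights (hypothetical).
rng = np.random.default_rng(0)
w = rng.normal(size=(1024, 1024)).astype(np.float32)

q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)

# int8 storage moves 4x less data than fp32 for the same weights.
print("bytes fp32:", w.nbytes, "bytes int8:", q.nbytes)
print("max abs reconstruction error:", np.abs(w - w_hat).max())
```

Production quantization schemes typically refine this with per-channel scales, calibration data, or quantization-aware training to hold accuracy at lower bit widths, but the bandwidth arithmetic is the same.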
David Stutz, Nandhini Chandramoorthy, et al.
MLSys 2021
Eric A. Joseph
AVS 2023
Stefano Ambrogio
MRS Spring Meeting 2022