Design of Analog-AI Hardware Accelerators for Transformer-based Language Models (Invited)

Geoffrey Burr; Sidney Tsai; William Simon; Irem Boybat-Kara; Stefano Ambrogio; Chung-En Ho; Ze-Wei Liou; Malte Rasch; Julian Büchel; Pritish Narayanan; Tarl Gordon; Shubham Jain; Ted Levin; Kohji Hosokawa; Manuel Le Gallo; Hunter Smith; Masatoshi Ishii; Y. Kohda; An Chen; Charles Mackin; Andrea Fasoli; Kaoutar El Maghraoui; Ramachandran Muralidhar; Atsuya Okazaki; Ching-Tzu Chen; Martin Frank; Corey Liam Lammie; A. Vasilopoulos; Alexander Friz; Jose Luquin; Sean Teehan; Ishtiaq Ahsan; Abu Sebastian; Vijay Narayanan

doi:10.1109/IEDM45741.2023.10413767

IEDM 2023

Invited talk

12 Dec 2023



Abstract

Analog Non-Volatile Memory-based accelerators offer high-throughput and energy-efficient Multiply-Accumulate operations for the large Fully-Connected layers that dominate Transformer-based Large Language Models. We describe architectural, wafer-scale testing, chip-demo, and hardware-aware training efforts towards such accelerators, and quantify the unique raw-throughput and latency benefits of Fully- (rather than Partially-) Weight-Stationary systems.
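As a rough illustration of why Fully-Connected layers dominate the Multiply-Accumulate count, the sketch below tallies per-token MACs for one Transformer block. It is not taken from the paper: it assumes a generic GPT-style block (four d_model x d_model attention projections, a feed-forward network with a 4x hidden expansion, full rather than causal attention), and the function name, parameters, and example sizes are hypothetical.

```python
def macs_per_token(d_model: int, seq_len: int, ffn_mult: int = 4):
    """Per-token MAC counts for one Transformer block (illustrative assumptions).

    Assumed layout, not from the paper:
      - Q, K, V and output projections: 4 weight matrices of d_model x d_model
      - feed-forward network: d_model -> ffn_mult*d_model -> d_model
      - full attention over seq_len positions, summed across all heads
    """
    # Weight-bearing Fully-Connected layers (the part an analog array would hold).
    fc_macs = (4 + 2 * ffn_mult) * d_model * d_model
    # Activation-by-activation products: Q.K^T scores plus the weighted sum over V.
    attn_macs = 2 * seq_len * d_model
    return fc_macs, attn_macs


def fc_fraction(d_model: int, seq_len: int, ffn_mult: int = 4) -> float:
    """Fraction of per-token MACs that fall in the Fully-Connected layers."""
    fc_macs, attn_macs = macs_per_token(d_model, seq_len, ffn_mult)
    return fc_macs / (fc_macs + attn_macs)


if __name__ == "__main__":
    # Hypothetical model size chosen only for illustration.
    print(f"FC share of MACs: {fc_fraction(4096, 2048):.3f}")
```

Under these assumptions the Fully-Connected share is 6*d_model / (6*d_model + seq_len), so it stays above 90% whenever the sequence length is well below 6*d_model and falls as the context grows.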

Conference paper