Demonstration of transformer-based ALBERT model on a 14nm analog AI inference chip

Abstract

A Lite Bidirectional Encoder Representations from Transformers (ALBERT) model is demonstrated on an analog inference chip fabricated at the 14nm node with phase-change memory (PCM). The 7.1 million unique analog weights, shared across 12 layers, are mapped onto a single chip and accurately programmed into the conductances of 28.3 million devices, constituting the first analog hardware demonstration of a meaningfully large Transformer model. The implemented model achieves near iso-accuracy on the seven tasks of the General Language Understanding Evaluation (GLUE) benchmark, despite weight-programming errors, hardware imperfections, readout noise, and error propagation. Average hardware accuracy is only 1.8% below that of the floating-point reference, with several tasks at full iso-accuracy. Careful fine-tuning of the model weights using hardware-aware techniques contributes an average hardware accuracy improvement of 4.4%. Accuracy loss due to conductance drift, measured to be roughly 5% over 30 days, is reduced to less than 1% with a recalibration-based “drift compensation” technique.
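
The abstract does not specify the weight-to-conductance encoding; the sketch below assumes a common analog-AI convention in which each weight is stored differentially across two PCM device pairs of unequal significance, which is consistent with the roughly four devices per weight implied by 7.1 million weights on 28.3 million devices. All names and parameters here (G_MAX, the significance factor F, encode_weight) are illustrative assumptions, not the paper's actual scheme.

```python
import numpy as np

# Hypothetical parameters for illustration; the chip's actual conductance
# range and significance factor are not given in the abstract.
G_MAX = 25.0   # assumed max programmable conductance per device (uS)
F = 4.0        # assumed significance factor between the two device pairs

def encode_weight(w: float, w_max: float):
    """Encode one weight into four PCM conductances.

    Assumed pair-of-pairs scheme:  w ~ (Gp - Gm) + (gp - gm) / F,
    so 7.1M weights occupy ~28.3M devices (4 per weight).
    """
    target = w / w_max * G_MAX * (1 + 1 / F)  # total signed conductance
    # Most-significant pair carries the bulk of the weight...
    msp = float(np.clip(target, -G_MAX, G_MAX))
    # ...and the least-significant pair absorbs the residual at weight 1/F.
    lsp = float(np.clip((target - msp) * F, -G_MAX, G_MAX))
    Gp, Gm = (msp, 0.0) if msp >= 0 else (0.0, -msp)
    gp, gm = (lsp, 0.0) if lsp >= 0 else (0.0, -lsp)
    return Gp, Gm, gp, gm

def decode_weight(Gp, Gm, gp, gm, w_max: float) -> float:
    """Read back the effective weight from the four conductances."""
    total = (Gp - Gm) + (gp - gm) / F
    return total / (G_MAX * (1 + 1 / F)) * w_max
```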
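
Likewise, the recalibration-based drift compensation can be pictured as a per-tile scalar correction: replay a fixed calibration input through the drifted analog tile, compare against outputs recorded just after programming, and rescale subsequent analog multiply-accumulate (MAC) results. The following is a minimal sketch under an assumed power-law PCM drift model and assumed parameters (drift exponent nu, reference time t0); the paper's actual procedure may differ.

```python
import numpy as np

rng = np.random.default_rng(0)

def pcm_drift(g0: np.ndarray, t: float, t0: float = 1.0,
              nu: float = 0.05) -> np.ndarray:
    """Power-law PCM conductance drift: G(t) = G0 * (t / t0)**(-nu).

    nu ~ 0.05 is a typical literature value; the chip's measured
    exponent is not stated in the abstract, so this is an assumption.
    """
    return g0 * (t / t0) ** (-nu)

def calibration_factor(tile_mac, x_cal: np.ndarray,
                       y_ref: np.ndarray) -> float:
    """Estimate a single per-tile rescaling factor alpha.

    Replays the fixed calibration batch x_cal through the (drifted)
    tile and least-squares fits its outputs to the reference outputs
    y_ref recorded at programming time.
    """
    y_now = tile_mac(x_cal)
    return float(np.dot(y_ref.ravel(), y_now.ravel())
                 / np.dot(y_now.ravel(), y_now.ravel()))

# Usage sketch on a synthetic 512x512 tile.
g0 = rng.uniform(1.0, 25.0, size=(512, 512))   # programmed conductances
x_cal = rng.standard_normal((32, 512))         # fixed calibration inputs
y_ref = x_cal @ g0                             # reference outputs at t0

g_t = pcm_drift(g0, t=30 * 24 * 3600.0)        # ~30 days later
alpha = calibration_factor(lambda x: x @ g_t, x_cal, y_ref)
y_corr = alpha * (x_cal @ g_t)                 # drift-compensated MAC
```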