Invited talk

Deep Neural Network Inference with Analog In-Memory Computing

Abstract

The need to repeatedly shuttle synaptic weight values between memory and processing units has been a key source of energy inefficiency in hardware implementations of artificial neural networks. Analog in-memory computing (AIMC) with spatially instantiated synaptic weights holds great promise for overcoming this challenge by performing the matrix-vector multiplications of an inference workload directly on the weights stored on-chip. In this talk, I will first present our latest multi-core AIMC chip in 14-nm complementary metal–oxide–semiconductor (CMOS) technology with backend-integrated phase-change memory (PCM). The fully integrated chip features 64 AIMC cores of 256×256 unit cells each, interconnected via an on-chip communication network. Experimental inference results on ResNet and LSTM networks will be presented, with all computations associated with the weight layers and the activation functions implemented on-chip. I will then present our open-source toolkit (https://aihw-composer.draco.res.ibm.com/) for simulating inference and training of neural networks with AIMC. Finally, I will present our latest architectural solutions for increasing the weight capacity of AIMC chips towards supporting large language models, as well as alternative solutions suited to low-power edge computing applications.
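
To make the core idea concrete, the following is a minimal NumPy sketch of how a crossbar of stored conductances performs a matrix-vector multiplication in a single step via Ohm's and Kirchhoff's laws. The array size matches the 256×256 cores mentioned above, but the conductance and voltage ranges are illustrative assumptions, not figures from the talk:

```python
import numpy as np

# Sketch of the AIMC principle (not the chip's actual signal chain):
# weights are stored as device conductances G, inputs are applied as
# voltages v, and Kirchhoff's current law sums the per-device currents
# on each output line, yielding i = G @ v in one analog step.
rng = np.random.default_rng(0)

G = rng.uniform(0.0, 25e-6, size=(256, 256))  # conductances in siemens (assumed 0-25 uS range)
v = rng.uniform(-0.2, 0.2, size=256)          # read voltages on the input lines (assumed range)

i_out = G @ v  # ideal in-memory MVM: no weight movement to a processor

# Device non-idealities (e.g., PCM programming noise) perturb the stored
# conductances; here modeled crudely as additive Gaussian noise.
G_noisy = G + rng.normal(0.0, 1e-6, size=G.shape)
i_noisy = G_noisy @ v
print(np.linalg.norm(i_out - i_noisy) / np.linalg.norm(i_out))  # relative MVM error
```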
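The URL above points to the AIHW Composer web front end, which builds on IBM's open-source aihwkit library. Assuming the pip-installable `aihwkit` package, a minimal sketch of simulating an analog inference layer could look like this; layer sizes are arbitrary:

```python
import torch
from aihwkit.nn import AnalogLinear
from aihwkit.simulator.configs import InferenceRPUConfig

# An analog fully-connected layer whose matrix-vector multiplication is
# simulated on a tile with configurable PCM-like inference non-idealities.
layer = AnalogLinear(256, 128, bias=True, rpu_config=InferenceRPUConfig())

x = torch.rand(1, 256)
y = layer(x)      # forward pass through the simulated AIMC tile
print(y.shape)    # torch.Size([1, 128])
```

Such analog layers are drop-in replacements for their PyTorch counterparts, which is what allows full networks like the ResNet and LSTM examples to be evaluated under AIMC noise models in simulation.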