Workshop paper

Heterogeneous neural processing units leveraging analog in-memory computing for edge AI

Abstract

The data-intensive and highly parallel compute demands of AI models have driven the integration of specialized Neural Processing Units (NPUs) into System-on-Chip devices for edge AI applications. Analog In-Memory Computing (AIMC) offers a promising approach by co-locating memory and computation, enabling notable energy-efficiency improvements. This talk will present an embedded NPU architecture for deep learning inference, tailored to meet the stringent energy, area, and cost constraints of edge AI. The heterogeneous architecture combines digital and analog accelerator nodes to support diverse operation types and precision requirements. AIMC tiles based on Phase-Change Memory (PCM) are employed for energy-efficient matrix-vector multiplications while providing high-density, non-volatile on-chip weight storage. Complementing this, a digital data path and a programmable software cluster provide flexibility and enable end-to-end inference across multiple precision levels. The discussion will also address the challenge of preserving high accuracy in AIMC-based acceleration, focusing on offline training techniques and efficient mapping strategies.
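To make the accuracy challenge concrete, the sketch below models an AIMC matrix-vector multiplication in which PCM programming noise perturbs the stored weights. This is a simplified illustration, not the architecture described in the talk: the additive Gaussian noise model scaled to the largest weight magnitude, the `noise_std` value, and the function name `aimc_matvec` are all assumptions made for this example.

```python
import numpy as np

rng = np.random.default_rng(0)

def aimc_matvec(weights, x, noise_std=0.02):
    """Simplified model of an AIMC matrix-vector multiply.

    Weights are assumed to be stored as PCM conductances; imperfect
    programming is modeled as additive Gaussian noise whose standard
    deviation is a fraction of the largest weight magnitude (a common
    simplification, not the actual device model).
    """
    g_max = np.abs(weights).max()
    noisy_w = weights + rng.normal(0.0, noise_std * g_max, weights.shape)
    # Analog accumulation along the crossbar bit lines reduces to a
    # matrix-vector product with the perturbed weights.
    return noisy_w @ x

# Compare the noisy analog result against the exact digital result.
weights = rng.normal(size=(64, 128))
x = rng.normal(size=128)

exact = weights @ x
analog = aimc_matvec(weights, x)
rel_err = np.linalg.norm(analog - exact) / np.linalg.norm(exact)
```

Under this toy noise model the relative output error stays small for a single layer, but such perturbations compound across deep networks, which is why the offline (hardware-aware) training and mapping strategies discussed in the talk are needed to preserve accuracy.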