Stefano Ambrogio
MRS Spring Meeting 2022
The data-intensive and highly parallel compute demands of AI models have driven the integration of specialized Neural Processing Units (NPUs) into System-on-Chip devices for edge AI applications. Analog In-Memory Computing (AIMC) is a promising approach: by co-locating memory and computation, it improves energy efficiency by reducing data movement between memory and compute units. This talk presents an embedded NPU architecture for deep learning inference, tailored to the stringent energy, area, and cost constraints of edge AI. The heterogeneous architecture combines digital and analog accelerator nodes to support diverse operation types and precision requirements. AIMC tiles based on Phase-Change Memory (PCM) perform energy-efficient matrix-vector multiplications while providing high-capacity non-volatile on-chip weight storage. Complementing these tiles, a digital data path and a programmable software cluster provide flexibility and enable end-to-end inference across multiple precision levels. The talk also addresses the challenge of preserving high accuracy in AIMC-based acceleration, focusing on offline training techniques and efficient mapping strategies.
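To make the matrix-vector multiplication step more concrete, the following is a minimal sketch of how an analog tile might be modeled in software. It is not the architecture described in the talk: the function names (`program_tile`, `analog_mvm`, `exact_mvm`), the differential conductance-pair mapping, and the Gaussian programming-noise model are all assumptions chosen for illustration. Real PCM devices also exhibit effects such as conductance drift and read noise that are not modeled here.

```python
import random

def program_tile(weights, g_max=1.0, prog_noise=0.0, rng=None):
    """Map a signed weight matrix onto hypothetical conductance pairs (G+, G-).

    Assumption: each weight is stored as the difference of two non-negative
    conductances, and programming error is additive Gaussian noise.
    """
    rng = rng or random.Random(0)
    w_max = max(abs(w) for row in weights for w in row) or 1.0
    scale = g_max / w_max  # conductance units per unit of weight
    g_pos, g_neg = [], []
    for row in weights:
        # Positive part goes to G+, negative part to G-; clip at zero because
        # a physical conductance cannot be negative.
        g_pos.append([max(0.0, max(w, 0.0) * scale + rng.gauss(0.0, prog_noise))
                      for w in row])
        g_neg.append([max(0.0, max(-w, 0.0) * scale + rng.gauss(0.0, prog_noise))
                      for w in row])
    return g_pos, g_neg, scale

def analog_mvm(tile, x):
    """Idealized analog MVM: per-cell products (Ohm's law) are summed per
    output line (Kirchhoff's current law), then rescaled to weight units."""
    g_pos, g_neg, scale = tile
    return [sum((gp - gn) * xi for gp, gn, xi in zip(rp, rn, x)) / scale
            for rp, rn in zip(g_pos, g_neg)]

def exact_mvm(weights, x):
    """Digital reference result for comparison."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]

if __name__ == "__main__":
    W = [[0.5, -1.0], [2.0, 0.25]]
    x = [1.0, 2.0]
    ideal = analog_mvm(program_tile(W), x)
    noisy = analog_mvm(program_tile(W, prog_noise=0.02), x)
    print("reference:", exact_mvm(W, x))
    print("ideal tile:", ideal)
    print("noisy tile:", noisy)
```

With `prog_noise=0.0` the tile reproduces the digital reference, while a nonzero value perturbs the outputs. Offline training techniques of the kind the abstract mentions typically aim to make a network tolerant of such perturbations, for example by injecting comparable noise during training; whether the talk uses that specific method is not stated in the abstract.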