Eric A. Joseph
AVS 2023
In today’s AI-driven landscape, designing hardware to accelerate deep learning inference faces challenges in meeting growing demands for speed, computational efficiency, and flexibility. Emerging technologies like Analog Compute-In-Memory (ACIM) hold significant promise for enhancing computational efficiency in these workloads, particularly for matrix-vector multiplications. However, architectural constraints hinder ACIM’s ability to support end-to-end workloads, necessitating complementary specialized digital units for essential operations like activation functions, pooling, and attention. These units are typically hardwired, lacking the adaptability needed in a rapidly evolving AI landscape. To address this challenge, we propose EAGLE, an ACIM-based architecture that integrates general-purpose Programmable Multi-Core Accelerators (PMCAs) as the sole digital accelerator, offering unprecedented flexibility and enabling seamless end-to-end execution across diverse workloads while maintaining competitive throughput. The architecture integrates state-of-the-art RISC-V Snitch clusters as PMCAs, extended with specialized instructions to enhance performance and energy efficiency. We introduce novel lightweight LUT-based ISA extensions to approximate commonly used transcendental functions and perform HW/SW co-optimization of key AI kernels, leveraging hardware extensions to eliminate control and memory instruction overhead. The flexibility of EAGLE is demonstrated across a range of model architectures, including encoder- and decoder-based transformers, a convolutional neural network, and a recurrent model (LSTM). Evaluated in 28 nm FD-SOI, EAGLE sustains 61.1/97.96 Inf/s on end-to-end MobileBERT/BERT-Large inference, achieving up to 3× energy efficiency improvements over state-of-the-art 8 nm GPUs.
Minhua Lu, Joyce Liu, et al.
ECTC 2025
An Chen, Stefano Ambrogio, et al.
ECS Spring Meeting 2024
Alexandre Foucher, Baoming Wang, et al.
MRS Fall Meeting 2022