Efficient AI System Design with Cross-Layer Approximate Computing

Swagath Venkataramani; Xiao Sun; Naigang Wang; Chia-Yu Chen; Jungwook Choi; Mingu Kang; Ankur Agarwal; Jinwook Oh; Shubham Jain; Tina Babinsky; Nianzheng Cao; Thomas Fox; Bruce Fleischer; George Gristede; Michael Guillorn; Howard Haynie; Hiroshi Inoue; Kazuaki Ishizaki; Michael Klaiber; Shih-Hsien Lo; Gary Maier; Silvia Mueller; Michael Scheuermann; Eri Ogawa; Marcel Schaal; Mauricio Serrano; Joel Silberman; Christos Vezyrtzis; Wei Wang; Fanchieh Yee; Jintao Zhang; Matthew Ziegler; Ching Zhou; Moriyoshi Ohara; Pong-Fei Lu; Brian Curran; Sunil Shukla; Vijayalakshmi Srinivasan; Leland Chang; Kailash Gopalakrishnan

doi:10.1109/JPROC.2020.3029453

Proceedings of the IEEE

Paper

01 Dec 2020

Abstract

Advances in deep neural networks (DNNs) and the availability of massive real-world data have enabled superhuman levels of accuracy on many AI tasks and ushered in the explosive growth of AI workloads across the spectrum of computing devices. However, their superior accuracy comes at a high computational cost, which necessitates approaches beyond traditional computing paradigms to improve their operational efficiency. Leveraging the application-level insight of error resilience, we demonstrate how approximate computing (AxC) can significantly boost the efficiency of AI platforms and play a pivotal role in the broader adoption of AI-based applications and services. To this end, we present RaPiD, a multi-tera operations per second (TOPS) AI hardware accelerator core (fabricated in 14-nm technology) that we built from the ground up using AxC techniques across the stack, including algorithms, architecture, programmability, and hardware. We highlight the workload-guided systematic explorations of AxC techniques for AI, including custom number representations, quantization/pruning methodologies, mixed-precision architecture design, instruction sets, and compiler technologies with quality programmability, employed in the RaPiD accelerator.
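The error-resilience idea behind reduced-precision arithmetic can be illustrated with a small sketch. The code below is not RaPiD's number format or quantization methodology, which the abstract does not specify; it assumes generic symmetric uniform quantization, and the names `quantize` and `quantized_dot` are hypothetical helpers chosen for illustration. It compares a floating-point dot product against the same dot product computed on low-bit integer codes.

```python
import random


def quantize(values, bits):
    """Symmetric uniform quantization (illustrative assumption, not RaPiD's scheme).

    Maps floats to signed integer codes in [-qmax, qmax] and returns the
    codes together with the scale needed to map them back to real values.
    """
    qmax = 2 ** (bits - 1) - 1
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    codes = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return codes, scale


def quantized_dot(a, b, bits):
    """Dot product computed on integer codes, rescaled once at the end."""
    qa, scale_a = quantize(a, bits)
    qb, scale_b = quantize(b, bits)
    acc = sum(x * y for x, y in zip(qa, qb))  # integer multiply-accumulate
    return acc * scale_a * scale_b


# Synthetic inputs with a fixed seed so the run is repeatable.
rng = random.Random(0)
a = [rng.uniform(-1.0, 1.0) for _ in range(256)]
b = [rng.uniform(-1.0, 1.0) for _ in range(256)]
exact = sum(x * y for x, y in zip(a, b))

for bits in (8, 4, 2):
    approx = quantized_dot(a, b, bits)
    print(f"{bits}-bit: exact={exact:.4f} approx={approx:.4f} "
          f"abs_error={abs(approx - exact):.4f}")
```

Running the loop prints the absolute error at each bit width, which gives a rough feel for how much precision a single dot product can lose before the result degrades. Whether a full network tolerates that loss depends on the model and training method, which this sketch does not address.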