Unfolded recurrent neural networks for speech recognition
George Saon, Hagen Soltau, et al.
INTERSPEECH 2014
While Deep Neural Networks (DNNs) have achieved tremendous success on LVCSR tasks, training these networks is slow. To date, the most common approach to training DNNs is stochastic gradient descent (SGD), run serially on a single GPU machine. Serial training, coupled with the large number of trainable parameters and the size of speech training sets, makes DNN training very slow for LVCSR tasks. Second-order, data-parallel methods have also been explored, but they are not always faster on CPU clusters due to the large communication cost between processors. In this work, we explore a specialized hardware/software approach using a Blue Gene/Q (BG/Q) system, which has thousands of processors and excellent interprocessor communication. We apply the second-order Hessian-free (HF) algorithm to DNN training on BG/Q, for both cross-entropy and sequence training. Results on three LVCSR tasks indicate that HF on BG/Q offers up to an 11x speedup, as well as an improved word error rate (WER), compared to SGD on a GPU.
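The abstract contrasts serial SGD with second-order Hessian-free optimization. As a rough illustration of the HF idea only (not the paper's implementation, which is data-parallel on Blue Gene/Q and uses R-operator curvature products), the sketch below solves the damped Newton system (H + λI)d = -g by conjugate gradient, with finite-difference Hessian-vector products standing in for Pearlmutter's method; the `loss` function, damping value, and CG budget are all illustrative assumptions.

```python
import numpy as np

def grad(loss, w, eps=1e-6):
    """Numerical gradient of loss at w (illustrative; real HF uses backprop)."""
    g = np.zeros_like(w)
    for i in range(w.size):
        e = np.zeros_like(w)
        e[i] = eps
        g[i] = (loss(w + e) - loss(w - e)) / (2 * eps)
    return g

def hess_vec(loss, w, v, eps=1e-6):
    """Hessian-vector product via a finite difference of gradients;
    in practice this is computed exactly with the R-operator."""
    return (grad(loss, w + eps * v) - grad(loss, w - eps * v)) / (2 * eps)

def hf_step(loss, w, damping=1.0, cg_iters=50, tol=1e-8):
    """One HF update: solve (H + damping*I) d = -g with conjugate gradient."""
    g = grad(loss, w)
    d = np.zeros_like(w)
    r = -g.copy()          # residual of the linear system at d = 0
    p = r.copy()
    rs = r @ r
    for _ in range(cg_iters):
        Hp = hess_vec(loss, w, p) + damping * p
        alpha = rs / (p @ Hp)
        d += alpha * p
        r -= alpha * Hp
        rs_new = r @ r
        if rs_new < tol:
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return w + d

# Toy usage: one HF step on a quadratic bowl moves w close to the minimum.
loss = lambda w: 0.5 * w @ np.diag([1.0, 10.0]) @ w
w = hf_step(loss, np.array([3.0, -2.0]))
```

The key property exploited here is that CG never forms H explicitly, only Hessian-vector products, which is what makes the method tractable for networks with millions of parameters; in the data-parallel setting, each curvature product is a gradient-sized reduction across workers, which is why interprocessor communication dominates the cost on CPU clusters.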