Michael Picheny, Zoltan Tuske, et al.
INTERSPEECH 2019
Training neural network acoustic models with sequence-discriminative criteria, such as state-level minimum Bayes risk (sMBR), has been shown to produce large improvements in performance over cross-entropy. However, because they entail the processing of lattices, sequence criteria are much more computationally intensive than cross-entropy. We describe a distributed neural network training algorithm, based on Hessian-free optimization, that scales to deep networks and large data sets. For the sMBR criterion, this training algorithm is faster than stochastic gradient descent by a factor of 5.5 and yields a 4.4% relative improvement in word error rate on a 50-hour broadcast news task. Distributed Hessian-free sMBR training yields relative reductions in word error rate of 7-13% over cross-entropy training with stochastic gradient descent on two larger tasks: Switchboard and DARPA RATS noisy Levantine Arabic. Our best Switchboard DBN achieves a word error rate of 16.4% on rt03-FSH.
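The abstract builds on Hessian-free (truncated Newton) optimization: each outer step solves the damped Newton system (H + lambda*I) d = -g by conjugate gradient, using only curvature-vector products rather than an explicit Hessian. The sketch below illustrates that core loop under simplifying assumptions; all names (hvp, cg_solve, hf_minimize) are illustrative, it uses a finite-difference Hessian-vector product on a toy quadratic, whereas the paper's system uses Gauss-Newton products over lattices, computed in a distributed fashion.

```python
# Minimal sketch of Hessian-free (truncated Newton) optimization.
# Not the paper's implementation: names and the finite-difference
# curvature product are assumptions for illustration only.
import numpy as np

def hvp(grad_fn, w, v, eps=1e-6):
    """Hessian-vector product via finite differences of the gradient:
    H v ~ (grad(w + eps*v) - grad(w)) / eps. H is never formed explicitly."""
    return (grad_fn(w + eps * v) - grad_fn(w)) / eps

def cg_solve(matvec, b, max_iters=50, tol=1e-8):
    """Conjugate gradient for H x = b, using only matrix-vector products."""
    x = np.zeros_like(b)
    r = b - matvec(x)
    p = r.copy()
    rs = r @ r
    for _ in range(max_iters):
        Hp = matvec(p)
        alpha = rs / (p @ Hp)
        x += alpha * p
        r -= alpha * Hp
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

def hf_minimize(grad_fn, w, steps=10, damping=1e-3):
    """Outer truncated-Newton loop: solve (H + damping*I) d = -g by CG."""
    for _ in range(steps):
        g = grad_fn(w)
        matvec = lambda v: hvp(grad_fn, w, v) + damping * v
        w = w + cg_solve(matvec, -g)
    return w

# Toy usage: an ill-conditioned quadratic, where curvature information
# lets HF converge in far fewer steps than plain gradient descent.
A = np.diag([1.0, 100.0])
b = np.array([1.0, -2.0])
grad = lambda w: A @ w - b
w_star = hf_minimize(grad, np.zeros(2))
print(w_star, np.linalg.solve(A, b))  # the two should closely agree
```

In the paper's setting, the gradient and curvature products are accumulated over lattice statistics partitioned across workers, which is what makes the CG inner loop amenable to distributed data parallelism.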
Po-Sen Huang, Haim Avron, et al.
ICASSP 2014
Hagen Soltau, Lidia Mangu, et al.
ASRU 2011
Hagen Soltau, George Saon, et al.
IEEE Transactions on Audio, Speech, and Language Processing