A language independent approach to audio search
Vikram Gupta, Jitendra Ajmera, et al.
INTERSPEECH 2011
Recently, exemplar-based sparse representation phone identification features (Spif) have shown promising results on large vocabulary speech recognition tasks. However, exemplar-based techniques are computationally expensive. In this paper, we present two methods to speed up the creation of Spif features. First, we explore a technique to quickly select a subset of informative exemplars from among millions of training examples. Second, we make approximations to the sparse representation computation such that a matrix-matrix multiplication is reduced to a matrix-vector product. We present results on four large vocabulary tasks: Broadcast News, where acoustic models are trained with 50 and 400 hours of data, and Voice Search, where models are trained with 160 and 1000 hours. Results on all tasks indicate a speedup by a factor of four relative to the original Spif features, as well as improvements in word error rate (WER) in combination with a baseline HMM system. Copyright © 2011 ISCA.
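The abstract does not spell out how exemplars are selected or how the sparse representation is approximated, so the following is only an illustrative sketch of the general idea of exemplar-based phone identification features, not the authors' method. It assumes nearest-neighbor selection of exemplars and substitutes ridge-regularized least squares for a true sparse solver; the function names, the parameters `k` and `ridge`, and the toy data are all hypothetical.

```python
import numpy as np

def select_exemplars(y, exemplars, k):
    """Return indices of the k exemplars closest to frame y (Euclidean distance).

    Assumption: nearest-neighbor selection stands in for the paper's
    (unspecified here) exemplar selection technique.
    """
    dists = np.linalg.norm(exemplars - y, axis=1)
    return np.argsort(dists)[:k]

def phone_id_features(y, exemplars, labels, num_classes, k=5, ridge=1e-3):
    """Sketch of class-level features from an exemplar-based representation.

    Assumption: ridge regression replaces the sparse representation solver.
    """
    idx = select_exemplars(y, exemplars, k)
    H = exemplars[idx].T  # dictionary of selected exemplars, shape (dim, k)
    # Solve (H^T H + ridge * I) beta = H^T y for the exemplar weights.
    beta = np.linalg.solve(H.T @ H + ridge * np.eye(len(idx)), H.T @ y)
    weights = np.abs(beta)
    # Accumulate the weight of each selected exemplar onto its phone class.
    feats = np.zeros(num_classes)
    np.add.at(feats, labels[idx], weights)
    total = feats.sum()
    return feats / total if total > 0 else feats

# Toy data: two exemplars per class in a 2-dimensional feature space.
exemplars = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
labels = np.array([0, 0, 1, 1])
y = np.array([0.95, 0.05])
feats = phone_id_features(y, exemplars, labels, num_classes=2, k=2)
print(feats)
```

Restricting the dictionary to `k` selected exemplars keeps the linear system small, which is the intuition behind selecting a subset of exemplars before computing the representation.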