Low-Resource Speech Recognition of 500-Word Vocabularies
Sabine Deligne, Ellen Eide, et al.
INTERSPEECH - Eurospeech 2001
We consider a family of Gaussian mixture models for use in HMM-based speech recognition systems. These "SPAM" models have state-independent choices of subspaces to which the precision (inverse covariance) matrices and means are restricted. They provide a flexible tool for robust, compact, and fast acoustic modeling. The focus of this paper is on the case where the means are unconstrained. The models in this case already generalize the recently introduced EMLLT models, which themselves interpolate between MLLT and full covariance models. We describe an algorithm to train both the state-dependent and state-independent parameters. Results are reported on one speech recognition task. The SPAM models are seen to yield significant improvements in accuracy over EMLLT models with comparable model size and runtime speed. We find that a 10% relative reduction in error rate over an MLLT model can be obtained while decreasing the acoustic modeling time by 20%.
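As a brief illustrative sketch (the notation below is assumed for exposition and is not taken from the abstract): a subspace constraint of the kind described can be written by expanding each Gaussian's precision matrix over a small set of shared symmetric basis matrices,

\[
P_g \;=\; \Sigma_g^{-1} \;=\; \sum_{k=1}^{D} \lambda_{gk}\, S_k ,
\]

where the basis \(\{S_1, \dots, S_D\}\) is state-independent and only the coefficients \(\lambda_{gk}\) vary with the Gaussian \(g\) (subject to \(P_g\) remaining positive definite). Under this reading, restricting each \(S_k\) to rank one gives an EMLLT-style model, while letting \(D\) grow to the full \(d(d+1)/2\) dimensions of the symmetric matrices recovers unconstrained full-covariance modeling, which is the sense in which such models interpolate between MLLT and full covariance.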