Apostol Natsev, Milind R. Naphade, et al.
MM 2005
This paper examines the application of linear transformations for speaker and environmental adaptation in an HMM-based speech recognition system. In particular, transformations that are trained in a maximum likelihood sense on adaptation data are investigated. Only model-based linear transforms are considered, since, for linear transforms, they subsume the appropriate feature-space transforms. The paper compares the two possible forms of model-based transforms: (i) unconstrained, where any combination of mean and variance transform may be used, and (ii) constrained, which requires the variance transform to have the same form as the mean transform. Re-estimation formulae for all appropriate cases of transform are given. This includes a new and efficient full variance transform and the extension of the constrained model-space transform from the simple diagonal case to the full or block-diagonal case. The constrained and unconstrained transforms are evaluated in terms of computational cost, recognition time efficiency, and use for speaker adaptive training. The recognition performance of the two model-space transforms on a large vocabulary speech recognition task using incremental adaptation is investigated. In addition, initial experiments using the constrained model-space transform for speaker adaptive training are detailed. © 1998 Academic Press Limited.
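The relationship between the constrained model-space transform and a feature-space transform can be checked numerically. The sketch below is illustrative only and is not taken from the paper: it assumes the convention mean' = A·mean + b and cov' = A·cov·Aᵀ for the constrained case (the paper's own sign and inverse conventions may differ), uses a single 2-dimensional Gaussian in place of an HMM state, and all numeric values and helper names are made up. Under those assumptions, scoring an observation against the adapted Gaussian should equal scoring the inversely transformed observation against the original Gaussian, minus log|det A|.

```python
import math

# Minimal 2x2 linear algebra helpers (hypothetical names, pure Python).
def matvec(m, v):
    return [m[0][0] * v[0] + m[0][1] * v[1], m[1][0] * v[0] + m[1][1] * v[1]]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def transpose(m):
    return [[m[0][0], m[1][0]], [m[0][1], m[1][1]]]

def det(m):
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def inverse(m):
    d = det(m)
    return [[m[1][1] / d, -m[0][1] / d], [-m[1][0] / d, m[0][0] / d]]

def log_gauss(x, mean, cov):
    """Log density of a 2-D Gaussian with full covariance."""
    diff = [x[0] - mean[0], x[1] - mean[1]]
    sol = matvec(inverse(cov), diff)
    maha = diff[0] * sol[0] + diff[1] * sol[1]
    return -0.5 * (2 * math.log(2 * math.pi) + math.log(det(cov)) + maha)

def adapt(mean, cov, A, b, H=None):
    """Adapt a Gaussian. H=None gives the constrained form (variance
    transform tied to the mean transform); a separate H gives the
    unconstrained form. Assumed convention, not the paper's notation."""
    if H is None:
        H = A
    new_mean = [m + o for m, o in zip(matvec(A, mean), b)]
    new_cov = matmul(matmul(H, cov), transpose(H))
    return new_mean, new_cov

# Made-up example values.
mean = [1.0, 2.0]
cov = [[2.0, 0.4], [0.4, 1.0]]
A = [[1.2, 0.3], [-0.1, 0.9]]
b = [0.5, -0.2]
x = [0.7, 1.5]

# Model-space view: transform the Gaussian, score the raw observation.
mean_hat, cov_hat = adapt(mean, cov, A, b)
model_space = log_gauss(x, mean_hat, cov_hat)

# Feature-space view: transform the observation, score against the
# original Gaussian, and subtract the log-Jacobian term.
x_hat = matvec(inverse(A), [x[0] - b[0], x[1] - b[1]])
feature_space = log_gauss(x_hat, mean, cov) - math.log(abs(det(A)))

print(abs(model_space - feature_space) < 1e-9)
```

Passing a separate `H` to `adapt` gives the unconstrained form, for which no single observation transform reproduces the adapted likelihood in general.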
Shumin Zhai, Per-Ola Kristensson
CHI 2003
Sugato Bagchi, Eugene Hung, et al.
ValueTools 2006
David Piorkowski, Inge Vejsbjerg, et al.
PACM HCI