Association control in mobile wireless networks
Minkyong Kim, Zhen Liu, et al.
INFOCOM 2008
Adaptation to a new speaker or environment is becoming increasingly important as speech recognition systems are deployed in unpredictable real-world situations. Constrained or feature-space Maximum Likelihood Linear Regression (fMLLR) [1] has proved especially effective for this purpose, particularly when used for incremental unsupervised adaptation [2]. Unfortunately, the standard implementation described in [1], and used by most authors since, requires statistics that take O(n³) operations per frame to collect. In addition, the statistics require O(n³) space to store, and estimating the feature-transform matrix requires O(n⁴) operations. This is an unacceptable cost for most embedded speech recognition systems. In this paper we show that the fMLLR objective function can be optimized by stochastic gradient descent in a way that achieves almost the same results as the standard implementation. This is accomplished with an algorithm that requires only O(n²) operations per frame and O(n²) storage. This order-of-magnitude saving allows continuous adaptation to be implemented in most resource-constrained embedded speech recognition applications.
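The abstract states the complexity argument but not the update rule itself. As a rough illustration only, the sketch below assumes a diagonal-covariance Gaussian aligned to each frame, a fixed learning rate, and per-frame stochastic-gradient updates that increase the usual fMLLR auxiliary function; the function and parameter names are assumptions for illustration, not taken from the paper.

import numpy as np

def fmllr_sgd_step(A, b, o, mu, var, lr=1e-3):
    """One per-frame stochastic-gradient step on the fMLLR auxiliary function.

    Features are transformed as A @ o + b. The per-frame objective (up to
    constants) is log|det A| - 0.5 * (A o + b - mu)^T diag(var)^{-1} (A o + b - mu),
    where (mu, var) are the mean and diagonal variance of the Gaussian aligned
    to this frame. All names here are illustrative assumptions.
    """
    r = (A @ o + b - mu) / var            # precision-weighted residual, O(n^2)
    # Gradient of log|det A| is inv(A).T. Recomputing the inverse is O(n^3)
    # and is done here only for clarity; a genuinely O(n^2)-per-frame scheme
    # would have to maintain or approximate this term incrementally.
    grad_A = np.linalg.inv(A).T - np.outer(r, o)
    grad_b = -r
    return A + lr * grad_A, b + lr * grad_b

# Minimal usage: start from the identity transform and adapt frame by frame.
n = 4
A, b = np.eye(n), np.zeros(n)
rng = np.random.default_rng(0)
for _ in range(100):
    o = rng.normal(size=n)                # observation for this frame
    mu, var = np.zeros(n), np.ones(n)     # aligned Gaussian (toy values)
    A, b = fmllr_sgd_step(A, b, o, mu, var)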
Daniel M. Bikel, Vittorio Castelli
ACL 2008
Nanda Kambhatla
ACL 2004
Stanley F. Chen
INTERSPEECH - Eurospeech 2003