Classifying words for improved statistical language models
F. Jelinek, R.L. Mercer, et al.
ICASSP 1990
We present an algorithm to adapt an n-gram language model to a document as it is dictated. The observed partial document is used to estimate a unigram distribution for the words that have already occurred. We then find the n-gram distribution that is closest to the static n-gram distribution, as measured by discrimination information, and that satisfies the marginal constraints derived from the document. On a document of 321 words, the resulting minimum discrimination information model achieves a perplexity of 208, compared with 290 for the static trigram model.
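The adaptation step can be illustrated with a minimal sketch. It is not the authors' procedure: it assumes a toy joint table p(h, w) over two histories and three words, and a hypothetical unigram that constrains every word in the vocabulary. Under that single marginal constraint, the distribution closest to the static one in discrimination information is obtained by rescaling each column by the ratio of the target unigram to the static word marginal. The names `mdi_adapt`, `kl_divergence`, `static_joint`, and `target_unigram` are illustrative only.

```python
import math


def mdi_adapt(static_joint, target_unigram):
    """Rescale a joint table p(h, w) so that its word marginal equals target_unigram.

    With a constraint on the word marginal only, this rescaling gives the
    minimum discrimination information solution: p'(h, w) = p(h, w) * q(w) / p(w).
    """
    n_words = len(target_unigram)
    # Word marginal of the static model: p(w) = sum over h of p(h, w).
    static_marginal = [sum(row[w] for row in static_joint) for w in range(n_words)]
    adapted = []
    for row in static_joint:
        adapted.append([
            row[w] * target_unigram[w] / static_marginal[w]
            if static_marginal[w] > 0 else 0.0
            for w in range(n_words)
        ])
    return adapted


def kl_divergence(p, q):
    """Discrimination information D(p || q), in nats, between two joint tables.

    Assumes q is nonzero wherever p is nonzero.
    """
    return sum(
        pv * math.log(pv / qv)
        for p_row, q_row in zip(p, q)
        for pv, qv in zip(p_row, q_row)
        if pv > 0
    )


# Toy static joint over 2 histories x 3 words (entries sum to 1). Assumed data.
static_joint = [
    [0.20, 0.10, 0.10],
    [0.30, 0.20, 0.10],
]
# Hypothetical unigram estimated from the partial document. Assumed data.
target_unigram = [0.2, 0.5, 0.3]

adapted_joint = mdi_adapt(static_joint, target_unigram)
adapted_marginal = [sum(row[w] for row in adapted_joint) for w in range(3)]
divergence = kl_divergence(adapted_joint, static_joint)
```

In this toy case `adapted_marginal` matches `target_unigram` up to rounding, and `divergence` measures how far the adapted table has moved from the static one. With constraints on only some words, or with several overlapping constraints, a single rescaling would generally not suffice and an iterative method would be needed.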