Impact of word classing on shrinkage-based language models

Ruhi Sarikaya; Stanley F. Chen; Abhinav Sethy; Bhuvana Ramabhadran

INTERSPEECH 2010

Conference paper

26 Sep 2010

Abstract

This paper investigates the impact of word classing on a recently proposed shrinkage-based language model, Model M [5]. Model M, a class-based n-gram model, has been shown to significantly outperform word-based n-gram models across a variety of domains. In past work, word classes for Model M were induced automatically from unlabeled text using the algorithm of [2]. We take a closer look at the classing and examine whether improved classing translates into improved performance. In particular, we explore the use of manually assigned classes, part-of-speech (POS) tags, and dialog state information, considering both hard classing and soft classing. In experiments with a conversational dialog system (human-machine dialog) and a speech-to-speech translation system (human-human dialog), we find that better classing can improve Model M performance by up to 3% absolute in word error rate. © 2010 ISCA.
