Multiple reorderings in phrase-based machine translation
Niyu Ge, Abe Ittycheriah, et al.
SSST 2008
Inverse Document Frequency (IDF) is a popular measure of a word's importance, and it invariably appears in a host of heuristic measures used in information retrieval. So far, however, IDF has itself been a heuristic. In this paper, we show that IDF is optimal in a principled sense: it is the optimal weight of a word with respect to minimization of a Kullback-Leibler distance suitably generalized to nonnegative functions that need not be probability distributions. This optimization problem is closely related to the maximum entropy problem. We show that IDF is the optimal weight associated with a word-feature in an information retrieval setting where we treat each document as the query that retrieves itself. That is, IDF is optimal for document self-retrieval.
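The self-retrieval setting can be illustrated with a small sketch. It assumes the common textbook form idf(w) = log(N / df(w)), which the abstract does not spell out, and it uses a simple sum of IDF weights over shared words as the retrieval score. The toy corpus and the function names `idf_weights` and `score` are hypothetical, chosen only for illustration; this is not the paper's derivation.

```python
import math

def idf_weights(docs):
    """Assumed textbook form: idf(w) = log(N / df(w))."""
    n = len(docs)
    df = {}
    for doc in docs:
        # Count each word once per document (document frequency).
        for w in set(doc.split()):
            df[w] = df.get(w, 0) + 1
    return {w: math.log(n / c) for w, c in df.items()}

def score(query, doc, idf):
    """Sum the IDF weights of the words shared by query and document."""
    shared = set(query.split()) & set(doc.split())
    return sum(idf[w] for w in shared)

# Hypothetical toy corpus.
docs = [
    "the cat sat on the mat",
    "the dog chased the cat",
    "the stock market fell",
]
idf = idf_weights(docs)

# Use each document as a query and record the index of the best match.
best = [
    max(range(len(docs)), key=lambda j: score(q, docs[j], idf))
    for q in docs
]
print(best)
```

In this toy corpus every document ranks itself first. The word "the" occurs in all three documents, so its weight is log(3/3) = 0 and it contributes nothing to any score, while rarer words dominate.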