Agnostic active learning without constraints
Alina Beygelzimer, Daniel Hsu, et al.
NeurIPS 2010
This paper presents a novel online relevant set algorithm for a linearly scored block sequence translation model. The key component is a new procedure to directly optimize the global scoring function used by a statistical machine translation (SMT) decoder. This training procedure treats the decoder as a black box, and thus can be used to optimize any decoding scheme. The algorithm is evaluated using two feature types: 1) commonly used probabilistic features, such as translation, language, or distortion model probabilities, and 2) binary features. In particular, encouraging results on a standard Arabic-English translation task are presented for a translation system that uses only binary feature functions. To further demonstrate the effectiveness of the training algorithm, a detailed comparison with the widely used minimum-error-rate (MER) training algorithm is presented using the same decoder and feature set. The online algorithm is simplified by introducing so-called "seed" block sequences, which enable the training to be carried out without a gold standard block translation. The online training algorithm is extremely fast, and it also improves translation scores over the MER algorithm in some experiments. © 2008 IEEE.