Adaptive parser-centric text normalization
Congle Zhang, Tyler Baldwin, et al.
ACL 2013
Words often gain new senses in new domains. Being able to automatically identify, from a corpus of monolingual text, which word tokens are being used in a previously unseen sense has applications to machine translation and other tasks sensitive to lexical semantics. We define a task, SenseSpotting, in which we build systems to spot tokens that have new senses in new domain text. Instead of difficult and expensive annotation, we build a goldstandard by leveraging cheaply available parallel corpora, targeting our approach to the problem of domain adaptation for machine translation. Our system is able to achieve F-measures of as much as 80%, when applied to word types it has never seen before. Our approach is based on a large set of novel features that capture varied aspects of how words change when used in new domains. © 2013 Association for Computational Linguistics.
Congle Zhang, Tyler Baldwin, et al.
ACL 2013
Bing Xiang, Xiaoqiang Luo, et al.
ACL 2013
Fei Huang, Cezar Pendus
ACL 2013
Lu Wang, Hema Raghavan, et al.
ACL 2013