Upendra Chaudhari, Hong-Kwang Jeff Kuo, et al.
INTERSPEECH 2008
Evaluating the accuracy of natural language processing (NLP) engines plays an important role in their development and improvement. Such evaluation usually takes place at the per-engine level: there are established evaluation methods for individual engines such as speech recognition, machine translation, and story boundary detection. Many real-world applications, however, require combinations of these functions, which have become feasible now that individual engines attain sufficient accuracy to be chained for complex tasks. It is not evident, though, how the accuracy of the output of such aggregates of engines should be evaluated. We present an evaluation methodology to address this problem. The key contribution of our work is an extensible methodology that narrows down the possible combinations of machine outputs and ground truths to be compared at various stages in an aggregate of interoperating engines. We also describe two example evaluation modules that we developed following this methodology.
Gang Wang, Fei Wang, et al.
IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics
Atsuyoshi Nakamura, Naoki Abe
Electronic Commerce Research
Kun Wang, Juwei Shi, et al.
PACT 2011