STORY SEGMENTATION AND TOPIC DETECTION FOR RECOGNIZED SPEECH
S. Dharanipragada, Martin Franz, et al.
INTERSPEECH - Eurospeech 1999
Previous work on word distribution in documents has shown that a word's repetitiveness is an important indicator of whether the word bears content. In this paper we propose a simple method that uses a measure of the tendency of words to repeat within a document to separate words with similar document frequencies but different topic-discriminating characteristics. We describe the application of the new measure to query-document relevance scoring. Experiments on the TREC Ad Hoc and Spoken Document Retrieval tasks show useful performance improvements.
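The abstract does not give the formula for the repetition measure, so the following is only a minimal sketch under an assumption: it uses the ratio of collection frequency to document frequency (the average number of occurrences in the documents that contain the word) as a stand-in for "tendency to repeat". The function name `repetition_scores` and the toy documents are hypothetical and are not taken from the paper.

```python
from collections import Counter


def repetition_scores(docs):
    """Return {word: cf / df} for a list of tokenized documents.

    cf = total occurrences of the word across all documents
    df = number of documents that contain the word
    NOTE: this ratio is an assumed proxy, not the paper's measure.
    """
    cf = Counter()
    df = Counter()
    for tokens in docs:
        counts = Counter(tokens)
        cf.update(counts)         # add per-document term counts
        df.update(counts.keys())  # each document counts once per word
    return {word: cf[word] / df[word] for word in df}


# Toy collection: "plasma" and "report" both occur in two documents,
# but only "plasma" repeats within the documents that contain it.
docs = [
    ["plasma", "physics", "plasma", "plasma", "report"],
    ["budget", "report", "plasma", "plasma"],
    ["budget", "vote"],
]
scores = repetition_scores(docs)
```

In this toy collection the two words have the same document frequency, yet the assumed ratio separates them: it is higher for the word that clusters within documents, which is the kind of distinction the abstract describes.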
S. McCarley, Martin Franz
SIGIR Forum (ACM Special Interest Group on Information Retrieval)
Y. Al-Onaizan, R. Florian, et al.
NAACL-HLT 2003
R. Donovan, A. Ittycheriah, et al.
SSW 2001