Statistical methods for topic segmentation
S. Dharanipragada, Martin Franz, et al.
ICSLP 2000
Previous work on word distribution in documents has shown that word repetitiveness is an important indicator of a word's content-bearing characteristics. In this paper we propose a simple method that uses a measure of the tendency of words to repeat within a document to separate words with similar document frequencies but different topic-discriminating characteristics. We describe the application of the new measure to query-document relevance scoring. Experiments on the TREC Ad Hoc and Spoken Document Retrieval tasks show useful performance improvements.
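The abstract does not define the repetition measure, so the following is only an illustrative sketch, not the authors' formula. It assumes one simple proxy: the ratio of a word's total count in the collection to the number of documents containing it, i.e. the average number of occurrences per document in which the word appears. The function name `repetition_scores` and the toy documents are hypothetical.

```python
from collections import Counter


def repetition_scores(docs):
    """Return {word: collection_frequency / document_frequency}.

    ASSUMPTION: this ratio is a stand-in for "tendency to repeat within a
    document"; the measure actually used in the paper is not given here.
    """
    cf = Counter()  # total occurrences of each word across all documents
    df = Counter()  # number of documents containing each word
    for tokens in docs:
        counts = Counter(tokens)
        cf.update(counts)         # add per-document term counts
        df.update(counts.keys())  # count each word once per document
    return {word: cf[word] / df[word] for word in df}


# Toy collection: "merger" and "also" both occur in 2 of 3 documents,
# but only "merger" repeats within the documents that contain it.
docs = [
    ["merger", "talks", "merger", "terms", "merger"],
    ["also", "merger", "vote", "merger"],
    ["also", "talks", "weather"],
]
scores = repetition_scores(docs)
print(scores["merger"], scores["also"])
```

In this toy collection the two words have the same document frequency, so an IDF-style weight cannot tell them apart, while the repetition ratio is higher for the word that recurs within its documents. How such a score would be folded into a relevance function is left open, since the abstract does not specify it.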