Extracting user profiles from large scale data
Michal Shmueli-Scheuer, Haggai Roitman, et al.
WWW 2010
We introduce static index pruning methods that significantly reduce index size in information retrieval systems. We investigate uniform and term-based methods, each of which removes selected entries from the index while having only a minor effect on retrieval results. In uniform pruning, a single fixed cutoff threshold is used, and every index entry whose contribution to relevance scores is bounded above by that threshold is removed. In term-based pruning, the cutoff threshold is determined separately for each term and may therefore vary from term to term. We give experimental evidence that, at each level of compression, term-based pruning outperforms uniform pruning under various measures of precision. We also present theoretical and experimental evidence that, under our term-based pruning scheme, the index can be pruned greatly while still yielding retrieval results almost as good as those based on the full index.
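The difference between the two pruning styles can be illustrated with a minimal Python sketch. It is not the authors' implementation: the index layout (a dict from term to a list of (document, score) postings), the function names, and the toy scores are all hypothetical. The abstract does not say how the per-term threshold is chosen, so the sketch assumes one plausible rule, namely a fraction `epsilon` of the term's k-th highest score.

```python
from typing import Dict, List, Tuple

# Hypothetical index layout: term -> list of (doc_id, score contribution).
Posting = Tuple[str, float]
Index = Dict[str, List[Posting]]


def prune_uniform(index: Index, threshold: float) -> Index:
    """Remove every posting whose score is at most one global threshold."""
    return {
        term: [(doc, score) for doc, score in postings if score > threshold]
        for term, postings in index.items()
    }


def prune_term_based(index: Index, k: int, epsilon: float) -> Index:
    """Remove postings using a threshold computed separately for each term.

    Assumed rule (not stated in the abstract): the threshold for a term is
    epsilon times that term's k-th highest score. Terms with fewer than k
    postings are left untouched.
    """
    pruned: Index = {}
    for term, postings in index.items():
        scores = sorted((score for _, score in postings), reverse=True)
        if len(scores) < k:
            pruned[term] = list(postings)
            continue
        term_threshold = epsilon * scores[k - 1]
        pruned[term] = [
            (doc, score) for doc, score in postings if score > term_threshold
        ]
    return pruned


# Toy data: a rare, high-scoring term and a common, low-scoring term.
example_index: Index = {
    "pruning": [("d1", 0.9), ("d2", 0.4), ("d3", 0.1)],
    "the": [("d1", 0.05), ("d2", 0.04), ("d3", 0.03), ("d4", 0.01)],
}

uniform_result = prune_uniform(example_index, threshold=0.1)
term_based_result = prune_term_based(example_index, k=2, epsilon=0.5)
```

On this toy data, a uniform threshold of 0.1 empties the posting list of the low-scoring term entirely, whereas the per-term threshold keeps each term's strongest postings. That is only an illustration of why a per-term cutoff can behave differently, not evidence for the experimental claims above.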