A statistical modeling approach to content based video retrieval
Milind R. Naphade, Sankar Basu, et al.
ICPR 2008
We present a three-step post-processing method for increasing the precision of video shot labels in the domain of television news. First, we demonstrate that news shot sequences can be characterized by rhythms of alternation (due to dialogue), repetition (due to persistent background settings), or both; a temporal model of such sequences is therefore necessarily third-order Markov. Second, we demonstrate that the outputs of feature detectors derived from machine learning methods (in particular, SVMs) can be converted into probabilities more effectively than by two previously suggested methods. This is particularly true when detectors are error-prone because of sparse training sets, as is common in this domain. Third, we demonstrate that a straightforward application of the Viterbi algorithm to a third-order FSM, constructed from observed transition probabilities and converted feature detector outputs, can refine feature label precision at little cost. On a test corpus of TREC-VID 2005 news videos annotated with 39 LSCOM-lite features, the mean increase in Average Precision (AP) was 4%, with some of the rarer and more difficult features showing relative increases in AP of as much as 67%. © 2006 IEEE.
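The third step can be illustrated with a minimal sketch of Viterbi decoding over a third-order model for a single binary feature. This is not the authors' implementation: the function name `viterbi_third_order`, the transition table `trans`, the initialization from the first three shots, and all numeric values are assumptions made for illustration, and the detector outputs are assumed to be already converted into probabilities.

```python
import math
from itertools import product


def viterbi_third_order(p_present, trans, eps=1e-12):
    """Return the most likely 0/1 label sequence for one feature.

    p_present: per-shot probability that the feature is present
               (assumed already converted from detector scores).
    trans:     dict mapping a history (a, b, c) of the three previous
               labels to P(next label = 1 | a, b, c). Hypothetical values.
    """
    n = len(p_present)
    if n < 3:
        # Too short for a three-shot history: fall back to thresholding.
        return [int(p >= 0.5) for p in p_present]

    def emit(t, label):
        # Log-probability of the detector evidence at shot t given the label.
        p = p_present[t] if label == 1 else 1.0 - p_present[t]
        return math.log(max(p, eps))

    # A state is the tuple of the three most recent labels (8 states).
    states = list(product((0, 1), repeat=3))
    # Initialise from the first three shots using detector evidence only.
    score = {s: sum(emit(t, s[t]) for t in range(3)) for s in states}
    back = []  # one back-pointer table per decoded shot

    for t in range(3, n):
        new_score, pointers = {}, {}
        for s in states:
            for label in (0, 1):
                p = trans[s] if label == 1 else 1.0 - trans[s]
                nxt = (s[1], s[2], label)  # shift the history window
                cand = score[s] + math.log(max(p, eps)) + emit(t, label)
                if nxt not in new_score or cand > new_score[nxt]:
                    new_score[nxt] = cand
                    pointers[nxt] = s
        score = new_score
        back.append(pointers)

    # Trace back from the best final state.
    state = max(score, key=score.get)
    labels = list(state)  # labels of the last three shots
    for pointers in reversed(back):
        state = pointers[state]
        labels.insert(0, state[0])
    return labels


# Hypothetical alternation rhythm: the next label tends to repeat the
# label from two shots back (A B A B ...).
trans = {(a, b, c): (0.9 if b == 1 else 0.1)
         for a, b, c in product((0, 1), repeat=3)}

# Hypothetical detector probabilities; shot 4 is weak (0.45).
probs = [0.9, 0.1, 0.9, 0.1, 0.45, 0.1, 0.9]
refined = viterbi_third_order(probs, trans)
print(refined)  # [1, 0, 1, 0, 1, 0, 1]
```

In this toy input, thresholding the detector alone would label shot 4 as absent, while the alternation rhythm encoded in `trans` pulls it to present. The eight-state encoding (one state per three-label history) is one simple way to run a first-order Viterbi pass over a third-order model; the source does not say how its FSM is constructed.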