Apostol Natsev, Alexander Haubold, et al.
MMSP 2007
We propose a layered dynamic mixture model for asynchronous multi-modal fusion and unsupervised pattern discovery in video. The lower layer of the model converts the audio-visual streams into mid-level labels using generative temporal structures such as a hierarchical hidden Markov model, and models correlations in the text with probabilistic latent semantic analysis. The upper layer fuses the statistical evidence across the diverse modalities with a flexible meta-mixture model that assumes only loose temporal correspondence. Evaluation on a large news video database shows that multi-modal clusters correspond to news topics better than audio-visual clusters alone; novel analysis techniques suggest that meaningful clusters emerge when the salient features predicted by the model agree with those observed in the story clusters.
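As a loose illustration of the meta-mixture fusion described in this abstract, the sketch below combines per-modality cluster posteriors (e.g., HHMM audio-visual labels and pLSA text topics) with a simple linear mixture. The cluster count, modality weights, and all names here are assumptions for illustration only, not the paper's actual implementation.

```python
import numpy as np

def fuse_modalities(posteriors, alpha):
    """Linearly mix per-modality posteriors over K mid-level clusters.

    posteriors: list of (K,) arrays, one per modality (e.g., HHMM labels,
    pLSA topics); alpha: (M,) modality weights. Hypothetical illustration.
    """
    fused = sum(a * p for a, p in zip(alpha, posteriors))
    return fused / fused.sum()  # renormalize to a proper distribution

rng = np.random.default_rng(0)
audio_visual = rng.dirichlet(np.ones(5))  # stand-in for an HHMM label posterior
text = rng.dirichlet(np.ones(5))          # stand-in for a pLSA topic posterior
print(fuse_modalities([audio_visual, text], alpha=np.array([0.6, 0.4])))
```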
Minjeong Shin, Joohee Kim, et al.
IEEE TVCG
Mandis Beigi, Shih-Fu Chang, et al.
ICME 2009
Guangnan Ye, Dong Liu, et al.
ICCV 2013