C. Neti, Salim Roukos
ASRU 1997
We describe methods for automatic labeling of high-level semantic concepts in documentary-style videos. The emphasis of this paper is on audio processing and on fusing information from multiple modalities. The work described is an initial step toward a trainable system that acquires a collection of generic "intermediate" semantic concepts across modalities (such as audio, video, and text) and combines information from these modalities for automatic labeling of a "high-level" concept. Initial results suggest that multi-modal fusion achieves a 12.5% relative improvement over the best unimodal model.
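The abstract does not specify how the per-modality evidence is combined; the sketch below illustrates one common approach, a weighted late fusion of unimodal concept scores. All names (fuse_concept_scores, the modality weights, the 0.5 threshold) are hypothetical and chosen only for illustration, not taken from the paper.

```python
# Illustrative sketch only: assumes weighted late fusion of per-modality
# concept scores, one common way to combine unimodal classifier outputs.
from typing import Dict


def fuse_concept_scores(
    modality_scores: Dict[str, float],   # e.g. {"audio": 0.7, "video": 0.4, "text": 0.9}
    modality_weights: Dict[str, float],  # hypothetical per-modality reliability weights
) -> float:
    """Return a fused confidence score for a single high-level concept."""
    total_weight = sum(modality_weights[m] for m in modality_scores)
    fused = sum(modality_weights[m] * s for m, s in modality_scores.items())
    return fused / total_weight if total_weight > 0 else 0.0


# Example: label the concept as present if the fused score passes a threshold.
scores = {"audio": 0.7, "video": 0.4, "text": 0.9}
weights = {"audio": 1.0, "video": 0.5, "text": 1.5}
print(fuse_concept_scores(scores, weights) > 0.5)  # True (fused score 0.75)
```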