Dynamic stream weight modeling for audio-visual speech recognition

Etienne Marcheret; Vit Libal; Gerasimos Potamianos

doi:10.1109/ICASSP.2007.367227

ICASSP 2007

Conference paper

06 Aug 2007

Abstract

To generate optimal multi-stream audio-visual speech recognition performance, appropriate dynamic weighting of each modality is desired. In this paper, we propose to estimate such weights based on a combination of acoustic signal space observations and single-modality audio and visual speech model likelihoods. Two modeling approaches are investigated for such weight estimation: one based on a sigmoid fitting function, the other employing Gaussian mixture models. Reported experiments demonstrate that the latter approach outperforms sigmoid-based modeling and is dramatically superior to the static weighting scheme. © 2007 IEEE.
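As a rough illustration of the idea in the abstract, the sketch below shows how a per-frame audio stream weight could be produced by a sigmoid and then used to combine audio and visual log-likelihoods. It is not the paper's implementation: the function names, the scalar `reliability` input, and the `slope` and `offset` parameters are all hypothetical, and the weighted-sum combination rule (with weights summing to one) is the common multi-stream convention, assumed here rather than taken from the abstract. The Gaussian mixture model approach is not shown.

```python
import math


def sigmoid_audio_weight(reliability, slope=1.0, offset=0.0):
    """Map a scalar reliability indicator to an audio weight in [0, 1].

    `reliability`, `slope`, and `offset` are hypothetical; in practice the
    sigmoid parameters would be fitted on held-out data.
    """
    z = slope * reliability + offset
    # Numerically stable form of 1 / (1 + exp(-z)).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def combined_log_likelihood(audio_ll, visual_ll, audio_weight):
    """Weighted sum of stream log-likelihoods (assumed: weights sum to one)."""
    return audio_weight * audio_ll + (1.0 - audio_weight) * visual_ll


# Made-up frames: (audio log-likelihood, visual log-likelihood, reliability).
frames = [
    (-12.0, -20.0, 3.0),   # clean audio: the weight leans toward audio
    (-30.0, -18.0, -3.0),  # noisy audio: the weight leans toward video
]

for audio_ll, visual_ll, reliability in frames:
    weight = sigmoid_audio_weight(reliability)
    score = combined_log_likelihood(audio_ll, visual_ll, weight)
    print(f"audio weight={weight:.3f}  combined log-likelihood={score:.3f}")
```

A static weighting scheme corresponds to calling `combined_log_likelihood` with the same `audio_weight` for every frame, whereas the dynamic scheme recomputes the weight per frame.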
