Speech codec optimization based on cell broadband engine
Zhenbo Zhu, Qing Wang, et al.
ICASSP 2007
To generate optimal multi-stream audio-visual speech recognition performance, appropriate dynamic weighting of each modality is desired. In this paper, we propose to estimate such weights based on a combination of acoustic signal space observations and single-modality audio and visual speech model likelihoods. Two modeling approaches are investigated for such weight estimation: one based on a sigmoid fitting function, the other employing Gaussian mixture models. Reported experiments demonstrate that the latter approach outperforms sigmoid-based modeling, and is dramatically superior to the static weighting scheme. © 2007 IEEE.
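As a rough illustration of the sigmoid-based weighting mentioned in the abstract, the sketch below maps a scalar reliability feature to an audio stream weight and uses it to combine single-modality log-likelihoods. The abstract does not give the formulation, so everything here is an assumption: the function names, the single scalar feature, the `slope` and `offset` parameters, and the convention that the audio and visual weights sum to one.

```python
import math

def sigmoid_weight(feature, slope, offset):
    """Map a scalar reliability feature to an audio stream weight in (0, 1).

    The feature, slope, and offset are hypothetical; the abstract does not
    specify which observations or parameters are used.
    """
    return 1.0 / (1.0 + math.exp(-(slope * feature + offset)))

def combined_log_likelihood(log_p_audio, log_p_visual, audio_weight):
    """Weighted sum of single-modality log-likelihoods.

    Assumes the visual weight is (1 - audio_weight), a common convention
    in multi-stream models, not confirmed by the abstract.
    """
    return audio_weight * log_p_audio + (1.0 - audio_weight) * log_p_visual

# Illustrative values only: a high feature value stands in for clean audio,
# a low one for noisy audio.
w_clean = sigmoid_weight(feature=20.0, slope=0.3, offset=-3.0)
w_noisy = sigmoid_weight(feature=0.0, slope=0.3, offset=-3.0)

score_clean = combined_log_likelihood(-10.0, -20.0, w_clean)
score_noisy = combined_log_likelihood(-10.0, -20.0, w_noisy)
```

Under these assumptions, the weight shifts toward the audio stream as the reliability feature grows, whereas a static scheme would use one fixed weight for all conditions.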
Iain Matthews, Gerasimos Potamianos, et al.
ICME 2001
Vadim Sheinin, Da-Ke He
ICASSP 2007
Mohamed Kamal Omar, Lidia Mangu
ICASSP 2007