mWhisper-Flamingo for Multilingual Audio-Visual Noise-Robust Speech Recognition

Andrew Rouditchenko; Samuel Thomas; Hilde Kuehne; Rogerio Feris; James Glass

doi:10.1109/LSP.2025.3569210

IEEE SPL

Paper

01 Jan 2025

Abstract

Audio-Visual Speech Recognition (AVSR) combines lip-based video with audio and can improve performance in noise, but most methods are trained only on English data. One limitation is the lack of large-scale multilingual video data, which makes it hard to train models from scratch. In this work, we propose mWhisper-Flamingo for multilingual AVSR which combines the strengths of a pre-trained audio model (Whisper) and video model (AV-HuBERT). To enable better multi-modal integration and improve the noisy multilingual performance, we introduce decoder modality dropout where the model is trained both on paired audio-visual inputs and separate audio/visual inputs. mWhisper-Flamingo achieves state-of-the-art WER on MuAViC, an AVSR dataset of 9 languages. Audio-visual mWhisper-Flamingo consistently outperforms audio-only Whisper on all languages in noisy conditions.
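
The decoder modality dropout described above can be illustrated with a minimal sketch. It assumes that, for each training example, one of three input modes is sampled (paired audio-visual, audio only, or video only) and that a dropped modality is simply withheld. The function names, the sampling probabilities, and the placeholder features are hypothetical and are not taken from the paper.

```python
import random

# Hypothetical sampling probabilities; the abstract does not state the values used.
MODE_PROBS = {"audio_visual": 0.5, "audio_only": 0.25, "video_only": 0.25}


def sample_mode(rng, probs=MODE_PROBS):
    """Pick which modalities the decoder sees for one training example."""
    modes = list(probs)
    weights = [probs[m] for m in modes]
    return rng.choices(modes, weights=weights, k=1)[0]


def apply_modality_dropout(audio_feats, video_feats, mode):
    """Return the (audio, video) pair for this step; a dropped modality becomes None."""
    if mode == "audio_only":
        return audio_feats, None
    if mode == "video_only":
        return None, video_feats
    return audio_feats, video_feats


rng = random.Random(0)
audio = [0.1, 0.2, 0.3]  # placeholder audio features (assumption)
video = [0.9, 0.8]       # placeholder video features (assumption)

# Count how often each mode is drawn over 1000 simulated training examples.
counts = {m: 0 for m in MODE_PROBS}
for _ in range(1000):
    mode = sample_mode(rng)
    a, v = apply_modality_dropout(audio, video, mode)
    counts[mode] += 1
print(counts)
```

In a real training loop the returned pair would be passed to the model's decoder, which would need to handle a missing modality; how mWhisper-Flamingo does this is not specified in the abstract.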

Related

Paper

Bonsai trees, or how to delegate a lattice basis

David Cash, Dennis Hofheinz, et al.

Journal of Cryptology

Conference paper

Changes of T_c under epitaxial strain: Implications for the mechanism of superconductivity

J.P. Locquet, J. Perret, et al.

SPIE Optical Science, Engineering, and Instrumentation 1998

Paper

A Formal Treatment of Non-repudiation Protocols

Satoshi Hada

IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences

Conference paper

Growth and transport properties of multilayer superconducting films of Nd_1.83Ce_0.17CuO_x / YBa_2Cu_3O_7-δ

A. Gupta, R. Gross, et al.

SPIE Advances in Semiconductors and Superconductors 1990
