A multi-region deep neural network model in speech recognition
Jia Cui, George Saon, et al.
INTERSPEECH 2015
Multilingual acoustic models are often used to build automatic speech recognition (ASR) systems for low-resource languages. We propose a novel data augmentation technique to improve the performance of an end-to-end (E2E) multilingual acoustic model by transliterating data into the various languages that are part of the multilingual training set. Along with two metrics for data selection, this technique can also improve the model's recognition performance on unsupervised and cross-lingual data. On a set of four low-resource languages, we show that word error rates (WER) can be reduced by up to 12% and 5% relative compared to monolingual and multilingual baselines, respectively. We also demonstrate how a multilingual network constructed within this framework can be extended to a new training language. With the proposed methods, the new model has WER reductions of up to 24% and 13% compared to monolingual and multilingual baselines, respectively.