Bilevel Joint Unsupervised and Supervised Training for Automatic Speech Recognition. Xiaodong Cui, A.F.M. Saif, et al. 2024. IEEE/ACM TASLP.
Speech-enriched Memory for Inference-time Adaptation of ASR Models to Word Dictionaries. Ashish Mittal, Sunita Sarawagi, et al. 2023. EMNLP 2023.
Improving RNN Transducer Acoustic Models for English Conversational Speech Recognition. Xiaodong Cui, George Saon, et al. 2023. INTERSPEECH 2023.
Multi-Speaker Data Augmentation for Improved end-to-end Automatic Speech Recognition. Samuel Thomas, Hong-Kwang J. Kuo, et al. 2023. ICASSP 2023.
Effect and Analysis of Large-scale Language Model Rescoring on Competitive ASR Systems. Takuma Udagawa, Masayuki Suzuki, et al. 2022. INTERSPEECH 2022.
VQ-T: RNN Transducers using Vector-Quantized Prediction Network States. Jiatong Shi, George Saon, et al. 2022. INTERSPEECH 2022.
Improving Generalization of Deep Neural Network Acoustic Models with Length Perturbation and N-best Based Label Smoothing. Xiaodong Cui, George Saon, et al. 2022. INTERSPEECH 2022.
Accelerating Inference and Language Model Fusion of Recurrent Neural Network Transducers via End-to-End 4-bit Quantization. Andrea Fasoli, Chia-Yu Chen, et al. 2022. INTERSPEECH 2022.
Global RNN Transducer Models For Multi-dialect Speech Recognition. Takashi Fukuda, Samuel Thomas, et al. 2022. INTERSPEECH 2022.
Extending RNN-T-based speech recognition systems with emotion and language classification. Zvi Kons, Hagai Aronowitz, et al. 2022. INTERSPEECH 2022.