Transferring the knowledge of large language models (LLMs) is a promising technique for incorporating linguistic knowledge into end-to-end automatic speech recognition (ASR) systems. However, existing works transfer only a single representation of the LLM (e.g., the last layer of a pretrained BERT), even though the representation of a text is inherently non-unique and can be obtained in various ways from different layers, contexts, and models. In this work, we explore a wide range of techniques for obtaining and transferring multiple representations of LLMs into a transducer-based ASR system. Though conceptually simple, transferring multiple representations of LLMs yields consistent improvements over transferring only a single representation.
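As a concrete illustration of one way multiple representations can be obtained, the sketch below extracts the hidden states from every layer of a pretrained BERT and combines them with a learned per-layer weighting. This is a minimal sketch under stated assumptions, not the paper's implementation: the model checkpoint, the softmax weighting scheme, and the function name multi_layer_representation are illustrative choices, and how the resulting vectors are transferred into the transducer (e.g., as an auxiliary distillation target) is left open.

```python
# Minimal sketch: mix hidden states from all BERT layers with learned weights.
# Assumes the HuggingFace transformers library; names are illustrative.
import torch
import torch.nn as nn
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
bert = BertModel.from_pretrained("bert-base-uncased", output_hidden_states=True)
bert.eval()

# One weight per hidden state: the embedding output plus each encoder layer.
num_layers = bert.config.num_hidden_layers + 1
layer_weights = nn.Parameter(torch.zeros(num_layers))  # learned mixing weights

def multi_layer_representation(text: str) -> torch.Tensor:
    """Softmax-weighted combination of all hidden layers, one vector per token."""
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():  # BERT is frozen; only the mixing weights train
        outputs = bert(**inputs)
    # hidden_states: tuple of (batch, seq_len, hidden) tensors, one per layer
    stacked = torch.stack(outputs.hidden_states, dim=0)   # (layers, B, T, H)
    weights = torch.softmax(layer_weights, dim=0).view(-1, 1, 1, 1)
    return (weights * stacked).sum(dim=0)                 # (B, T, H)

# Hypothetical usage: produce a target representation for a training transcript.
target = multi_layer_representation("a transcript from the training data")
```

Keeping BERT frozen while learning only the mixing weights is one cheap way to let the ASR loss select which layers' information to transfer, rather than committing a priori to the last layer.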