Cascaded multilingual audio-visual learning from videos
Andrew Rouditchenko, Angie Boggust, et al.
INTERSPEECH 2021
Style Transfer aims to transfer the artistic style of a reference image to a content image. While Deep Learning (DL) has achieved state-of-the-art Style Transfer performance using Convolutional Neural Networks (CNNs), real-time application still requires powerful hardware such as GPU-accelerated systems. This paper leverages transformer-based models to accelerate real-time Style Transfer on mobile and embedded hardware platforms. We designed a Neural Architecture Search (NAS) algorithm dedicated to vision transformers that finds the set of architecture hyperparameters maximizing Style Transfer performance, expressed in frames per second (FPS). Our approach was evaluated and validated on the Xiaomi Redmi 7 mobile phone and the Raspberry Pi 3 platform. Experimental evaluation shows that our approach achieves 3.5x and 2.1x speedups over CNN-based and transformer-based Style Transfer models, respectively.
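The abstract does not detail the NAS procedure, but the core idea of searching transformer hyperparameters to maximize FPS can be sketched as a simple random search. Everything below is a hypothetical illustration, not the paper's method: the search space, the FLOPs-based latency proxy (standing in for on-device timing), and all function names are assumptions.

```python
import random

# Hypothetical search space for a small vision transformer (illustrative only).
SPACE = {
    "depth": [2, 4, 6],
    "embed_dim": [64, 128, 192],
    "num_heads": [2, 4],
}

def estimated_fps(cfg, budget_flops=1e9):
    """Crude FPS proxy: per-layer self-attention + MLP FLOPs for a
    196-token input; real NAS would time the model on the target device."""
    tokens = 196
    d = cfg["embed_dim"]
    attn = 4 * tokens * d * d + 2 * tokens * tokens * d  # projections + attention
    mlp = 8 * tokens * d * d                             # 4x-expansion MLP
    total = cfg["depth"] * (attn + mlp)
    return budget_flops / total

def random_search(n_trials=20, seed=0):
    """Sample configurations and keep the one with the highest proxy FPS."""
    rng = random.Random(seed)
    best_cfg, best_fps = None, 0.0
    for _ in range(n_trials):
        cfg = {k: rng.choice(v) for k, v in SPACE.items()}
        fps = estimated_fps(cfg)
        if fps > best_fps:
            best_cfg, best_fps = cfg, fps
    return best_cfg, best_fps

cfg, fps = random_search()
```

A real search would additionally constrain accuracy (style/content loss), so the objective becomes multi-objective rather than raw FPS alone.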
Saiteja Utpala, Alex Gu, et al.
NAACL 2024
Gosia Lazuka, Andreea Simona Anghel, et al.
SC 2024
Gabriele Picco, Lam Thanh Hoang, et al.
EMNLP 2021