Minimizing weighted ℓp-norm of flow-time in the rejection model
Anamitra R. Choudhury, Syamantak Das, et al.
FSTTCS 2015
The large number of weights in deep neural networks makes the models difficult to deploy in low-memory environments such as mobile phones and IoT edge devices, as well as in 'inferencing as a service' environments on the cloud. Prior work has reduced model size through compression techniques such as weight pruning and filter pruning, or through low-rank decomposition of the convolution layers. In this paper, we demonstrate the use of multiple techniques to achieve not only higher model compression but also a reduction in the compute resources required during inferencing. We perform filter pruning followed by low-rank decomposition using Tucker decomposition for model compression. We show that our approach achieves up to 57% higher model compression than either Tucker decomposition or filter pruning alone at similar accuracy for GoogleNet. It also reduces FLOPs by up to 48%, thereby making inferencing faster.
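As a rough illustration of the two-step pipeline described in the abstract, the sketch below prunes convolution filters by L1 norm and then applies a Tucker decomposition (computed here via truncated HOSVD) to the output/input channel modes of the remaining 4-D kernel. The pruning criterion, ranks, and tensor shapes are illustrative assumptions, not the settings used in the paper.

```python
# A minimal sketch (not the authors' code) of filter pruning followed by
# Tucker decomposition of a convolution kernel. All shapes/ranks are toy values.
import numpy as np

def prune_filters(weight, keep_ratio=0.5):
    """Keep the output-channel filters with the largest L1 norms."""
    norms = np.abs(weight).reshape(weight.shape[0], -1).sum(axis=1)
    n_keep = max(1, int(round(keep_ratio * weight.shape[0])))
    keep = np.sort(np.argsort(norms)[-n_keep:])
    return weight[keep], keep

def mode_unfold(tensor, mode):
    """Mode-n unfolding: move the given mode to the front and flatten the rest."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

def tucker_2mode(weight, rank_out, rank_in):
    """Tucker decomposition on the output/input channel modes of a conv kernel
    (shape: out x in x kH x kW), via truncated HOSVD."""
    U_out, _, _ = np.linalg.svd(mode_unfold(weight, 0), full_matrices=False)
    U_in, _, _ = np.linalg.svd(mode_unfold(weight, 1), full_matrices=False)
    U_out, U_in = U_out[:, :rank_out], U_in[:, :rank_in]
    # Core tensor: project the kernel onto the leading factors of modes 0 and 1.
    core = np.einsum('oihw,or,is->rshw', weight, U_out, U_in)
    return core, U_out, U_in

# Toy example: 64 filters of a 3x3 conv over 32 input channels.
W = np.random.randn(64, 32, 3, 3)
W_pruned, kept = prune_filters(W, keep_ratio=0.5)           # 64 -> 32 filters
core, U_out, U_in = tucker_2mode(W_pruned, rank_out=16, rank_in=16)
approx = np.einsum('rshw,or,is->oihw', core, U_out, U_in)   # reconstructed kernel
params_before = W.size
params_after = core.size + U_out.size + U_in.size
print(f"compression: {params_before / params_after:.1f}x, "
      f"rel. error vs pruned kernel: "
      f"{np.linalg.norm(approx - W_pruned) / np.linalg.norm(W_pruned):.3f}")
```

In a real deployment, the decomposed kernel would typically be mapped back to a sequence of smaller convolutions (1x1, kHxkW, 1x1) and the network fine-tuned to recover any accuracy lost to pruning and rank truncation.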