Stable Tensor Neural Networks for Rapid Deep Learning

Elizabeth Newman; Lior Horesh; Haim Avron; Misha Kilmer

doi:10.48550/arXiv.1811.06569

arXiv

15 Nov 2018

Abstract

We propose a tensor neural network (t-NN) framework that offers an exciting new paradigm for designing neural networks with multidimensional (tensor) data. Our network architecture is based on the t-product (Kilmer and Martin, 2011), an algebraic formulation for multiplying tensors via circulant convolution. In this t-product algebra, we interpret tensors as t-linear operators, analogous to matrices as linear operators, and hence our framework inherits mimetic matrix properties. To exemplify the elegant, matrix-mimetic algebraic structure of our t-NNs, we expand on recent work (Haber and Ruthotto, 2017) which interprets deep neural networks as discretizations of non-linear differential equations and introduces stable neural networks that promote superior generalization. Motivated by this dynamic framework, we introduce a stable t-NN which facilitates more rapid learning because of its reduced, more powerful parameterization. Through our high-dimensional design, we create a more compact parameter space and extract multidimensional correlations otherwise latent in traditional algorithms. We further generalize our t-NN framework to a family of tensor-tensor products (Kernfeld, Kilmer, and Aeron, 2015) which still induce a matrix-mimetic algebraic structure. Through numerical experiments on the MNIST and CIFAR-10 datasets, we demonstrate the more powerful parameterizations and improved generalizability of stable t-NNs.
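To make the t-product mentioned in the abstract concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes third-order tensors stored as arrays of shape (rows, columns, frontal slices) and relies on the standard fact that circular convolution along the third mode becomes slice-wise matrix multiplication after a Fourier transform along that mode. The function names `t_product` and `t_product_direct` are illustrative choices, not names taken from the paper.

```python
import numpy as np


def t_product(A, B):
    """t-product of A (l x p x n) with B (p x m x n), computed in the Fourier domain.

    Assumes real-valued inputs, so the imaginary part of the result is
    discarded as round-off.
    """
    if A.shape[1] != B.shape[0] or A.shape[2] != B.shape[2]:
        raise ValueError("incompatible tensor shapes for the t-product")
    # Transform along the third mode (the frontal-slice index).
    A_hat = np.fft.fft(A, axis=2)
    B_hat = np.fft.fft(B, axis=2)
    # Multiply matching frontal slices: C_hat[:, :, k] = A_hat[:, :, k] @ B_hat[:, :, k].
    C_hat = np.einsum("ipk,pjk->ijk", A_hat, B_hat)
    return np.real(np.fft.ifft(C_hat, axis=2))


def t_product_direct(A, B):
    """Reference version: explicit circular convolution of frontal slices."""
    l, _, n = A.shape
    m = B.shape[1]
    C = np.zeros((l, m, n))
    for k in range(n):
        for j in range(n):
            # Slice k of the result sums A_{(k - j) mod n} @ B_j over j.
            C[:, :, k] += A[:, :, (k - j) % n] @ B[:, :, j]
    return C


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = rng.standard_normal((3, 4, 5))
    B = rng.standard_normal((4, 2, 5))
    # Both routes should agree up to floating-point error.
    print(np.allclose(t_product(A, B), t_product_direct(A, B)))
```

The identity element under this product is the tensor whose first frontal slice is the identity matrix and whose remaining slices are zero, which is one simple way to see the matrix-like behavior the abstract refers to.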

Conference paper