Kernel methods match deep neural networks on TIMIT
Po-Sen Huang, Haim Avron, et al.
ICASSP 2014
We present a fast randomized least-squares solver for distributed-memory platforms. Our solver is based on the Blendenpik algorithm, but employs a batchwise randomized unitary transformation scheme. The batchwise transformation enables our algorithm to scale the distributed-memory vanilla implementation of Blendenpik by up to ×3 and provides up to ×7.5 speedup over a state-of-the-art scalable least-squares solver based on the classic QR-based algorithm. Experimental evaluations on terabyte-scale matrices demonstrate excellent speedups on up to 16384 cores on a Blue Gene/Q supercomputer.
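The abstract's general recipe (mix the rows with a randomized unitary transform, sample, factor the sample, and use the factor to precondition an iterative solver) can be illustrated with a small single-machine sketch. This is not the authors' implementation: it makes no attempt at the batchwise or distributed-memory scheme, and the specific choices here (random signs followed by an FFT as the unitary transform, CGLS as the iterative solver, and the hypothetical function name and parameters `sketch_precondition_lstsq`, `oversample`, `max_iters`, `tol`) are assumptions made for illustration only.

```python
import numpy as np


def sketch_precondition_lstsq(A, b, oversample=4, max_iters=50, tol=1e-12, seed=0):
    """Approximately solve min_x ||A x - b||_2 for a tall, full-rank real A.

    Illustrative only: a sketch-and-precondition solver in the spirit of
    Blendenpik, with assumed (not source-specified) design choices.
    """
    rng = np.random.default_rng(seed)
    m, n = A.shape
    s = min(m, oversample * n)  # number of sampled rows (assumed heuristic)

    # 1. Randomized unitary mixing: random sign flips, then a unitary FFT.
    signs = rng.choice([-1.0, 1.0], size=m)
    mixed = np.fft.fft(signs[:, None] * A, axis=0, norm="ortho")

    # 2. Uniformly sample rows of the mixed matrix; stack real and imaginary
    #    parts so the rest of the computation stays in real arithmetic.
    rows = rng.choice(m, size=s, replace=False)
    sketch = np.vstack([mixed[rows].real, mixed[rows].imag])

    # 3. The R factor of the sketch serves as a right preconditioner.
    R = np.linalg.qr(sketch, mode="r")

    def apply(y):      # y -> (A R^{-1}) y
        return A @ np.linalg.solve(R, y)

    def apply_t(r):    # r -> (A R^{-1})^T r
        return np.linalg.solve(R.T, A.T @ r)

    # 4. CGLS on the preconditioned problem min_y ||A R^{-1} y - b||_2.
    y = np.zeros(n)
    r = np.asarray(b, dtype=float).copy()
    g = apply_t(r)
    p = g.copy()
    gamma = gamma0 = g @ g
    for _ in range(max_iters):
        if gamma <= (tol ** 2) * gamma0:
            break
        q = apply(p)
        alpha = gamma / (q @ q)
        y += alpha * p
        r -= alpha * q
        g = apply_t(r)
        gamma_new = g @ g
        p = g + (gamma_new / gamma) * p
        gamma = gamma_new

    # 5. Undo the change of variables: x = R^{-1} y.
    return np.linalg.solve(R, y)
```

Because the preconditioned matrix is typically well conditioned, the iteration count stays small and nearly independent of the conditioning of A; the result can be compared against `np.linalg.lstsq(A, b, rcond=None)[0]` on a small test problem.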
Haim Avron, Christos Boutsidis, et al.
ICML 2013
Jie Chen, Haim Avron, et al.
JMLR
Haim Avron, Sivan Toledo
Journal of the ACM