Inducing and exploiting activation sparsity for fast neural network inference

Kurtz M, Kopinsky J, Gelashvili R, Matveev A, Carr J, Goin M, Leiserson W, Moore S, Nell B, Shavit N, Alistarh D-A. 2020. Inducing and exploiting activation sparsity for fast neural network inference. 37th International Conference on Machine Learning, ICML 2020. ICML: International Conference on Machine Learning vol. 119, 5533–5543.

Download
2020_PMLR_Kurtz.pdf (Open Access, 741.90 KB)
Conference Paper | English

Scopus indexed
Author
Kurtz, Mark; Kopinsky, Justin; Gelashvili, Rati; Matveev, Alexander; Carr, John; Goin, Michael; Leiserson, William; Moore, Sage; Nell, Bill; Shavit, Nir; Alistarh, Dan-Adrian (ISTA)
Department
Abstract
Optimizing convolutional neural networks for fast inference has recently become an extremely active area of research. One of the go-to solutions in this context is weight pruning, which aims to reduce computational and memory footprint by removing large subsets of the connections in a neural network. Surprisingly, much less attention has been given to exploiting sparsity in the activation maps, which tend to be naturally sparse in many settings thanks to the structure of rectified linear (ReLU) activation functions. In this paper, we present an in-depth analysis of methods for maximizing the sparsity of the activations in a trained neural network, and show that, when coupled with an efficient sparse-input convolution algorithm, we can leverage this sparsity for significant performance gains. To induce highly sparse activation maps without accuracy loss, we introduce a new regularization technique, coupled with a new threshold-based sparsification method based on a parameterized activation function called Forced-Activation-Threshold Rectified Linear Unit (FATReLU). We examine the impact of our methods on popular image classification models, showing that most architectures can adapt to significantly sparser activation maps without any accuracy loss. Our second contribution is showing that these compression gains can be translated into inference speedups: we provide a new algorithm to enable fast convolution operations over networks with sparse activations, and show that it can enable significant speedups for end-to-end inference on a range of popular models on the large-scale ImageNet image classification task on modern Intel CPUs, with little or no retraining cost.
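The snippet below is a minimal PyTorch sketch, not taken from the paper, of what a threshold-based activation in the spirit of FATReLU might look like: values at or below a tunable per-layer threshold are forced to exactly zero, and a threshold of 0 recovers the standard ReLU. The class name and parameter are illustrative assumptions.

import torch
import torch.nn as nn

class FATReLUSketch(nn.Module):
    """Illustrative sketch (assumption): force activations at or below a threshold to zero."""

    def __init__(self, threshold: float = 0.0):
        super().__init__()
        self.threshold = threshold  # threshold = 0.0 reduces to a standard ReLU

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Keep values strictly above the threshold and zero out everything else,
        # increasing the fraction of exact zeros in the activation map.
        return torch.where(x > self.threshold, x, torch.zeros_like(x))

Raising the threshold trades a small amount of signal for a higher fraction of exact zeros, which a sparse-input convolution kernel can then skip at inference time.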
Publishing Year
2020
Date Published
2020-07-12
Proceedings Title
37th International Conference on Machine Learning, ICML 2020
Volume
119
Page
5533-5543
Conference
ICML: International Conference on Machine Learning
Conference Location
Online
Conference Date
2020-07-12 – 2020-07-18
ISSN
IST-REx-ID

Cite this

Kurtz M, Kopinsky J, Gelashvili R, et al. Inducing and exploiting activation sparsity for fast neural network inference. In: 37th International Conference on Machine Learning, ICML 2020. Vol 119; 2020:5533–5543.
Kurtz, M., Kopinsky, J., Gelashvili, R., Matveev, A., Carr, J., Goin, M., … Alistarh, D.-A. (2020). Inducing and exploiting activation sparsity for fast neural network inference. In 37th International Conference on Machine Learning, ICML 2020 (Vol. 119, pp. 5533–5543). Online.
Kurtz, Mark, Justin Kopinsky, Rati Gelashvili, Alexander Matveev, John Carr, Michael Goin, William Leiserson, et al. “Inducing and Exploiting Activation Sparsity for Fast Neural Network Inference.” In 37th International Conference on Machine Learning, ICML 2020, 119:5533–43, 2020.
M. Kurtz et al., “Inducing and exploiting activation sparsity for fast neural network inference,” in 37th International Conference on Machine Learning, ICML 2020, Online, 2020, vol. 119, pp. 5533–5543.
Kurtz M, Kopinsky J, Gelashvili R, Matveev A, Carr J, Goin M, Leiserson W, Moore S, Nell B, Shavit N, Alistarh D-A. 2020. Inducing and exploiting activation sparsity for fast neural network inference. 37th International Conference on Machine Learning, ICML 2020. ICML: International Conference on Machine Learning vol. 119, 5533–5543.
Kurtz, Mark, et al. “Inducing and Exploiting Activation Sparsity for Fast Neural Network Inference.” 37th International Conference on Machine Learning, ICML 2020, vol. 119, 2020, pp. 5533–43.
All files available under the following license(s):
Copyright Statement:
This Item is protected by copyright and/or related rights. [...]
Main File(s)
File Name
2020_PMLR_Kurtz.pdf
Access Level
Open Access
Date Uploaded
2021-05-25
MD5 Checksum
2aaaa7d7226e49161311d91627cf783b

