Inducing and exploiting activation sparsity for fast neural network inference

Kurtz M, Kopinsky J, Gelashvili R, Matveev A, Carr J, Goin M, Leiserson W, Moore S, Nell B, Shavit N, Alistarh D-A. 2020. Inducing and exploiting activation sparsity for fast neural network inference. 37th International Conference on Machine Learning, ICML 2020. ICML: International Conference on Machine Learning vol. 119, 5533–5543.

Download
2020_PMLR_Kurtz.pdf (Open Access, 741.90 KB)
Conference Paper | English

Scopus indexed
Author
Kurtz, Mark; Kopinsky, Justin; Gelashvili, Rati; Matveev, Alexander; Carr, John; Goin, Michael; Leiserson, William; Moore, Sage; Nell, Bill; Shavit, Nir; Alistarh, Dan-Adrian (ISTA)
Department
Abstract
Optimizing convolutional neural networks for fast inference has recently become an extremely active area of research. One of the go-to solutions in this context is weight pruning, which aims to reduce computational and memory footprint by removing large subsets of the connections in a neural network. Surprisingly, much less attention has been given to exploiting sparsity in the activation maps, which tend to be naturally sparse in many settings thanks to the structure of rectified linear (ReLU) activation functions. In this paper, we present an in-depth analysis of methods for maximizing the sparsity of the activations in a trained neural network, and show that, when coupled with an efficient sparse-input convolution algorithm, we can leverage this sparsity for significant performance gains. To induce highly sparse activation maps without accuracy loss, we introduce a new regularization technique, coupled with a new threshold-based sparsification method based on a parameterized activation function called Forced-Activation-Threshold Rectified Linear Unit (FATReLU). We examine the impact of our methods on popular image classification models, showing that most architectures can adapt to significantly sparser activation maps without any accuracy loss. Our second contribution is showing that these compression gains can be translated into inference speedups: we provide a new algorithm to enable fast convolution operations over networks with sparse activations, and show that it can enable significant speedups for end-to-end inference on a range of popular models on the large-scale ImageNet image classification task on modern Intel CPUs, with little or no retraining cost.
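The snippet below is a minimal PyTorch sketch, not taken from the paper, of what a threshold-based activation in the spirit of FATReLU might look like: values at or below a tunable per-layer threshold are forced to exactly zero, and a threshold of 0 recovers the standard ReLU. The class name and parameter are illustrative assumptions.

import torch
import torch.nn as nn

class FATReLUSketch(nn.Module):
    """Illustrative sketch (assumption): force activations at or below a threshold to zero."""

    def __init__(self, threshold: float = 0.0):
        super().__init__()
        self.threshold = threshold  # threshold = 0.0 reduces to a standard ReLU

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Keep values strictly above the threshold and zero out everything else,
        # increasing the fraction of exact zeros in the activation map.
        return torch.where(x > self.threshold, x, torch.zeros_like(x))

Raising the threshold trades a small amount of signal for a higher fraction of exact zeros, which a sparse-input convolution kernel can then skip at inference time.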
Publishing Year
2020
Date Published
2020-07-12
Proceedings Title
37th International Conference on Machine Learning, ICML 2020
Volume
119
Page
5533-5543
Conference
ICML: International Conference on Machine Learning
Conference Location
Online
Conference Date
2020-07-12 – 2020-07-18
ISSN
IST-REx-ID

Cite this

Kurtz M, Kopinsky J, Gelashvili R, et al. Inducing and exploiting activation sparsity for fast neural network inference. In: 37th International Conference on Machine Learning, ICML 2020. Vol 119; 2020:5533–5543.
Kurtz, M., Kopinsky, J., Gelashvili, R., Matveev, A., Carr, J., Goin, M., … Alistarh, D.-A. (2020). Inducing and exploiting activation sparsity for fast neural network inference. In 37th International Conference on Machine Learning, ICML 2020 (Vol. 119, pp. 5533–5543). Online.
Kurtz, Mark, Justin Kopinsky, Rati Gelashvili, Alexander Matveev, John Carr, Michael Goin, William Leiserson, et al. “Inducing and Exploiting Activation Sparsity for Fast Neural Network Inference.” In 37th International Conference on Machine Learning, ICML 2020, 119:5533–43, 2020.
M. Kurtz et al., “Inducing and exploiting activation sparsity for fast neural network inference,” in 37th International Conference on Machine Learning, ICML 2020, Online, 2020, vol. 119, pp. 5533–5543.
Kurtz M, Kopinsky J, Gelashvili R, Matveev A, Carr J, Goin M, Leiserson W, Moore S, Nell B, Shavit N, Alistarh D-A. 2020. Inducing and exploiting activation sparsity for fast neural network inference. 37th International Conference on Machine Learning, ICML 2020. ICML: International Conference on Machine Learning vol. 119, 5533–5543.
Kurtz, Mark, et al. “Inducing and Exploiting Activation Sparsity for Fast Neural Network Inference.” 37th International Conference on Machine Learning, ICML 2020, vol. 119, 2020, pp. 5533–43.
All files available under the following license(s):
Copyright Statement:
This Item is protected by copyright and/or related rights. [...]
Main File(s)
File Name
2020_PMLR_Kurtz.pdf
Access Level
Open Access
Date Uploaded
2021-05-25
MD5 Checksum
2aaaa7d7226e49161311d91627cf783b

