Adaptive gradient quantization for data-parallel SGD
Faghri F, Tabrizian I, Markov I, Alistarh D-A, Roy D, Ramezani-Kebrya A. 2020. Adaptive gradient quantization for data-parallel SGD. Advances in Neural Information Processing Systems. NeurIPS: Neural Information Processing Systems, NeurIPS, vol. 33.
https://doi.org/10.48550/arXiv.2010.12460
[Preprint]
Conference Paper
| Published
| English
Author
Faghri, Fartash;
Tabrizian, Iman;
Markov, Ilia (ISTA);
Alistarh, Dan-Adrian (ISTA);
Roy, Daniel;
Ramezani-Kebrya, Ali
Series Title
NeurIPS
Abstract
Many communication-efficient variants of SGD use gradient quantization schemes. These schemes are often heuristic and fixed over the course of training. We empirically observe that the statistics of gradients of deep models change during training. Motivated by this observation, we introduce two adaptive quantization schemes, ALQ and AMQ. In both schemes, processors update their compression schemes in parallel by efficiently computing sufficient statistics of a parametric distribution. We improve validation accuracy by almost 2% on CIFAR-10 and 1% on ImageNet in challenging low-cost communication setups. Our adaptive methods are also significantly more robust to the choice of hyperparameters.
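To make the quantization step concrete, the NumPy sketch below shows unbiased stochastic quantization of a gradient onto a fixed set of levels, the building block that adaptive schemes such as ALQ and AMQ re-fit to gradient statistics during training. This is an illustrative sketch under assumptions, not the authors' implementation: the `quantize` function name, the L2 normalization, and the uniformly spaced initial levels are choices made here for clarity.

```python
import numpy as np

def quantize(grad, levels, rng=None):
    """Unbiased stochastic quantization of `grad` onto sorted `levels` in [0, 1].

    Illustrative sketch only (not the paper's ALQ/AMQ code): each normalized
    magnitude is rounded to one of its two neighboring levels with probability
    proportional to distance, so the quantized value equals the input in
    expectation.
    """
    rng = np.random.default_rng() if rng is None else rng
    norm = np.linalg.norm(grad)
    if norm == 0.0:
        return grad
    r = np.abs(grad) / norm                       # normalized magnitudes in [0, 1]
    idx = np.searchsorted(levels, r, side="right") - 1
    idx = np.clip(idx, 0, len(levels) - 2)        # index of the level below each r
    lo, hi = levels[idx], levels[idx + 1]
    p = (r - lo) / (hi - lo)                      # probability of rounding up,
    q = np.where(rng.random(grad.shape) < p, hi, lo)  # so E[q] = r (unbiased)
    return norm * np.sign(grad) * q

# Example: 3 bits -> 8 levels, initialized uniformly. An adaptive scheme would
# periodically re-fit these levels to the current gradient statistics.
levels = np.linspace(0.0, 1.0, 8)
g = np.random.default_rng(0).standard_normal(10)
print(quantize(g, levels))
```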
Publishing Year
2020
Date Published
2020-12-10
Proceedings Title
Advances in Neural Information Processing Systems
Publisher
Neural Information Processing Systems Foundation
Acknowledgement
The authors would like to thank Blair Bilodeau, David Fleet, Mufan Li, and Jeffrey Negrea for helpful discussions. FF was supported by an OGS Scholarship. DA and IM were supported by the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (grant agreement No 805223 ScaleML). DMR was supported by an NSERC Discovery Grant. ARK was supported by an NSERC Postdoctoral Fellowship. Resources used in preparing this research were provided, in part, by the Province of Ontario, the Government of Canada through CIFAR, and companies sponsoring the Vector Institute.
Volume
33
Conference
NeurIPS: Neural Information Processing Systems
Conference Location
Vancouver, Canada
Conference Date
2020-12-06 – 2020-12-12
Cite this
Faghri F, Tabrizian I, Markov I, Alistarh D-A, Roy D, Ramezani-Kebrya A. Adaptive gradient quantization for data-parallel SGD. In: Advances in Neural Information Processing Systems. Vol 33. Neural Information Processing Systems Foundation; 2020.
Faghri, F., Tabrizian, I., Markov, I., Alistarh, D.-A., Roy, D., & Ramezani-Kebrya, A. (2020). Adaptive gradient quantization for data-parallel SGD. In Advances in Neural Information Processing Systems (Vol. 33). Vancouver, Canada: Neural Information Processing Systems Foundation.
Faghri, Fartash, Iman Tabrizian, Ilia Markov, Dan-Adrian Alistarh, Daniel Roy, and Ali Ramezani-Kebrya. “Adaptive Gradient Quantization for Data-Parallel SGD.” In Advances in Neural Information Processing Systems, Vol. 33. Neural Information Processing Systems Foundation, 2020.
F. Faghri, I. Tabrizian, I. Markov, D.-A. Alistarh, D. Roy, and A. Ramezani-Kebrya, “Adaptive gradient quantization for data-parallel SGD,” in Advances in Neural Information Processing Systems, Vancouver, Canada, 2020, vol. 33.
Faghri F, Tabrizian I, Markov I, Alistarh D-A, Roy D, Ramezani-Kebrya A. 2020. Adaptive gradient quantization for data-parallel SGD. Advances in Neural Information Processing Systems. NeurIPS: Neural Information Processing Systems, NeurIPS, vol. 33.
Faghri, Fartash, et al. “Adaptive Gradient Quantization for Data-Parallel SGD.” Advances in Neural Information Processing Systems, vol. 33, Neural Information Processing Systems Foundation, 2020.
All files available under the following license(s):
Copyright Statement:
This Item is protected by copyright and/or related rights. [...]
Link(s) to Main File(s)
Access Level
Open Access
Sources
arXiv 2010.12460