Adaptive gradient quantization for data-parallel SGD

Faghri F, Tabrizian I, Markov I, Alistarh D-A, Roy D, Ramezani-Kebrya A. 2020. Adaptive gradient quantization for data-parallel SGD. Advances in Neural Information Processing Systems. NeurIPS: Neural Information Processing Systems, NeurIPS, vol. 33.

Conference Paper | Published | English
Author
Faghri, Fartash; Tabrizian, Iman; Markov, Ilia (ISTA); Alistarh, Dan-Adrian (ISTA); Roy, Daniel; Ramezani-Kebrya, Ali
Series Title
NeurIPS
Abstract
Many communication-efficient variants of SGD use gradient quantization schemes. These schemes are often heuristic and fixed over the course of training. We empirically observe that the statistics of gradients of deep models change during training. Motivated by this observation, we introduce two adaptive quantization schemes, ALQ and AMQ. In both schemes, processors update their compression schemes in parallel by efficiently computing sufficient statistics of a parametric distribution. We improve the validation accuracy by almost 2% on CIFAR-10 and 1% on ImageNet in challenging low-cost communication setups. Our adaptive methods are also significantly more robust to the choice of hyperparameters.
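
The mechanism the abstract describes can be sketched in a few lines. The Python snippet below is a minimal illustration, not the paper's ALQ or AMQ algorithms: it stochastically quantizes a gradient onto a level set and then adapts that level set from observed gradient statistics, using empirical quantiles as a crude stand-in for fitting a parametric distribution. All names (quantize, adapt_levels) are hypothetical.

import numpy as np

def quantize(grad, levels, rng):
    # Stochastically round each normalized magnitude to one of its two
    # bracketing levels, so the quantized gradient is unbiased in expectation.
    norm = np.linalg.norm(grad)
    if norm == 0.0:
        return grad
    r = np.abs(grad) / norm                      # normalized magnitudes in [0, 1]
    idx = np.clip(np.searchsorted(levels, r) - 1, 0, len(levels) - 2)
    lo, hi = levels[idx], levels[idx + 1]
    p = (r - lo) / (hi - lo)                     # probability of rounding up
    q = np.where(rng.random(r.shape) < p, hi, lo)
    return norm * np.sign(grad) * q

def adapt_levels(grad, num_levels):
    # Crude stand-in for the statistics-driven update: place quantization
    # levels at empirical quantiles of the normalized gradient magnitudes.
    r = np.abs(grad) / np.linalg.norm(grad)
    levels = np.quantile(r, np.linspace(0.0, 1.0, num_levels))
    levels[0], levels[-1] = 0.0, 1.0             # anchor the end points
    return np.unique(levels)

rng = np.random.default_rng(0)
g = rng.standard_normal(10_000) * 1e-3           # stand-in for a local gradient
levels = adapt_levels(g, num_levels=8)           # "update the compression scheme"
g_hat = quantize(g, levels, rng)
print("relative error:", np.linalg.norm(g - g_hat) / np.linalg.norm(g))

In a data-parallel run, each worker would quantize its local gradient against the shared level set and periodically recompute the levels from fresh gradients, which is the spirit of the adaptive update the abstract describes.
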
Publishing Year
2020
Date Published
2020-12-10
Proceedings Title
Advances in Neural Information Processing Systems
Publisher
Neural Information Processing Systems Foundation
Acknowledgement
The authors would like to thank Blair Bilodeau, David Fleet, Mufan Li, and Jeffrey Negrea for helpful discussions. FF was supported by an OGS Scholarship. DA and IM were supported by the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (grant agreement No 805223 ScaleML). DMR was supported by an NSERC Discovery Grant. ARK was supported by an NSERC Postdoctoral Fellowship. Resources used in preparing this research were provided, in part, by the Province of Ontario, the Government of Canada through CIFAR, and companies sponsoring the Vector Institute.
Volume
33
Conference
NeurIPS: Neural Information Processing Systems
Conference Location
Vancouver, Canada
Conference Date
2020-12-06 – 2020-12-12

Cite this

Faghri F, Tabrizian I, Markov I, Alistarh D-A, Roy D, Ramezani-Kebrya A. Adaptive gradient quantization for data-parallel SGD. In: Advances in Neural Information Processing Systems. Vol 33. Neural Information Processing Systems Foundation; 2020.
Faghri, F., Tabrizian, I., Markov, I., Alistarh, D.-A., Roy, D., & Ramezani-Kebrya, A. (2020). Adaptive gradient quantization for data-parallel SGD. In Advances in Neural Information Processing Systems (Vol. 33). Vancouver, Canada: Neural Information Processing Systems Foundation.
Faghri, Fartash, Iman Tabrizian, Ilia Markov, Dan-Adrian Alistarh, Daniel Roy, and Ali Ramezani-Kebrya. “Adaptive Gradient Quantization for Data-Parallel SGD.” In Advances in Neural Information Processing Systems, Vol. 33. Neural Information Processing Systems Foundation, 2020.
F. Faghri, I. Tabrizian, I. Markov, D.-A. Alistarh, D. Roy, and A. Ramezani-Kebrya, “Adaptive gradient quantization for data-parallel SGD,” in Advances in Neural Information Processing Systems, Vancouver, Canada, 2020, vol. 33.
Faghri F, Tabrizian I, Markov I, Alistarh D-A, Roy D, Ramezani-Kebrya A. 2020. Adaptive gradient quantization for data-parallel SGD. Advances in Neural Information Processing Systems. NeurIPS: Neural Information Processing Systems, NeurIPS, vol. 33.
Faghri, Fartash, et al. “Adaptive Gradient Quantization for Data-Parallel SGD.” Advances in Neural Information Processing Systems, vol. 33, Neural Information Processing Systems Foundation, 2020.
All files available under the following license(s):
Copyright Statement:
This Item is protected by copyright and/or related rights. [...]

Link(s) to Main File(s)
Access Level
Open Access


Sources

arXiv 2010.12460
