Adaptive gradient quantization for data-parallel SGD
Faghri F, Tabrizian I, Markov I, Alistarh D-A, Roy D, Ramezani-Kebrya A. 2020. Adaptive gradient quantization for data-parallel SGD. Advances in Neural Information Processing Systems. NeurIPS: Neural Information Processing Systems, NeurIPS, vol. 33.
https://doi.org/10.48550/arXiv.2010.12460
[Preprint]
Conference Paper
| Published
| English
Author
Faghri, Fartash;
Tabrizian, Iman;
Markov, Ilia (ISTA);
Alistarh, Dan-Adrian (ISTA);
Roy, Daniel;
Ramezani-Kebrya, Ali
Series Title
NeurIPS
Abstract
Many communication-efficient variants of SGD use gradient quantization schemes. These schemes are often heuristic and fixed over the course of training. We empirically observe that the statistics of gradients of deep models change during training. Motivated by this observation, we introduce two adaptive quantization schemes, ALQ and AMQ. In both schemes, processors update their compression schemes in parallel by efficiently computing sufficient statistics of a parametric distribution. We improve validation accuracy by almost 2% on CIFAR-10 and 1% on ImageNet in challenging low-cost communication setups. Our adaptive methods are also significantly more robust to the choice of hyperparameters.
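To make the quantization step concrete, the NumPy sketch below shows unbiased stochastic quantization of a gradient onto a fixed set of levels, the building block that adaptive schemes such as ALQ and AMQ re-fit to gradient statistics during training. This is an illustrative sketch under assumptions, not the authors' implementation: the `quantize` function name, the L2 normalization, and the uniformly spaced initial levels are choices made here for clarity.

```python
import numpy as np

def quantize(grad, levels, rng=None):
    """Unbiased stochastic quantization of `grad` onto sorted `levels` in [0, 1].

    Illustrative sketch only (not the paper's ALQ/AMQ code): each normalized
    magnitude is rounded to one of its two neighboring levels with probability
    proportional to distance, so the quantized value equals the input in
    expectation.
    """
    rng = np.random.default_rng() if rng is None else rng
    norm = np.linalg.norm(grad)
    if norm == 0.0:
        return grad
    r = np.abs(grad) / norm                       # normalized magnitudes in [0, 1]
    idx = np.searchsorted(levels, r, side="right") - 1
    idx = np.clip(idx, 0, len(levels) - 2)        # index of the level below each r
    lo, hi = levels[idx], levels[idx + 1]
    p = (r - lo) / (hi - lo)                      # probability of rounding up,
    q = np.where(rng.random(grad.shape) < p, hi, lo)  # so E[q] = r (unbiased)
    return norm * np.sign(grad) * q

# Example: 3 bits -> 8 levels, initialized uniformly. An adaptive scheme would
# periodically re-fit these levels to the current gradient statistics.
levels = np.linspace(0.0, 1.0, 8)
g = np.random.default_rng(0).standard_normal(10)
print(quantize(g, levels))
```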
Publishing Year
2020
Date Published
2020-12-10
Proceedings Title
Advances in Neural Information Processing Systems
Publisher
Neural Information Processing Systems Foundation
Acknowledgement
The authors would like to thank Blair Bilodeau, David Fleet, Mufan Li, and Jeffrey Negrea for helpful discussions. FF was supported by an OGS Scholarship. DA and IM were supported by the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (grant agreement No 805223 ScaleML). DMR was supported by an NSERC Discovery Grant. ARK was supported by an NSERC Postdoctoral Fellowship. Resources used in preparing this research were provided, in part, by the Province of Ontario, the Government of Canada through CIFAR, and companies sponsoring the Vector Institute.
Volume
33
Conference
NeurIPS: Neural Information Processing Systems
Conference Location
Vancouver, Canada
Conference Date
2020-12-06 – 2020-12-12
Cite this
Faghri F, Tabrizian I, Markov I, Alistarh D-A, Roy D, Ramezani-Kebrya A. Adaptive gradient quantization for data-parallel SGD. In: Advances in Neural Information Processing Systems. Vol 33. Neural Information Processing Systems Foundation; 2020.
Faghri, F., Tabrizian, I., Markov, I., Alistarh, D.-A., Roy, D., & Ramezani-Kebrya, A. (2020). Adaptive gradient quantization for data-parallel SGD. In Advances in Neural Information Processing Systems (Vol. 33). Vancouver, Canada: Neural Information Processing Systems Foundation.
Faghri, Fartash, Iman Tabrizian, Ilia Markov, Dan-Adrian Alistarh, Daniel Roy, and Ali Ramezani-Kebrya. “Adaptive Gradient Quantization for Data-Parallel SGD.” In Advances in Neural Information Processing Systems, Vol. 33. Neural Information Processing Systems Foundation, 2020.
F. Faghri, I. Tabrizian, I. Markov, D.-A. Alistarh, D. Roy, and A. Ramezani-Kebrya, “Adaptive gradient quantization for data-parallel SGD,” in Advances in Neural Information Processing Systems, Vancouver, Canada, 2020, vol. 33.
Faghri F, Tabrizian I, Markov I, Alistarh D-A, Roy D, Ramezani-Kebrya A. 2020. Adaptive gradient quantization for data-parallel SGD. Advances in Neural Information Processing Systems. NeurIPS: Neural Information Processing Systems, NeurIPS, vol. 33.
Faghri, Fartash, et al. “Adaptive Gradient Quantization for Data-Parallel SGD.” Advances in Neural Information Processing Systems, vol. 33, Neural Information Processing Systems Foundation, 2020.
All files available under the following license(s):
Copyright Statement:
This Item is protected by copyright and/or related rights. [...]
Link(s) to Main File(s)
Access Level
Open Access
Sources
arXiv 2010.12460