Breaking (global) barriers in parallel stochastic optimization with wait-avoiding group averaging
Li S, Ben-Nun T, Nadiradze G, Girolamo SD, Dryden N, Alistarh D-A, Hoefler T. 2021. Breaking (global) barriers in parallel stochastic optimization with wait-avoiding group averaging. IEEE Transactions on Parallel and Distributed Systems. 32(7), 9271898.
Download (ext.)
https://arxiv.org/abs/2005.00124
[Preprint]
Journal Article
| Published
| English
Scopus indexed
Author
Li, Shigang;
Ben-Nun, Tal;
Nadiradze, Giorgi (ISTA);
Girolamo, Salvatore Di;
Dryden, Nikoli;
Alistarh, Dan-Adrian (ISTA);
Hoefler, Torsten
Abstract
Deep learning at scale is dominated by communication time. Distributing samples across nodes usually yields the best performance, but poses scaling challenges due to global information dissemination and load imbalance across uneven sample lengths. State-of-the-art decentralized optimizers mitigate the problem, but require more iterations to achieve the same accuracy as their globally-communicating counterparts. We present Wait-Avoiding Group Model Averaging (WAGMA) SGD, a wait-avoiding stochastic optimizer that reduces global communication via subgroup weight exchange. The key insight is a combination of algorithmic changes to the averaging scheme and the use of a group allreduce operation. We prove the convergence of WAGMA-SGD, and empirically show that it retains convergence rates similar to Allreduce-SGD. For evaluation, we train ResNet-50 on ImageNet; Transformer for machine translation; and deep reinforcement learning for navigation at scale. Compared with state-of-the-art decentralized SGD variants, WAGMA-SGD significantly improves training throughput (e.g., 2.1× on 1,024 GPUs for reinforcement learning), and achieves the fastest time-to-solution (e.g., the highest score using the shortest training time for Transformer).
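The subgroup weight exchange mentioned in the abstract can be illustrated with a small simulation. The following is a minimal sketch, not the authors' implementation: the function names `group_average` and `run_rounds` and the rotating grouping rule are assumptions made for illustration, and both the local SGD steps and the wait-avoiding behaviour are omitted. It only shows that averaging within small groups whose membership changes each round preserves the global mean of the models and gradually brings every worker toward it.

```python
def group_average(models, group_size, step):
    """Average model vectors within disjoint groups of workers.

    Hypothetical grouping rule (an assumption, not taken from the paper):
    the worker order is rotated by `step`, so membership changes per round.
    """
    p = len(models)
    if p % group_size != 0:
        raise ValueError("number of workers must be divisible by group_size")
    dim = len(models[0])
    order = [(i + step) % p for i in range(p)]
    out = [list(m) for m in models]  # copy, so the input is left untouched
    for start in range(0, p, group_size):
        members = order[start:start + group_size]
        # Stand-in for a group allreduce: element-wise mean over the group.
        mean = [sum(models[m][d] for m in members) / group_size
                for d in range(dim)]
        for m in members:
            out[m] = list(mean)
    return out


def run_rounds(models, group_size, steps):
    """Apply `steps` rounds of group averaging with a shifting grouping."""
    for t in range(steps):
        models = group_average(models, group_size, t)
    return models


if __name__ == "__main__":
    # Eight workers, each holding a one-parameter "model".
    start = [[float(i)] for i in range(8)]
    print(run_rounds(start, group_size=2, steps=3))
```

Each round touches only `group_size` workers per exchange rather than all of them, which is the communication saving the abstract refers to; the price is that agreement across all workers is reached over several rounds instead of one.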
Publishing Year
2021
Date Published
2021-07-01
Journal Title
IEEE Transactions on Parallel and Distributed Systems
Acknowledgement
This project has received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 programme under Grant DAPP, Grant 678880; EPiGRAM-HS, Grant 801039; and ERC Starting Grant ScaleML, Grant 805223. The work of Tal Ben-Nun is supported by the Swiss National Science Foundation (Ambizione Project No. 185778). The work of Nikoli Dryden is supported by the ETH Postdoctoral Fellowship. The authors would like to thank the Swiss National Supercomputing Center for providing the computing resources and technical support.
Volume
32
Issue
7
Article Number
9271898
Cite this
Li S, Ben-Nun T, Nadiradze G, et al. Breaking (global) barriers in parallel stochastic optimization with wait-avoiding group averaging. IEEE Transactions on Parallel and Distributed Systems. 2021;32(7). doi:10.1109/TPDS.2020.3040606
Li, S., Ben-Nun, T., Nadiradze, G., Girolamo, S. D., Dryden, N., Alistarh, D.-A., & Hoefler, T. (2021). Breaking (global) barriers in parallel stochastic optimization with wait-avoiding group averaging. IEEE Transactions on Parallel and Distributed Systems. IEEE. https://doi.org/10.1109/TPDS.2020.3040606
Li, Shigang, Tal Ben-Nun, Giorgi Nadiradze, Salvatore Di Girolamo, Nikoli Dryden, Dan-Adrian Alistarh, and Torsten Hoefler. “Breaking (Global) Barriers in Parallel Stochastic Optimization with Wait-Avoiding Group Averaging.” IEEE Transactions on Parallel and Distributed Systems. IEEE, 2021. https://doi.org/10.1109/TPDS.2020.3040606.
S. Li et al., “Breaking (global) barriers in parallel stochastic optimization with wait-avoiding group averaging,” IEEE Transactions on Parallel and Distributed Systems, vol. 32, no. 7. IEEE, 2021.
Li S, Ben-Nun T, Nadiradze G, Girolamo SD, Dryden N, Alistarh D-A, Hoefler T. 2021. Breaking (global) barriers in parallel stochastic optimization with wait-avoiding group averaging. IEEE Transactions on Parallel and Distributed Systems. 32(7), 9271898.
Li, Shigang, et al. “Breaking (Global) Barriers in Parallel Stochastic Optimization with Wait-Avoiding Group Averaging.” IEEE Transactions on Parallel and Distributed Systems, vol. 32, no. 7, 9271898, IEEE, 2021, doi:10.1109/TPDS.2020.3040606.
All files available under the following license(s):
Copyright Statement:
This Item is protected by copyright and/or related rights. [...]
Link(s) to Main File(s)
Access Level
Open Access
Sources
arXiv 2005.00124