Federated SGD with local asynchrony
Chatterjee B, Kungurtsev V, Alistarh D-A. 2024. Federated SGD with local asynchrony. Proceedings of the 44th International Conference on Distributed Computing Systems. ICDCS: International Conference on Distributed Computing Systems, 857–868.
Conference Paper | Published | English
Scopus indexed
Author
Bapi Chatterjee; Vyacheslav Kungurtsev; Dan-Adrian Alistarh
Corresponding author has ISTA affiliation
Abstract
Parallel SGD in a shared-memory setting is often represented by the popular Hogwild! algorithm, in which multiple computing processes perform lock-free updates asynchronously. However, scaling Hogwild! to distributed workers remains largely unexplored. Specifically, it is unknown whether any adaptation of Hogwild! to the popular decentralized multi-GPU setting offers a competitive speedup, either empirically or theoretically. In this work, we investigate the potential of decentralizing Hogwild! by simultaneously incorporating (a) asynchronous local gradient updates on the shared memory of GPUs, and (b) non-blocking asynchronous decentralized federated averaging. A naive direct implementation degrades performance because of scheduling overheads and concurrent write conflicts on GPUs. To mitigate these drawbacks, we investigate and propose a new method, based on careful block selection rules that update only portions of the parameter vectors. Our experiments show that the resulting decentralized training method exhibits improved throughput and competitive accuracy on standard image classification benchmarks using the CIFAR-10, CIFAR-100, and ImageNet datasets. On the theoretical side, we prove that our method guarantees sublinear ergodic convergence rates for non-convex objectives.
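The two ingredients named in the abstract, block-wise local updates and periodic averaging across workers, can be illustrated with a small toy simulation. This is only a sketch under stated assumptions and not the authors' method: it runs sequentially rather than asynchronously, uses no GPUs, and the function name block_sgd_with_averaging, the staggered round-robin block rule, the quadratic objective, and all default parameters are hypothetical choices made for illustration.

```python
def block_sgd_with_averaging(targets, dim=8, num_blocks=4, steps=400,
                             avg_every=8, lr=0.2):
    """Toy simulation: worker w minimizes 0.5 * ||x - targets[w]||^2.

    Each step, a worker updates only one block of its local parameters.
    Every `avg_every` steps, all workers adopt the coordinate-wise mean.
    """
    block = dim // num_blocks
    # One local parameter vector per worker, all starting at zero.
    params = [[0.0] * dim for _ in targets]
    for step in range(1, steps + 1):
        for w, target in enumerate(targets):
            # Hypothetical block selection rule (an assumption, not the
            # paper's rule): round-robin, staggered by worker index.
            b = (step + w) % num_blocks
            for i in range(b * block, (b + 1) * block):
                grad_i = params[w][i] - target[i]  # gradient of the quadratic
                params[w][i] -= lr * grad_i        # update this block only
        if step % avg_every == 0:
            # Averaging round, simulated synchronously for simplicity.
            mean = [sum(p[i] for p in params) / len(params)
                    for i in range(dim)]
            params = [mean[:] for _ in params]
    return params


if __name__ == "__main__":
    # Two workers with different local optima; the global optimum of the
    # summed objective is the coordinate-wise mean of the targets.
    workers = block_sgd_with_averaging([[1.0] * 8, [3.0] * 8])
    print(workers[0])
```

In this toy setup every worker touches each block equally often between averaging rounds, so the averaged parameters contract toward the mean of the targets. A real decentralized, asynchronous implementation would not have that guarantee by construction, which is one reason the choice of block selection rule matters.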
Publishing Year
2024
Date Published
2024-07-26
Proceedings Title
Proceedings of the 44th International Conference on Distributed Computing Systems
Publisher
IEEE
Page
857-868
Conference
ICDCS: International Conference on Distributed Computing Systems
Conference Location
Jersey City, NJ, United States
Conference Date
2024-07-23 – 2024-07-26
Cite this
Chatterjee B, Kungurtsev V, Alistarh D-A. Federated SGD with local asynchrony. In: Proceedings of the 44th International Conference on Distributed Computing Systems. IEEE; 2024:857-868. doi:10.1109/ICDCS60910.2024.00084
Chatterjee, B., Kungurtsev, V., & Alistarh, D.-A. (2024). Federated SGD with local asynchrony. In Proceedings of the 44th International Conference on Distributed Computing Systems (pp. 857–868). Jersey City, NJ, United States: IEEE. https://doi.org/10.1109/ICDCS60910.2024.00084
Chatterjee, Bapi, Vyacheslav Kungurtsev, and Dan-Adrian Alistarh. “Federated SGD with Local Asynchrony.” In Proceedings of the 44th International Conference on Distributed Computing Systems, 857–68. IEEE, 2024. https://doi.org/10.1109/ICDCS60910.2024.00084.
B. Chatterjee, V. Kungurtsev, and D.-A. Alistarh, “Federated SGD with local asynchrony,” in Proceedings of the 44th International Conference on Distributed Computing Systems, Jersey City, NJ, United States, 2024, pp. 857–868.
Chatterjee, Bapi, et al. “Federated SGD with Local Asynchrony.” Proceedings of the 44th International Conference on Distributed Computing Systems, IEEE, 2024, pp. 857–68, doi:10.1109/ICDCS60910.2024.00084.