{"publication":"Proceedings of the 44th International Conference on Distributed Computing Systems","publication_status":"published","author":[{"last_name":"Chatterjee","orcid":"0000-0002-2742-4028","id":"3C41A08A-F248-11E8-B48F-1D18A9856A87","first_name":"Bapi","full_name":"Chatterjee, Bapi"},{"last_name":"Kungurtsev","full_name":"Kungurtsev, Vyacheslav","first_name":"Vyacheslav"},{"orcid":"0000-0003-3650-940X","last_name":"Alistarh","first_name":"Dan-Adrian","full_name":"Alistarh, Dan-Adrian","id":"4A899BFC-F248-11E8-B48F-1D18A9856A87"}],"oa_version":"None","scopus_import":"1","date_created":"2024-09-15T22:01:41Z","user_id":"2DF688A6-F248-11E8-B48F-1D18A9856A87","citation":{"ieee":"B. Chatterjee, V. Kungurtsev, and D.-A. Alistarh, “Federated SGD with local asynchrony,” in Proceedings of the 44th International Conference on Distributed Computing Systems, Jersey City, NJ, United States, 2024, pp. 857–868.","ista":"Chatterjee B, Kungurtsev V, Alistarh D-A. 2024. Federated SGD with local asynchrony. Proceedings of the 44th International Conference on Distributed Computing Systems. ICDCS: International Conference on Distributed Computing Systems, 857–868.","short":"B. Chatterjee, V. Kungurtsev, D.-A. Alistarh, in:, Proceedings of the 44th International Conference on Distributed Computing Systems, IEEE, 2024, pp. 857–868.","mla":"Chatterjee, Bapi, et al. “Federated SGD with Local Asynchrony.” Proceedings of the 44th International Conference on Distributed Computing Systems, IEEE, 2024, pp. 857–68, doi:10.1109/ICDCS60910.2024.00084.","apa":"Chatterjee, B., Kungurtsev, V., & Alistarh, D.-A. (2024). Federated SGD with local asynchrony. In Proceedings of the 44th International Conference on Distributed Computing Systems (pp. 857–868). Jersey City, NJ, United States: IEEE. https://doi.org/10.1109/ICDCS60910.2024.00084","ama":"Chatterjee B, Kungurtsev V, Alistarh D-A. Federated SGD with local asynchrony. In: Proceedings of the 44th International Conference on Distributed Computing Systems. IEEE; 2024:857-868. doi:10.1109/ICDCS60910.2024.00084","chicago":"Chatterjee, Bapi, Vyacheslav Kungurtsev, and Dan-Adrian Alistarh. “Federated SGD with Local Asynchrony.” In Proceedings of the 44th International Conference on Distributed Computing Systems, 857–68. IEEE, 2024. https://doi.org/10.1109/ICDCS60910.2024.00084."},"_id":"18070","department":[{"_id":"DaAl"}],"quality_controlled":"1","day":"26","doi":"10.1109/ICDCS60910.2024.00084","publication_identifier":{"issn":["1063-6927"],"isbn":["9798350386059"],"eissn":["2575-8411"]},"corr_author":"1","language":[{"iso":"eng"}],"type":"conference","date_updated":"2024-09-17T07:05:50Z","abstract":[{"lang":"eng","text":"Parallel SGD in a shared-memory setting is oft-represented by the popular Hogwild! algorithm, in which lock-free updates are asynchronously performed by multiple computing processes. Unfortunately, scaling Hogwild! to distributed workers is largely unexplored. Specifically, it is unknown if any adaptation of Hogwild! to the popular decentralized multi-GPU setting offers any competitive speedup, either empirically or theoretically. In this work, we investigate the potential of decentralizing Hogwild! by incorporating simultaneously (a) asynchronous local gradient updates on the shared memory of GPUs, and (b) non-blocking asynchronous decentralized federated averaging. A naive direct implementation shows degradation in performance, arising from scheduling overheads and concurrent write conflicts on GPUs. To mitigate these drawbacks, we investigate and propose a new method, based on careful block selection rules, which update only portions of the parameter vectors. Our experiments show that the resulting decentralized training method exhibits improved throughput and competitive accuracy for standard image classification benchmarks on the CIFAR-10, CIFAR-100, and Imagenet datasets. On the theoretical side, we prove that our method guarantees sublinear ergodic convergence rates for non-convex objectives."}],"publisher":"IEEE","date_published":"2024-07-26T00:00:00Z","conference":{"end_date":"2024-07-26","location":"Jersey City, NJ, United States","name":"ICDCS: International Conference on Distributed Computing Systems","start_date":"2024-07-23"},"year":"2024","status":"public","article_processing_charge":"No","month":"07","title":"Federated SGD with local asynchrony","page":"857-868"}