Communication-efficient distributed training of deep neural networks: An algorithms and systems perspective

Markov I. 2024. Communication-efficient distributed training of deep neural networks: An algorithms and systems perspective. Institute of Science and Technology Austria.

Download
OA Thesis_final_version_pdfa2.pdf 2.76 MB [Published Version]

Thesis | PhD | Published | English
Series Title
ISTA Thesis
Abstract
Deep learning is essential in numerous applications nowadays, with many recent advancements made possible by training very large models. Despite their broad applicability, training neural networks is often time-intensive, and it is usually impractical to manage large models and datasets on a single machine. To address these issues, distributed deep learning training has become increasingly important. However, distributed training requires synchronization among nodes, and the mini-batch stochastic gradient descent algorithm places a significant load on network connections. A possible solution to this synchronization bottleneck is to reduce message sizes via lossy compression. In this thesis, we investigate systems and algorithmic approaches to communication compression during training. From the systems perspective, we demonstrate that the common approach of expensive hardware overprovisioning can be replaced by careful system design. We introduce a framework that provides efficient software support for compressed communication in machine learning applications, applicable to both multi-GPU single-node training and larger-scale multi-node training. Our framework integrates with popular ML frameworks, providing up to 3x speedups for multi-GPU nodes based on commodity hardware and order-of-magnitude improvements in the multi-node setting, with negligible impact on accuracy. We also apply our framework to other communication schemes, such as Fully Sharded Data Parallel (FSDP), and provide strong convergence guarantees for compression in this setting. Empirical validation shows that our method preserves model accuracy for GPT-family models with up to 1.3 billion parameters while completely removing the communication bottlenecks of non-compressed alternatives, providing up to 2.2x end-to-end speedups. From the algorithmic side, we propose a general framework, LGreCo, that dynamically adjusts the degree of compression across a model's layers during training. This approach improves overall compression and yields significant speedups without compromising accuracy. LGreCo uses an adaptive algorithm that automatically selects compression parameters for each model layer, achieving the best compression ratio while adhering to an error constraint. Our method is effective across all existing families of compression methods, achieving up to 2.5x faster training and up to a 5x improvement in compression compared to efficient implementations of current approaches. Additionally, LGreCo can complement existing adaptive algorithms.
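
To make the compression idea concrete, the sketch below is an illustrative example in plain PyTorch, not the thesis's actual framework, API, or compression scheme: a simple stochastic uniform quantizer for gradients, together with a naive per-layer bit-width search under a relative-error budget, loosely in the spirit of the adaptive layer-wise compression described above. The names quantize, pick_bits, and max_rel_error are hypothetical.

import torch

def quantize(g, bits):
    # Stochastically round g to 2**bits uniform levels; return integer codes and a scale.
    levels = 2 ** bits - 1
    scale = g.abs().max().clamp(min=1e-12)
    x = (g / scale + 1) / 2 * levels          # map [-scale, scale] -> [0, levels]
    lower = x.floor()
    codes = lower + (torch.rand_like(x) < (x - lower)).float()
    return codes.to(torch.uint8), scale

def dequantize(codes, scale, bits):
    levels = 2 ** bits - 1
    return (codes.float() / levels * 2 - 1) * scale

def pick_bits(g, max_rel_error=0.05):
    # Choose the smallest bit-width whose reconstruction error stays within the budget.
    for bits in (2, 3, 4, 6, 8):
        codes, scale = quantize(g, bits)
        err = (g - dequantize(codes, scale, bits)).norm() / g.norm().clamp(min=1e-12)
        if err <= max_rel_error:
            return bits
    return 16  # signal: skip compression for this layer

# Usage sketch: before the all-reduce step, compress each layer's gradient with
# its chosen bit-width, communicate the codes and scale, then decompress.
grads = {"embedding": torch.randn(4096), "mlp": torch.randn(8192)}
plan = {name: pick_bits(g) for name, g in grads.items()}
print(plan)

The point of the per-layer search is that different layers tolerate different amounts of compression error, so a uniform bit-width either wastes bandwidth or hurts accuracy; selecting the level per layer under an error constraint is the basic trade-off the abstract refers to.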
Publishing Year
2024
Date Published
2024-09-04
Page
102
IST-REx-ID
17490

Cite this

Markov I. Communication-efficient distributed training of deep neural networks: An algorithms and systems perspective. 2024. doi:10.15479/at:ista:17490
Markov, I. (2024). Communication-efficient distributed training of deep neural networks: An algorithms and systems perspective. Institute of Science and Technology Austria. https://doi.org/10.15479/at:ista:17490
Markov, Ilia. “Communication-Efficient Distributed Training of Deep Neural Networks: An Algorithms and Systems Perspective.” Institute of Science and Technology Austria, 2024. https://doi.org/10.15479/at:ista:17490.
I. Markov, “Communication-efficient distributed training of deep neural networks: An algorithms and systems perspective,” Institute of Science and Technology Austria, 2024.
Markov, Ilia. Communication-Efficient Distributed Training of Deep Neural Networks: An Algorithms and Systems Perspective. Institute of Science and Technology Austria, 2024, doi:10.15479/at:ista:17490.
All files available under the following license(s):
Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0)
Main File(s)
Access Level
OA Open Access
Date Uploaded
2024-09-04
MD5 Checksum
9e68f7217570f756ceb8f70b980938cd

Source File
File Name
Thesis.zip 43.33 MB
Access Level
Restricted Closed Access
Date Uploaded
2024-09-04
MD5 Checksum
77609f4835d2730e46fa0d42d9134ed9
