Error feedback can accurately compress preconditioners
Modoranu I-V, Kalinov A, Kurtic E, Frantar E, Alistarh D-A. 2024. Error feedback can accurately compress preconditioners. 41st International Conference on Machine Learning. ICML: International Conference on Machine Learning, PMLR, vol. 235, 35910–35933.
Conference Paper | Published | English
Scopus indexed
Author
Corresponding author has ISTA affiliation
Series Title
PMLR
Abstract
Leveraging second-order information about the loss at the scale of deep networks is one of the main lines of approach for improving the performance of current optimizers for deep learning. Yet, existing approaches for accurate full-matrix preconditioning, such as Full-Matrix Adagrad (GGT) or Matrix-Free Approximate Curvature (M-FAC), suffer from massive storage costs when applied even to small-scale models, as they must store a sliding window of gradients whose memory requirements are multiplicative in the model dimension. In this paper, we address this issue via a novel and efficient error-feedback technique that can be applied to compress preconditioners by up to two orders of magnitude in practice, without loss of convergence. Specifically, our approach compresses the gradient information via sparsification or low-rank compression before it is fed into the preconditioner, feeding the compression error back into future iterations. Extensive experiments on deep neural networks show that this approach can compress full-matrix preconditioners to up to 99% sparsity without accuracy loss, effectively removing the memory overhead of full-matrix preconditioners such as GGT and M-FAC.
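For illustration, the error-feedback step described in the abstract can be sketched as follows. This is a minimal sketch, not the authors' implementation: it assumes a top-k sparsifier and a generic preconditioner with an `update()` hook, and all names (`ErrorFeedbackCompressor`, `flatten_gradients`, `preconditioner`) are hypothetical.

```python
import torch

class ErrorFeedbackCompressor:
    """Sketch of error-feedback gradient compression (illustrative only).

    The accumulated compression error is added back to the next gradient
    before that gradient is compressed and handed to the preconditioner.
    """

    def __init__(self, numel, k, device="cpu"):
        self.k = k                                       # entries kept by top-k sparsification
        self.error = torch.zeros(numel, device=device)   # error-feedback buffer

    def compress(self, grad):
        # 1. Correct the gradient with the error carried over from previous steps.
        corrected = grad + self.error
        # 2. Top-k sparsification: keep the k largest-magnitude entries.
        idx = torch.topk(corrected.abs(), self.k).indices
        compressed = torch.zeros_like(corrected)
        compressed[idx] = corrected[idx]
        # 3. Store what was dropped; it is fed back into future iterations.
        self.error = corrected - compressed
        return compressed

# Hypothetical usage: the compressed gradient, not the dense one, is inserted into
# the sliding-window preconditioner (e.g., a GGT/M-FAC-style buffer of past gradients).
# grad = flatten_gradients(model)            # assumed helper returning a dense gradient
# g_sparse = compressor.compress(grad)
# preconditioner.update(g_sparse)            # assumed preconditioner API
# step = preconditioner.precondition(grad)
```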
Publishing Year
2024
Date Published
2024-07-30
Proceedings Title
41st International Conference on Machine Learning
Publisher
ML Research Press
Acknowledgement
The authors thank Adrian Vladu, Razvan Pascanu, Alexandra Peste, and Mher Safaryan for their valuable feedback, the IT department of the Institute of Science and Technology Austria for hardware support, and Weights and Biases for the infrastructure to track all our experiments.
Volume
235
Page
35910–35933
Conference
ICML: International Conference on Machine Learning
Conference Location
Vienna, Austria
Conference Date
2024-07-21 – 2024-07-27
Cite this
Modoranu I-V, Kalinov A, Kurtic E, Frantar E, Alistarh D-A. Error feedback can accurately compress preconditioners. In: 41st International Conference on Machine Learning. Vol 235. ML Research Press; 2024:35910-35933.
Modoranu, I.-V., Kalinov, A., Kurtic, E., Frantar, E., & Alistarh, D.-A. (2024). Error feedback can accurately compress preconditioners. In 41st International Conference on Machine Learning (Vol. 235, pp. 35910–35933). Vienna, Austria: ML Research Press.
Modoranu, Ionut-Vlad, Aleksei Kalinov, Eldar Kurtic, Elias Frantar, and Dan-Adrian Alistarh. “Error Feedback Can Accurately Compress Preconditioners.” In 41st International Conference on Machine Learning, 235:35910–33. ML Research Press, 2024.
I.-V. Modoranu, A. Kalinov, E. Kurtic, E. Frantar, and D.-A. Alistarh, “Error feedback can accurately compress preconditioners,” in 41st International Conference on Machine Learning, Vienna, Austria, 2024, vol. 235, pp. 35910–35933.
Modoranu I-V, Kalinov A, Kurtic E, Frantar E, Alistarh D-A. 2024. Error feedback can accurately compress preconditioners. 41st International Conference on Machine Learning. ICML: International Conference on Machine Learning, PMLR, vol. 235, 35910–35933.
Modoranu, Ionut-Vlad, et al. “Error Feedback Can Accurately Compress Preconditioners.” 41st International Conference on Machine Learning, vol. 235, ML Research Press, 2024, pp. 35910–33.
All files available under the following license(s):
Copyright Statement:
This Item is protected by copyright and/or related rights. [...]
Sources
arXiv 2306.06098