LDAdam: Adaptive optimization from low-dimensional gradient statistics

Robert T, Safaryan M, Modoranu I-V, Alistarh D-A. 2025. LDAdam: Adaptive optimization from low-dimensional gradient statistics. 13th International Conference on Learning Representations. ICLR: International Conference on Learning Representations, 101877–101913.

Download
OA 2025_ICLR_Robert.pdf 1.35 MB [Published Version]
Conference Paper | Published | English

Scopus indexed

Corresponding author has ISTA affiliation

Abstract
We introduce LDAdam, a memory-efficient optimizer for training large models that performs adaptive optimization steps within lower-dimensional subspaces, while consistently exploring the full parameter space during training. This strategy keeps the optimizer's memory footprint to a fraction of the model size. LDAdam relies on a new projection-aware update rule for the optimizer states that allows transitioning between subspaces, i.e., estimating the statistics of the projected gradients. To mitigate errors due to low-rank projection, LDAdam integrates a new generalized error feedback mechanism that explicitly accounts for both gradient and optimizer state compression. We prove the convergence of LDAdam under standard assumptions, and provide empirical evidence that LDAdam enables efficient fine-tuning and pre-training of language models.
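
The abstract sketches the algorithmic idea: keep Adam's moment estimates in a rank-r subspace, update that subspace during training, and feed compression errors back into later gradients. Below is a minimal NumPy sketch of that idea, not the authors' implementation: the SVD-based projector, the single-sided moment transition, and all names (ldadam_like_step, update_proj_every, the state keys) are assumptions made for illustration; the paper's projection-aware rule also handles the second moment, and its generalized error feedback accounts for optimizer state compression as well.

```python
import numpy as np

def top_r_projector(G, r):
    """Left singular vectors spanning a rank-r subspace of the gradient matrix G."""
    U, _, _ = np.linalg.svd(G, full_matrices=False)
    return U[:, :r]                                   # shape (m, r)

def ldadam_like_step(W, grad_fn, state, r=4, lr=1e-3, beta1=0.9,
                     beta2=0.999, eps=1e-8, update_proj_every=10):
    """One illustrative low-rank adaptive step with error feedback.

    `state` keeps: step counter t, projector P, low-rank moments m and v,
    and the error-feedback buffer E (all names are illustrative).
    """
    t = state["t"] = state.get("t", 0) + 1
    # Error feedback: re-inject what previous compressions discarded.
    G = grad_fn(W) + state.get("E", np.zeros_like(W))

    # Periodically refresh the rank-r subspace from the current (corrected) gradient.
    if (t - 1) % update_proj_every == 0:
        P_new = top_r_projector(G, r)
        if "P" in state:
            # Projection-aware transition: carry the first moment into the new
            # subspace. (Simplified; the paper also adapts the second moment.)
            state["m"] = P_new.T @ state["P"] @ state["m"]
        state["P"] = P_new
    P = state["P"]

    # Compress the gradient into the r-dimensional subspace ...
    G_low = P.T @ G                                   # shape (r, n)
    # ... and store the compression residual for the next step.
    state["E"] = G - P @ G_low

    # Adam moments are maintained only in the low-rank space.
    m = state["m"] = beta1 * state.get("m", np.zeros_like(G_low)) + (1 - beta1) * G_low
    v = state["v"] = beta2 * state.get("v", np.zeros_like(G_low)) + (1 - beta2) * G_low ** 2
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)

    # Map the adaptive update back to the full parameter space.
    return W - lr * P @ (m_hat / (np.sqrt(v_hat) + eps))

# Toy usage: drive a 16x8 parameter matrix toward zero under f(W) = ||A @ W||_F^2.
rng = np.random.default_rng(0)
A = rng.normal(size=(16, 16))
W = rng.normal(size=(16, 8))
state = {}
for _ in range(200):
    W = ldadam_like_step(W, lambda X: 2 * A.T @ (A @ X), state, r=4, lr=1e-2)
```

Because the moments live in an r-by-n buffer rather than the full m-by-n parameter shape, the optimizer state shrinks roughly by a factor of m / r, which is the memory saving the abstract refers to.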
Publishing Year
2025
Date Published
2025-04-01
Proceedings Title
13th International Conference on Learning Representations
Publisher
ICLR
Page
101877-101913
Conference
ICLR: International Conference on Learning Representations
Conference Location
Singapore, Singapore
Conference Date
2025-04-24 – 2025-04-28
Cite this

Robert T, Safaryan M, Modoranu I-V, Alistarh D-A. LDAdam: Adaptive optimization from low-dimensional gradient statistics. In: 13th International Conference on Learning Representations. ICLR; 2025:101877-101913.
Robert, T., Safaryan, M., Modoranu, I.-V., & Alistarh, D.-A. (2025). LDAdam: Adaptive optimization from low-dimensional gradient statistics. In 13th International Conference on Learning Representations (pp. 101877–101913). Singapore, Singapore: ICLR.
Robert, Thomas, Mher Safaryan, Ionut-Vlad Modoranu, and Dan-Adrian Alistarh. “LDAdam: Adaptive Optimization from Low-Dimensional Gradient Statistics.” In 13th International Conference on Learning Representations, 101877–913. ICLR, 2025.
T. Robert, M. Safaryan, I.-V. Modoranu, and D.-A. Alistarh, “LDAdam: Adaptive optimization from low-dimensional gradient statistics,” in 13th International Conference on Learning Representations, Singapore, Singapore, 2025, pp. 101877–101913.
Robert T, Safaryan M, Modoranu I-V, Alistarh D-A. 2025. LDAdam: Adaptive optimization from low-dimensional gradient statistics. 13th International Conference on Learning Representations. ICLR: International Conference on Learning Representations, 101877–101913.
Robert, Thomas, et al. “LDAdam: Adaptive Optimization from Low-Dimensional Gradient Statistics.” 13th International Conference on Learning Representations, ICLR, 2025, pp. 101877–913.
All files available under the following license(s):
Creative Commons Attribution 4.0 International Public License (CC-BY 4.0):
Main File(s)
File Name
2025_ICLR_Robert.pdf
Access Level
OA Open Access
Date Uploaded
2025-08-04
MD5 Checksum
9327d82569358d7bf1c3ec1a9952e721

Sources

arXiv 2410.16103