LDAdam: Adaptive optimization from low-dimensional gradient statistics

Robert T, Safaryan M, Modoranu I-V, Alistarh D-A. 2025. LDAdam: Adaptive optimization from low-dimensional gradient statistics. 13th International Conference on Learning Representations. ICLR: International Conference on Learning Representations, 101877–101913.

Download
OA 2025_ICLR_Robert.pdf 1.35 MB [Published Version]
Conference Paper | Published | English

Scopus indexed

Corresponding author has ISTA affiliation

Abstract
We introduce LDAdam, a memory-efficient optimizer for training large models that performs adaptive optimization steps within lower-dimensional subspaces, while consistently exploring the full parameter space during training. This strategy keeps the optimizer's memory footprint to a fraction of the model size. LDAdam relies on a new projection-aware update rule for the optimizer states that allows transitioning between subspaces, i.e., estimating the statistics of the projected gradients. To mitigate errors due to low-rank projection, LDAdam integrates a new generalized error feedback mechanism that explicitly accounts for both gradient and optimizer state compression. We prove the convergence of LDAdam under standard assumptions, and provide empirical evidence that LDAdam enables efficient fine-tuning and pre-training of language models.
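
The abstract sketches the algorithmic idea: keep Adam's moment estimates in a rank-r subspace, update that subspace during training, and feed compression errors back into later gradients. Below is a minimal NumPy sketch of that idea, not the authors' implementation: the SVD-based projector, the single-sided moment transition, and all names (ldadam_like_step, update_proj_every, the state keys) are assumptions made for illustration; the paper's projection-aware rule also handles the second moment, and its generalized error feedback accounts for optimizer state compression as well.

```python
import numpy as np

def top_r_projector(G, r):
    """Left singular vectors spanning a rank-r subspace of the gradient matrix G."""
    U, _, _ = np.linalg.svd(G, full_matrices=False)
    return U[:, :r]                                   # shape (m, r)

def ldadam_like_step(W, grad_fn, state, r=4, lr=1e-3, beta1=0.9,
                     beta2=0.999, eps=1e-8, update_proj_every=10):
    """One illustrative low-rank adaptive step with error feedback.

    `state` keeps: step counter t, projector P, low-rank moments m and v,
    and the error-feedback buffer E (all names are illustrative).
    """
    t = state["t"] = state.get("t", 0) + 1
    # Error feedback: re-inject what previous compressions discarded.
    G = grad_fn(W) + state.get("E", np.zeros_like(W))

    # Periodically refresh the rank-r subspace from the current (corrected) gradient.
    if (t - 1) % update_proj_every == 0:
        P_new = top_r_projector(G, r)
        if "P" in state:
            # Projection-aware transition: carry the first moment into the new
            # subspace. (Simplified; the paper also adapts the second moment.)
            state["m"] = P_new.T @ state["P"] @ state["m"]
        state["P"] = P_new
    P = state["P"]

    # Compress the gradient into the r-dimensional subspace ...
    G_low = P.T @ G                                   # shape (r, n)
    # ... and store the compression residual for the next step.
    state["E"] = G - P @ G_low

    # Adam moments are maintained only in the low-rank space.
    m = state["m"] = beta1 * state.get("m", np.zeros_like(G_low)) + (1 - beta1) * G_low
    v = state["v"] = beta2 * state.get("v", np.zeros_like(G_low)) + (1 - beta2) * G_low ** 2
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)

    # Map the adaptive update back to the full parameter space.
    return W - lr * P @ (m_hat / (np.sqrt(v_hat) + eps))

# Toy usage: drive a 16x8 parameter matrix toward zero under f(W) = ||A @ W||_F^2.
rng = np.random.default_rng(0)
A = rng.normal(size=(16, 16))
W = rng.normal(size=(16, 8))
state = {}
for _ in range(200):
    W = ldadam_like_step(W, lambda X: 2 * A.T @ (A @ X), state, r=4, lr=1e-2)
```

Because the moments live in an r-by-n buffer rather than the full m-by-n parameter shape, the optimizer state shrinks roughly by a factor of m / r, which is the memory saving the abstract refers to.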
Publishing Year
2025
Date Published
2025-04-01
Proceedings Title
13th International Conference on Learning Representations
Publisher
ICLR
Page
101877-101913
Conference
ICLR: International Conference on Learning Representations
Conference Location
Singapore, Singapore
Conference Date
2025-04-24 – 2025-04-28
Cite this

Robert T, Safaryan M, Modoranu I-V, Alistarh D-A. LDAdam: Adaptive optimization from low-dimensional gradient statistics. In: 13th International Conference on Learning Representations. ICLR; 2025:101877-101913.
Robert, T., Safaryan, M., Modoranu, I.-V., & Alistarh, D.-A. (2025). LDAdam: Adaptive optimization from low-dimensional gradient statistics. In 13th International Conference on Learning Representations (pp. 101877–101913). Singapore, Singapore: ICLR.
Robert, Thomas, Mher Safaryan, Ionut-Vlad Modoranu, and Dan-Adrian Alistarh. “LDAdam: Adaptive Optimization from Low-Dimensional Gradient Statistics.” In 13th International Conference on Learning Representations, 101877–913. ICLR, 2025.
T. Robert, M. Safaryan, I.-V. Modoranu, and D.-A. Alistarh, “LDAdam: Adaptive optimization from low-dimensional gradient statistics,” in 13th International Conference on Learning Representations, Singapore, Singapore, 2025, pp. 101877–101913.
Robert T, Safaryan M, Modoranu I-V, Alistarh D-A. 2025. LDAdam: Adaptive optimization from low-dimensional gradient statistics. 13th International Conference on Learning Representations. ICLR: International Conference on Learning Representations, 101877–101913.
Robert, Thomas, et al. “LDAdam: Adaptive Optimization from Low-Dimensional Gradient Statistics.” 13th International Conference on Learning Representations, ICLR, 2025, pp. 101877–913.
All files available under the following license(s):
Creative Commons Attribution 4.0 International Public License (CC-BY 4.0):
Main File(s)
File Name
2025_ICLR_Robert.pdf
Access Level
OA Open Access
Date Uploaded
2025-08-04
MD5 Checksum
9327d82569358d7bf1c3ec1a9952e721

Sources

arXiv 2410.16103