LDAdam: Adaptive optimization from low-dimensional gradient statistics

Type: conference paper
Status: published
Authors: Thomas Robert; Mher Safaryan; Ionut-Vlad Modoranu; Dan-Adrian Alistarh (ORCID 0000-0003-3650-940X)
Department: DaAl
Publisher: ICLR: International Conference on Learning Representations

Abstract: We introduce LDAdam, a memory-efficient optimizer for training large models, that performs adaptive optimization steps within lower dimensional subspaces, while consistently exploring the full parameter space during training. This strategy keeps the optimizer's memory footprint to a fraction of the model size. LDAdam relies on a new projection-aware update rule for the optimizer states that allows for transitioning between subspaces, i.e., estimation of the statistics of the projected gradients. To mitigate the errors due to low-rank projection, LDAdam integrates a new generalized error feedback mechanism, which explicitly accounts for both gradient and optimizer state compression. We prove the convergence of LDAdam under standard assumptions, and provide empirical evidence that LDAdam allows for efficient fine-tuning and pre-training of language models.

Full text (PDF): https://research-explorer.ista.ac.at/download/20034/20113/2025_ICLR_Robert.pdf
Conference: ICLR 2025, Singapore, Singapore
Language: English
Publication: 13th International Conference on Learning Representations
ISBN: 9798331320850
arXiv: 2410.16103
Pages: 101877-101913
Code: https://github.com/IST-DASLab/LDAdam

Citations:

Robert T, Safaryan M, Modoranu I-V, Alistarh D-A. 2025. LDAdam: Adaptive optimization from low-dimensional gradient statistics. 13th International Conference on Learning Representations. ICLR: International Conference on Learning Representations, 101877–101913.

T. Robert, M. Safaryan, I.-V. Modoranu, and D.-A. Alistarh, “LDAdam: Adaptive optimization from low-dimensional gradient statistics,” in 13th International Conference on Learning Representations, Singapore, Singapore, 2025, pp. 101877–101913.

Robert, Thomas, et al. “LDAdam: Adaptive Optimization from Low-Dimensional Gradient Statistics.” 13th International Conference on Learning Representations, ICLR, 2025, pp. 101877–913.

Robert, Thomas, Mher Safaryan, Ionut-Vlad Modoranu, and Dan-Adrian Alistarh. “LDAdam: Adaptive Optimization from Low-Dimensional Gradient Statistics.” In 13th International Conference on Learning Representations, 101877–913. ICLR, 2025.

Robert, T., Safaryan, M., Modoranu, I.-V., & Alistarh, D.-A. (2025). LDAdam: Adaptive optimization from low-dimensional gradient statistics. In 13th International Conference on Learning Representations (pp. 101877–101913). Singapore, Singapore: ICLR.

Robert T, Safaryan M, Modoranu I-V, Alistarh D-A. LDAdam: Adaptive optimization from low-dimensional gradient statistics. In: 13th International Conference on Learning Representations. ICLR; 2025:101877-101913.

T. Robert, M. Safaryan, I.-V. Modoranu, D.-A. Alistarh, in:, 13th International Conference on Learning Representations, ICLR, 2025, pp. 101877–101913.

Record ID: 20034
Created: 2025-07-20T22:02:02Z
Last updated: 2025-08-04T08:41:10Z