---
res:
  bibo_abstract:
  - We introduce LDAdam, a memory-efficient optimizer for training large models
    that performs adaptive optimization steps within lower-dimensional subspaces,
    while consistently exploring the full parameter space during training. This
    strategy keeps the optimizer's memory footprint to a fraction of the model
    size. LDAdam relies on a new projection-aware update rule for the optimizer
    states that allows for transitioning between subspaces, i.e., the estimation
    of the statistics of the projected gradients. To mitigate errors due to
    low-rank projection, LDAdam integrates a new generalized error feedback
    mechanism, which explicitly accounts for both gradient and optimizer state
    compression. We prove the convergence of LDAdam under standard assumptions,
    and provide empirical evidence that LDAdam allows for efficient fine-tuning
    and pre-training of language models.@eng
  bibo_authorlist:
  - foaf_Person:
      foaf_givenName: Thomas
      foaf_name: Robert, Thomas
      foaf_surname: Robert
  - foaf_Person:
      foaf_givenName: Mher
      foaf_name: Safaryan, Mher
      foaf_surname: Safaryan
      foaf_workInfoHomepage: http://www.librecat.org/personId=dd546b39-0804-11ed-9c55-ef075c39778d
  - foaf_Person:
      foaf_givenName: Ionut-Vlad
      foaf_name: Modoranu, Ionut-Vlad
      foaf_surname: Modoranu
      foaf_workInfoHomepage: http://www.librecat.org/personId=449f7a18-f128-11eb-9611-9b430c0c6333
  - foaf_Person:
      foaf_givenName: Dan-Adrian
      foaf_name: Alistarh, Dan-Adrian
      foaf_surname: Alistarh
      foaf_workInfoHomepage: http://www.librecat.org/personId=4A899BFC-F248-11E8-B48F-1D18A9856A87
    orcid: 0000-0003-3650-940X
  dct_date: 2025^xs_gYear
  dct_isPartOf:
  - http://id.crossref.org/isbn/9798331320850
  dct_language: eng
  dct_publisher: ICLR
  dct_title: 'LDAdam: Adaptive optimization from low-dimensional gradient statistics'
...
