<?xml version="1.0" encoding="UTF-8"?>

<modsCollection xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://www.loc.gov/mods/v3" xsi:schemaLocation="http://www.loc.gov/mods/v3 http://www.loc.gov/standards/mods/v3/mods-3-3.xsd">
<mods version="3.3">

<genre>conference paper</genre>

<titleInfo><title>LDAdam: Adaptive optimization from low-dimensional gradient statistics</title></titleInfo>


<note type="publicationStatus">published</note>


<note type="qualityControlled">yes</note>

<name type="personal">
  <namePart type="given">Thomas</namePart>
  <namePart type="family">Robert</namePart>
  <role><roleTerm type="text">author</roleTerm> </role></name>
<name type="personal">
  <namePart type="given">Mher</namePart>
  <namePart type="family">Safaryan</namePart>
  <role><roleTerm type="text">author</roleTerm> </role><identifier type="local">dd546b39-0804-11ed-9c55-ef075c39778d</identifier></name>
<name type="personal">
  <namePart type="given">Ionut-Vlad</namePart>
  <namePart type="family">Modoranu</namePart>
  <role><roleTerm type="text">author</roleTerm> </role><identifier type="local">449f7a18-f128-11eb-9611-9b430c0c6333</identifier></name>
<name type="personal">
  <namePart type="given">Dan-Adrian</namePart>
  <namePart type="family">Alistarh</namePart>
  <role><roleTerm type="text">author</roleTerm> </role><identifier type="local">4A899BFC-F248-11E8-B48F-1D18A9856A87</identifier><description xsi:type="identifierDefinition" type="orcid">0000-0003-3650-940X</description></name>







<name type="corporate">
  <namePart></namePart>
  <identifier type="local">DaAl</identifier>
  <role>
    <roleTerm type="text">department</roleTerm>
  </role>
</name>



<name type="conference">
  <namePart>ICLR: International Conference on Learning Representations</namePart>
</name>






<abstract lang="eng">We introduce LDAdam, a memory-efficient optimizer for training large models that performs adaptive optimization steps within lower-dimensional subspaces while consistently exploring the full parameter space during training. This strategy keeps the optimizer&apos;s memory footprint to a fraction of the model size. LDAdam relies on a new projection-aware update rule for the optimizer states that allows for transitioning between subspaces, i.e., estimation of the statistics of the projected gradients. To mitigate the errors due to low-rank projection, LDAdam integrates a new generalized error feedback mechanism, which explicitly accounts for both gradient and optimizer state compression. We prove the convergence of LDAdam under standard assumptions and provide empirical evidence that LDAdam allows for efficient fine-tuning and pre-training of language models.</abstract>

<relatedItem type="constituent">
  <location>
    <url displayLabel="2025_ICLR_Robert.pdf">https://research-explorer.ista.ac.at/download/20034/20113/2025_ICLR_Robert.pdf</url>
  </location>
  <physicalDescription><internetMediaType>application/pdf</internetMediaType></physicalDescription><accessCondition type="restrictionOnAccess">no</accessCondition>
</relatedItem>
<originInfo><publisher>ICLR</publisher><dateIssued encoding="w3cdtf">2025</dateIssued><place><placeTerm type="text">Singapore, Singapore</placeTerm></place>
</originInfo>
<language><languageTerm authority="iso639-2b" type="code">eng</languageTerm>
</language>



<relatedItem type="host"><titleInfo><title>13th International Conference on Learning Representations</title></titleInfo>
  <identifier type="isbn">9798331320850</identifier>
  <identifier type="arXiv">2410.16103</identifier>
<part><extent unit="pages">101877-101913</extent>
</part>
</relatedItem>


<relatedItem type="Supplementary material">
  <location>
  
     <url>https://github.com/IST-DASLab/LDAdam</url>
  
  </location>
</relatedItem>

<extension>
<bibliographicCitation>
<mla>Robert, Thomas, et al. “LDAdam: Adaptive Optimization from Low-Dimensional Gradient Statistics.” &lt;i&gt;13th International Conference on Learning Representations&lt;/i&gt;, ICLR, 2025, pp. 101877–913.</mla>
<apa>Robert, T., Safaryan, M., Modoranu, I.-V., &amp; Alistarh, D.-A. (2025). LDAdam: Adaptive optimization from low-dimensional gradient statistics. In &lt;i&gt;13th International Conference on Learning Representations&lt;/i&gt; (pp. 101877–101913). Singapore, Singapore: ICLR.</apa>
<short>T. Robert, M. Safaryan, I.-V. Modoranu, D.-A. Alistarh, in: 13th International Conference on Learning Representations, ICLR, 2025, pp. 101877–101913.</short>
<ama>Robert T, Safaryan M, Modoranu I-V, Alistarh D-A. LDAdam: Adaptive optimization from low-dimensional gradient statistics. In: &lt;i&gt;13th International Conference on Learning Representations&lt;/i&gt;. ICLR; 2025:101877-101913.</ama>
<ista>Robert T, Safaryan M, Modoranu I-V, Alistarh D-A. 2025. LDAdam: Adaptive optimization from low-dimensional gradient statistics. 13th International Conference on Learning Representations. ICLR: International Conference on Learning Representations, 101877–101913.</ista>
<ieee>T. Robert, M. Safaryan, I.-V. Modoranu, and D.-A. Alistarh, “LDAdam: Adaptive optimization from low-dimensional gradient statistics,” in &lt;i&gt;13th International Conference on Learning Representations&lt;/i&gt;, Singapore, Singapore, 2025, pp. 101877–101913.</ieee>
<chicago>Robert, Thomas, Mher Safaryan, Ionut-Vlad Modoranu, and Dan-Adrian Alistarh. “LDAdam: Adaptive Optimization from Low-Dimensional Gradient Statistics.” In &lt;i&gt;13th International Conference on Learning Representations&lt;/i&gt;, 101877–913. ICLR, 2025.</chicago>
</bibliographicCitation>
</extension>
<recordInfo><recordIdentifier>20034</recordIdentifier><recordCreationDate encoding="w3cdtf">2025-07-20T22:02:02Z</recordCreationDate><recordChangeDate encoding="w3cdtf">2025-08-04T08:41:10Z</recordChangeDate>
</recordInfo>
</mods>
</modsCollection>
