<?xml version="1.0" encoding="UTF-8"?>

<modsCollection xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://www.loc.gov/mods/v3" xsi:schemaLocation="http://www.loc.gov/mods/v3 http://www.loc.gov/standards/mods/v3/mods-3-3.xsd">
<mods version="3.3">

<genre>conference paper</genre>

<titleInfo><title>L-GreCo: Layerwise-adaptive gradient compression for efficient data-parallel deep learning</title></titleInfo>


<note type="publicationStatus">published</note>


<note type="qualityControlled">yes</note>

<name type="personal">
  <namePart type="given">Ilia</namePart>
  <namePart type="family">Markov</namePart>
  <role><roleTerm type="text">author</roleTerm> </role><identifier type="local">D0CF4148-C985-11E9-8066-0BDEE5697425</identifier></name>
<name type="personal">
  <namePart type="given">Kaveh</namePart>
  <namePart type="family">Alimohammadi</namePart>
  <role><roleTerm type="text">author</roleTerm> </role></name>
<name type="personal">
  <namePart type="given">Elias</namePart>
  <namePart type="family">Frantar</namePart>
  <role><roleTerm type="text">author</roleTerm> </role><identifier type="local">09a8f98d-ec99-11ea-ae11-c063a7b7fe5f</identifier></name>
<name type="personal">
  <namePart type="given">Dan-Adrian</namePart>
  <namePart type="family">Alistarh</namePart>
  <role><roleTerm type="text">author</roleTerm> </role><identifier type="local">4A899BFC-F248-11E8-B48F-1D18A9856A87</identifier><description xsi:type="identifierDefinition" type="orcid">0000-0003-3650-940X</description></name>



<name type="personal"><namePart type="given">P.</namePart><namePart type="family">Gibbons</namePart>
  <role> <roleTerm type="text">editor</roleTerm> </role></name>
<name type="personal"><namePart type="given">G.</namePart><namePart type="family">Pekhimenko</namePart>
  <role> <roleTerm type="text">editor</roleTerm> </role></name>
<name type="personal"><namePart type="given">C.</namePart><namePart type="family">De Sa</namePart>
  <role> <roleTerm type="text">editor</roleTerm> </role></name>




<name type="corporate">
  <namePart></namePart>
  <identifier type="local">DaAl</identifier>
  <role>
    <roleTerm type="text">department</roleTerm>
  </role>
</name>



<name type="conference">
  <namePart>MLSys: Machine Learning and Systems</namePart>
</name>






<abstract lang="eng">Data-parallel distributed training of deep neural networks (DNNs) has gained very widespread adoption, but can still experience communication bottlenecks. To address this issue, entire families of compression mechanisms have been developed, including quantization, sparsification, and low-rank approximation, some of which are seeing significant practical adoption. Despite this progress, almost all known compression schemes apply compression uniformly across DNN layers, although layers are heterogeneous in terms of parameter count and their impact on model accuracy. In this work, we provide a general framework for adapting the degree of compression across the model&apos;s layers dynamically during training, improving the overall compression while leading to substantial speedups, without sacrificing accuracy. Our framework, called L-GreCo, is based on an adaptive algorithm that automatically picks the optimal compression parameters for model layers, guaranteeing the best compression ratio while satisfying an error constraint. Extensive experiments over image classification and language modeling tasks show that L-GreCo is effective across all existing families of compression methods, and achieves up to 2.5× training speedup and up to 5× compression improvement over efficient implementations of existing approaches, while recovering full accuracy. Moreover, L-GreCo is complementary to existing adaptive algorithms, improving their compression ratio by 50% and practical throughput by 66%. An anonymized implementation is available at https://github.com/LGrCo/L-GreCo.</abstract>

<originInfo><publisher>Association for Computing Machinery</publisher><dateIssued encoding="w3cdtf">2024</dateIssued><place><placeTerm type="text">Athens, Greece</placeTerm></place>
</originInfo>
<language><languageTerm authority="iso639-2b" type="code">eng</languageTerm>
</language>



<relatedItem type="host"><titleInfo><title>Proceedings of Machine Learning and Systems</title></titleInfo>
  <identifier type="arXiv">2210.17357</identifier>
<part><detail type="volume"><number>6</number></detail>
</part>
</relatedItem>
<relatedItem type="Supplementary material">
  <location>     <url>https://research-explorer.ista.ac.at/record/17490</url>  </location>
</relatedItem>

<extension>
<bibliographicCitation>
<ama>Markov I, Alimohammadi K, Frantar E, Alistarh D-A. L-GreCo: Layerwise-adaptive gradient compression for efficient data-parallel deep learning. In: Gibbons P, Pekhimenko G, De Sa C, eds. &lt;i&gt;Proceedings of Machine Learning and Systems&lt;/i&gt;. Vol 6. Association for Computing Machinery; 2024.</ama>
<short>I. Markov, K. Alimohammadi, E. Frantar, D.-A. Alistarh, in: P. Gibbons, G. Pekhimenko, C. De Sa (Eds.), Proceedings of Machine Learning and Systems, Association for Computing Machinery, 2024.</short>
<ieee>I. Markov, K. Alimohammadi, E. Frantar, and D.-A. Alistarh, “L-GreCo: Layerwise-adaptive gradient compression for efficient data-parallel deep learning,” in &lt;i&gt;Proceedings of Machine Learning and Systems&lt;/i&gt;, Athens, Greece, 2024, vol. 6.</ieee>
<chicago>Markov, Ilia, Kaveh Alimohammadi, Elias Frantar, and Dan-Adrian Alistarh. “L-GreCo: Layerwise-Adaptive Gradient Compression for Efficient Data-Parallel Deep Learning.” In &lt;i&gt;Proceedings of Machine Learning and Systems&lt;/i&gt;, edited by P. Gibbons, G. Pekhimenko, and C. De Sa, Vol. 6. Association for Computing Machinery, 2024.</chicago>
<apa>Markov, I., Alimohammadi, K., Frantar, E., &amp; Alistarh, D.-A. (2024). L-GreCo: Layerwise-adaptive gradient compression for efficient data-parallel deep learning. In P. Gibbons, G. Pekhimenko, &amp; C. De Sa (Eds.), &lt;i&gt;Proceedings of Machine Learning and Systems&lt;/i&gt; (Vol. 6). Athens, Greece: Association for Computing Machinery.</apa>
<ista>Markov I, Alimohammadi K, Frantar E, Alistarh D-A. 2024. L-GreCo: Layerwise-adaptive gradient compression for efficient data-parallel deep learning. Proceedings of Machine Learning and Systems. MLSys: Machine Learning and Systems vol. 6.</ista>
<mla>Markov, Ilia, et al. “L-GreCo: Layerwise-Adaptive Gradient Compression for Efficient Data-Parallel Deep Learning.” &lt;i&gt;Proceedings of Machine Learning and Systems&lt;/i&gt;, edited by P. Gibbons et al., vol. 6, Association for Computing Machinery, 2024.</mla>
</bibliographicCitation>
</extension>
<recordInfo><recordIdentifier>17456</recordIdentifier><recordCreationDate encoding="w3cdtf">2024-08-22T08:29:25Z</recordCreationDate><recordChangeDate encoding="w3cdtf">2026-04-07T13:00:54Z</recordChangeDate>
</recordInfo>
</mods>
</modsCollection>
