<?xml version="1.0" encoding="UTF-8"?>

<modsCollection xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://www.loc.gov/mods/v3" xsi:schemaLocation="http://www.loc.gov/mods/v3 http://www.loc.gov/standards/mods/v3/mods-3-3.xsd">
<mods version="3.3">

<genre>conference paper</genre>

<titleInfo><title>CGX: Adaptive system support for communication-efficient deep learning</title></titleInfo>


<note type="publicationStatus">published</note>


<note type="qualityControlled">yes</note>

<name type="personal">
  <namePart type="given">Ilia</namePart>
  <namePart type="family">Markov</namePart>
  <role><roleTerm type="text">author</roleTerm> </role><identifier type="local">D0CF4148-C985-11E9-8066-0BDEE5697425</identifier></name>
<name type="personal">
  <namePart type="given">Hamidreza</namePart>
  <namePart type="family">Ramezanikebrya</namePart>
  <role><roleTerm type="text">author</roleTerm> </role></name>
<name type="personal">
  <namePart type="given">Dan-Adrian</namePart>
  <namePart type="family">Alistarh</namePart>
  <role><roleTerm type="text">author</roleTerm> </role><identifier type="local">4A899BFC-F248-11E8-B48F-1D18A9856A87</identifier><description xsi:type="identifierDefinition" type="orcid">0000-0003-3650-940X</description></name>







<name type="corporate">
  <namePart></namePart>
  <identifier type="local">DaAl</identifier>
  <role>
    <roleTerm type="text">department</roleTerm>
  </role>
</name>



<name type="conference">
  <namePart>Middleware: International Middleware Conference</namePart>
</name>






<abstract lang="eng">The ability to scale out training workloads has been one of the key performance enablers of deep learning. The main scaling approach is data-parallel GPU-based training, which has been boosted by hardware and software support for highly efficient point-to-point communication, and in particular via hardware bandwidth overprovisioning. Overprovisioning comes at a cost: there is an order of magnitude price difference between &quot;cloud-grade&quot; servers with such support, relative to their popular &quot;consumer-grade&quot; counterparts, although single server-grade and consumer-grade GPUs can have similar computational envelopes.

In this paper, we show that the costly hardware overprovisioning approach can be supplanted via algorithmic and system design, and propose a framework called CGX, which provides efficient software support for compressed communication in ML applications, for both multi-GPU single-node training and larger-scale multi-node training. CGX is based on two technical advances: At the system level, it relies on a re-developed communication stack for ML frameworks, which provides flexible, highly efficient support for compressed communication. At the application level, it provides seamless, parameter-free integration with popular frameworks, so that end users do not have to modify training recipes or make significant changes to training code. This is complemented by a layer-wise adaptive compression technique that dynamically balances compression gains with accuracy preservation. CGX integrates with popular ML frameworks, providing up to 3X speedups for multi-GPU nodes based on commodity hardware, and order-of-magnitude improvements in the multi-node setting, with negligible impact on accuracy.</abstract>

<relatedItem type="constituent">
  <location>
    <url displayLabel="2022_ACMMiddleware_Markov.pdf">https://research-explorer.ista.ac.at/download/12780/12795/2022_ACMMiddleware_Markov.pdf</url>
  </location>
  <physicalDescription><internetMediaType>application/pdf</internetMediaType></physicalDescription><accessCondition type="restrictionOnAccess">no</accessCondition>
</relatedItem>
<originInfo><publisher>Association for Computing Machinery</publisher><dateIssued encoding="w3cdtf">2022</dateIssued><place><placeTerm type="text">Quebec, QC, Canada</placeTerm></place>
</originInfo>
<language><languageTerm authority="iso639-2b" type="code">eng</languageTerm>
</language>



<relatedItem type="host"><titleInfo><title>Proceedings of the 23rd ACM/IFIP International Middleware Conference</title></titleInfo>
  <identifier type="isbn">9781450393409</identifier>
  <identifier type="arXiv">2111.08617</identifier>
  <identifier type="ISI">001061556200024</identifier><identifier type="doi">10.1145/3528535.3565248</identifier>
<part><extent unit="pages">241-254</extent>
</part>
</relatedItem>
<relatedItem type="Supplementary material">
  <location>     <url>https://research-explorer.ista.ac.at/record/17490</url>  </location>
</relatedItem>

<extension>
<bibliographicCitation>
<short>I. Markov, H. Ramezanikebrya, D.-A. Alistarh, in: Proceedings of the 23rd ACM/IFIP International Middleware Conference, Association for Computing Machinery, 2022, pp. 241–254.</short>
<ista>Markov I, Ramezanikebrya H, Alistarh D-A. 2022. CGX: Adaptive system support for communication-efficient deep learning. Proceedings of the 23rd ACM/IFIP International Middleware Conference. Middleware: International Middleware Conference, 241–254.</ista>
<ama>Markov I, Ramezanikebrya H, Alistarh D-A. CGX: Adaptive system support for communication-efficient deep learning. In: &lt;i&gt;Proceedings of the 23rd ACM/IFIP International Middleware Conference&lt;/i&gt;. Association for Computing Machinery; 2022:241-254. doi:&lt;a href=&quot;https://doi.org/10.1145/3528535.3565248&quot;&gt;10.1145/3528535.3565248&lt;/a&gt;</ama>
<chicago>Markov, Ilia, Hamidreza Ramezanikebrya, and Dan-Adrian Alistarh. “CGX: Adaptive System Support for Communication-Efficient Deep Learning.” In &lt;i&gt;Proceedings of the 23rd ACM/IFIP International Middleware Conference&lt;/i&gt;, 241–54. Association for Computing Machinery, 2022. &lt;a href=&quot;https://doi.org/10.1145/3528535.3565248&quot;&gt;https://doi.org/10.1145/3528535.3565248&lt;/a&gt;.</chicago>
<ieee>I. Markov, H. Ramezanikebrya, and D.-A. Alistarh, “CGX: Adaptive system support for communication-efficient deep learning,” in &lt;i&gt;Proceedings of the 23rd ACM/IFIP International Middleware Conference&lt;/i&gt;, Quebec, QC, Canada, 2022, pp. 241–254.</ieee>
<mla>Markov, Ilia, et al. “CGX: Adaptive System Support for Communication-Efficient Deep Learning.” &lt;i&gt;Proceedings of the 23rd ACM/IFIP International Middleware Conference&lt;/i&gt;, Association for Computing Machinery, 2022, pp. 241–54, doi:&lt;a href=&quot;https://doi.org/10.1145/3528535.3565248&quot;&gt;10.1145/3528535.3565248&lt;/a&gt;.</mla>
<apa>Markov, I., Ramezanikebrya, H., &amp; Alistarh, D.-A. (2022). CGX: Adaptive system support for communication-efficient deep learning. In &lt;i&gt;Proceedings of the 23rd ACM/IFIP International Middleware Conference&lt;/i&gt; (pp. 241–254). Quebec, QC, Canada: Association for Computing Machinery. &lt;a href=&quot;https://doi.org/10.1145/3528535.3565248&quot;&gt;https://doi.org/10.1145/3528535.3565248&lt;/a&gt;</apa>
</bibliographicCitation>
</extension>
<recordInfo><recordIdentifier>12780</recordIdentifier><recordCreationDate encoding="w3cdtf">2023-03-31T06:17:00Z</recordCreationDate><recordChangeDate encoding="w3cdtf">2026-04-07T13:00:54Z</recordChangeDate>
</recordInfo>
</mods>
</modsCollection>
