---
res:
  bibo_abstract:
  - "The ability to scale out training workloads has been one of the key performance
    enablers of deep learning. The main scaling approach is data-parallel GPU-based
    training, which has been boosted by hardware and software support for highly efficient
    point-to-point communication, and in particular via hardware bandwidth overprovisioning.
    Overprovisioning comes at a cost: there is an order-of-magnitude price difference
    between \"cloud-grade\" servers with such support and their popular \"consumer-grade\"
    counterparts, although single server-grade and consumer-grade GPUs can have similar
    computational envelopes.\n\nIn this paper, we show that the costly hardware
    overprovisioning approach can be supplanted via algorithmic and system design,
    and propose a framework called CGX, which provides efficient software support
    for compressed communication in ML applications, for both multi-GPU single-node
    training, as well as larger-scale multi-node training. CGX is based on two technical
    advances: At the system level, it relies on a re-developed communication stack
    for ML frameworks, which provides flexible, highly-efficient support for compressed
    communication. At the application level, it provides seamless, parameter-free
    integration with popular frameworks, so that end-users do not have to modify training
    recipes or make significant changes to training code. This is complemented by a layer-wise adaptive
    compression technique which dynamically balances compression gains with accuracy
    preservation. CGX integrates with popular ML frameworks, providing up to 3X speedups
    for multi-GPU nodes based on commodity hardware, and order-of-magnitude improvements
    in the multi-node setting, with negligible impact on accuracy.@eng"
  bibo_authorlist:
  - foaf_Person:
      foaf_givenName: Ilia
      foaf_name: Markov, Ilia
      foaf_surname: Markov
      foaf_workInfoHomepage: http://www.librecat.org/personId=D0CF4148-C985-11E9-8066-0BDEE5697425
  - foaf_Person:
      foaf_givenName: Hamidreza
      foaf_name: Ramezanikebrya, Hamidreza
      foaf_surname: Ramezanikebrya
  - foaf_Person:
      foaf_givenName: Dan-Adrian
      foaf_name: Alistarh, Dan-Adrian
      foaf_surname: Alistarh
      foaf_workInfoHomepage: http://www.librecat.org/personId=4A899BFC-F248-11E8-B48F-1D18A9856A87
    orcid: 0000-0003-3650-940X
  bibo_doi: 10.1145/3528535.3565248
  dct_date: 2022^xs_gYear
  dct_identifier:
  - UT:001061556200024
  dct_isPartOf:
  - http://id.crossref.org/issn/9781450393409
  dct_language: eng
  dct_publisher: Association for Computing Machinery@
  dct_title: 'CGX: Adaptive system support for communication-efficient deep learning@'
...
