---
res:
  bibo_abstract:
  - Deep learning at scale is dominated by communication time. Distributing samples
    across nodes usually yields the best performance, but poses scaling challenges
    due to global information dissemination and load imbalance across uneven sample
    lengths. State-of-the-art decentralized optimizers mitigate the problem, but require
    more iterations to achieve the same accuracy as their globally-communicating counterparts.
    We present Wait-Avoiding Group Model Averaging (WAGMA) SGD, a wait-avoiding stochastic
    optimizer that reduces global communication via subgroup weight exchange. The
    key insight is a combination of algorithmic changes to the averaging scheme and
    the use of a group allreduce operation. We prove the convergence of WAGMA-SGD,
    and empirically show that it retains convergence rates similar to Allreduce-SGD.
    For evaluation, we train ResNet-50 on ImageNet; Transformer for machine translation;
    and deep reinforcement learning for navigation at scale. Compared with state-of-the-art
    decentralized SGD variants, WAGMA-SGD significantly improves training throughput
    (e.g., 2.1× on 1,024 GPUs for reinforcement learning), and achieves the fastest
    time-to-solution (e.g., the highest score using the shortest training time for
    Transformer).@eng
  bibo_authorlist:
  - foaf_Person:
      foaf_givenName: Shigang
      foaf_name: Li, Shigang
      foaf_surname: Li
  - foaf_Person:
      foaf_givenName: Tal
      foaf_name: Ben-Nun, Tal
      foaf_surname: Ben-Nun
  - foaf_Person:
      foaf_givenName: Giorgi
      foaf_name: Nadiradze, Giorgi
      foaf_surname: Nadiradze
      foaf_workInfoHomepage: http://www.librecat.org/personId=3279A00C-F248-11E8-B48F-1D18A9856A87
  - foaf_Person:
      foaf_givenName: Salvatore
      foaf_name: Di Girolamo, Salvatore
      foaf_surname: Di Girolamo
  - foaf_Person:
      foaf_givenName: Nikoli
      foaf_name: Dryden, Nikoli
      foaf_surname: Dryden
  - foaf_Person:
      foaf_givenName: Dan-Adrian
      foaf_name: Alistarh, Dan-Adrian
      foaf_surname: Alistarh
      foaf_workInfoHomepage: http://www.librecat.org/personId=4A899BFC-F248-11E8-B48F-1D18A9856A87
    orcid: 0000-0003-3650-940X
  - foaf_Person:
      foaf_givenName: Torsten
      foaf_name: Hoefler, Torsten
      foaf_surname: Hoefler
  bibo_doi: 10.1109/TPDS.2020.3040606
  bibo_issue: '7'
  bibo_volume: 32
  dct_date: 2021^xs_gYear
  dct_identifier:
  - UT:000621405200019
  dct_isPartOf:
  - http://id.crossref.org/issn/10459219
  dct_language: eng
  dct_publisher: IEEE@
  dct_title: Breaking (global) barriers in parallel stochastic optimization with wait-avoiding
    group averaging@
...