---
res:
  bibo_abstract:
  - In the last few years, various communication compression techniques have emerged
    as an indispensable tool helping to alleviate the communication bottleneck in
    distributed learning. However, despite the fact that biased compressors often
    show superior performance in practice when compared to the much more studied and
    understood unbiased compressors, very little is known about them. In this work we study three
    classes of biased compression operators, two of which are new, and their performance
    when applied to (stochastic) gradient descent and distributed (stochastic) gradient
    descent. We show for the first time that biased compressors can lead to linear
    convergence rates both in the single-node and distributed settings. We prove that
    the distributed compressed SGD method, employed with an error feedback mechanism,
    enjoys the ergodic rate O(δL exp[−μK/(δL)] + (C+δD)/(Kμ)), where δ≥1 is a compression
    parameter which grows as more compression is applied, L and μ are the smoothness and strong
    convexity constants, C captures stochastic gradient noise (C=0 if full gradients
    are computed on each node) and D captures the variance of the gradients at the
    optimum (D=0 for over-parameterized models). Further, via a theoretical study
    of several synthetic and empirical distributions of communicated gradients, we
    shed light on why and by how much biased compressors outperform their unbiased
    variants. Finally, we propose several new biased compressors with promising theoretical
    guarantees and practical performance.@eng
  bibo_authorlist:
  - foaf_Person:
      foaf_givenName: Aleksandr
      foaf_name: Beznosikov, Aleksandr
      foaf_surname: Beznosikov
  - foaf_Person:
      foaf_givenName: Samuel
      foaf_name: Horvath, Samuel
      foaf_surname: Horvath
  - foaf_Person:
      foaf_givenName: Peter
      foaf_name: Richtarik, Peter
      foaf_surname: Richtarik
  - foaf_Person:
      foaf_givenName: Mher
      foaf_name: Safaryan, Mher
      foaf_surname: Safaryan
      foaf_workInfoHomepage: http://www.librecat.org/personId=dd546b39-0804-11ed-9c55-ef075c39778d
  bibo_volume: 24
  dct_date: 2023^xs_gYear
  dct_identifier:
  - UT:001111578500001
  dct_isPartOf:
  - http://id.crossref.org/issn/1533-7928
  dct_language: eng
  dct_publisher: Journal of Machine Learning Research@
  dct_title: On biased compression for distributed learning@
...
