---
_id: '14815'
abstract:
- lang: eng
  text: In the last few years, various communication compression techniques have emerged
    as an indispensable tool helping to alleviate the communication bottleneck in
    distributed learning. However, despite the fact that biased compressors often show
    superior performance in practice when compared to the much more studied and understood
    unbiased compressors, very little is known about them. In this work we study three
    classes of biased compression operators, two of which are new, and their performance
    when applied to (stochastic) gradient descent and distributed (stochastic) gradient
    descent. We show for the first time that biased compressors can lead to linear
    convergence rates both in the single node and distributed settings. We prove that
    the distributed compressed SGD method, employed with an error feedback mechanism,
    enjoys the ergodic rate O(δL exp[−μK/(δL)] + (C+δD)/(Kμ)), where δ≥1 is a compression
    parameter
    which grows when more compression is applied, L and μ are the smoothness and strong
    convexity constants, C captures stochastic gradient noise (C=0 if full gradients
    are computed on each node) and D captures the variance of the gradients at the
    optimum (D=0 for over-parameterized models). Further, via a theoretical study
    of several synthetic and empirical distributions of communicated gradients, we
    shed light on why and by how much biased compressors outperform their unbiased
    variants. Finally, we propose several new biased compressors with promising theoretical
    guarantees and practical performance.
acknowledgement: 'The work in Sections 1-5 was conducted while A. Beznosikov was a
  research intern in the Optimization and Machine Learning Lab of Peter Richtárik at
  KAUST; this visit was funded by the KAUST Baseline Research Funding Scheme. The
  work of A. Beznosikov in Section 6 was conducted at Skoltech and was supported by
  the Ministry of Science and Higher Education grant No. 075-10-2021-068.'
article_processing_charge: Yes (in subscription journal)
article_type: original
arxiv: 1
author:
- first_name: Aleksandr
  full_name: Beznosikov, Aleksandr
  last_name: Beznosikov
- first_name: Samuel
  full_name: Horvath, Samuel
  last_name: Horvath
- first_name: Peter
  full_name: Richtarik, Peter
  last_name: Richtarik
- first_name: Mher
  full_name: Safaryan, Mher
  id: dd546b39-0804-11ed-9c55-ef075c39778d
  last_name: Safaryan
citation:
  ama: Beznosikov A, Horvath S, Richtarik P, Safaryan M. On biased compression for
    distributed learning. <i>Journal of Machine Learning Research</i>. 2023;24:1-50.
  apa: Beznosikov, A., Horvath, S., Richtarik, P., &#38; Safaryan, M. (2023). On biased
    compression for distributed learning. <i>Journal of Machine Learning Research</i>.
    Journal of Machine Learning Research.
  chicago: Beznosikov, Aleksandr, Samuel Horvath, Peter Richtarik, and Mher Safaryan.
    “On Biased Compression for Distributed Learning.” <i>Journal of Machine Learning
    Research</i>. Journal of Machine Learning Research, 2023.
  ieee: A. Beznosikov, S. Horvath, P. Richtarik, and M. Safaryan, “On biased compression
    for distributed learning,” <i>Journal of Machine Learning Research</i>, vol. 24.
    Journal of Machine Learning Research, pp. 1–50, 2023.
  ista: Beznosikov A, Horvath S, Richtarik P, Safaryan M. 2023. On biased compression
    for distributed learning. Journal of Machine Learning Research. 24, 1–50.
  mla: Beznosikov, Aleksandr, et al. “On Biased Compression for Distributed Learning.”
    <i>Journal of Machine Learning Research</i>, vol. 24, Journal of Machine Learning
    Research, 2023, pp. 1–50.
  short: A. Beznosikov, S. Horvath, P. Richtarik, M. Safaryan, Journal of Machine
    Learning Research 24 (2023) 1–50.
corr_author: '1'
date_created: 2024-01-16T12:13:36Z
date_published: 2023-10-01T00:00:00Z
date_updated: 2024-10-09T21:07:52Z
day: '01'
ddc:
- '000'
department:
- _id: DaAl
external_id:
  arxiv:
  - '2002.12410'
  isi:
  - '001111578500001'
file:
- access_level: open_access
  checksum: c50f2b9db53938b755e30a085f464059
  content_type: application/pdf
  creator: dernst
  date_created: 2024-01-16T12:13:27Z
  date_updated: 2024-01-16T12:13:27Z
  file_id: '14816'
  file_name: 2023_JMLR_Beznosikov.pdf
  file_size: 1510993
  relation: main_file
  success: 1
file_date_updated: 2024-01-16T12:13:27Z
has_accepted_license: '1'
intvolume: '24'
isi: 1
language:
- iso: eng
month: '10'
oa: 1
oa_version: Published Version
page: 1-50
publication: Journal of Machine Learning Research
publication_identifier:
  eissn:
  - 1533-7928
publication_status: published
publisher: Journal of Machine Learning Research
quality_controlled: '1'
status: public
title: On biased compression for distributed learning
tmp:
  image: /images/cc_by.png
  legal_code_url: https://creativecommons.org/licenses/by/4.0/legalcode
  name: Creative Commons Attribution 4.0 International Public License (CC-BY 4.0)
  short: CC BY (4.0)
type: journal_article
user_id: 2DF688A6-F248-11E8-B48F-1D18A9856A87
volume: 24
year: '2023'
...
