---
_id: '7201'
abstract:
- lang: eng
  text: Applying machine learning techniques to the quickly growing data in science
    and industry requires highly-scalable algorithms. Large datasets are most commonly
    processed "data parallel" distributed across many nodes. Each node's contribution
    to the overall gradient is summed using a global allreduce. This allreduce is
    the single communication and thus scalability bottleneck for most machine learning
    workloads. We observe that frequently, many gradient values are (close to) zero,
    leading to sparse or sparsifiable communications. To exploit this insight, we
    analyze, design, and implement a set of communication-efficient protocols for
    sparse input data, in conjunction with efficient machine learning algorithms which
    can leverage these primitives. Our communication protocols generalize standard
    collective operations, by allowing processes to contribute arbitrary sparse input
    data vectors. Our generic communication library, SparCML, extends MPI to support
    additional features, such as non-blocking (asynchronous) operations and low-precision
    data representations. As such, SparCML and its techniques will form the basis
    of future highly-scalable machine learning frameworks.
article_number: a11
article_processing_charge: No
arxiv: 1
author:
- first_name: Cedric
  full_name: Renggli, Cedric
  last_name: Renggli
- first_name: Saleh
  full_name: Ashkboos, Saleh
  id: 0D0A9058-257B-11EA-A937-9341C3D8BC8A
  last_name: Ashkboos
- first_name: Mehdi
  full_name: Aghagolzadeh, Mehdi
  last_name: Aghagolzadeh
- first_name: Dan-Adrian
  full_name: Alistarh, Dan-Adrian
  id: 4A899BFC-F248-11E8-B48F-1D18A9856A87
  last_name: Alistarh
  orcid: 0000-0003-3650-940X
- first_name: Torsten
  full_name: Hoefler, Torsten
  last_name: Hoefler
citation:
  ama: 'Renggli C, Ashkboos S, Aghagolzadeh M, Alistarh D-A, Hoefler T. SparCML: High-performance
    sparse communication for machine learning. In: <i>International Conference for
    High Performance Computing, Networking, Storage and Analysis, SC</i>. ACM; 2019.
    doi:<a href="https://doi.org/10.1145/3295500.3356222">10.1145/3295500.3356222</a>'
  apa: 'Renggli, C., Ashkboos, S., Aghagolzadeh, M., Alistarh, D.-A., &#38; Hoefler,
    T. (2019). SparCML: High-performance sparse communication for machine learning.
    In <i>International Conference for High Performance Computing, Networking, Storage
    and Analysis, SC</i>. Denver, CO, United States: ACM. <a href="https://doi.org/10.1145/3295500.3356222">https://doi.org/10.1145/3295500.3356222</a>'
  chicago: 'Renggli, Cedric, Saleh Ashkboos, Mehdi Aghagolzadeh, Dan-Adrian Alistarh,
    and Torsten Hoefler. “SparCML: High-Performance Sparse Communication for Machine
    Learning.” In <i>International Conference for High Performance Computing, Networking,
    Storage and Analysis, SC</i>. ACM, 2019. <a href="https://doi.org/10.1145/3295500.3356222">https://doi.org/10.1145/3295500.3356222</a>.'
  ieee: 'C. Renggli, S. Ashkboos, M. Aghagolzadeh, D.-A. Alistarh, and T. Hoefler,
    “SparCML: High-performance sparse communication for machine learning,” in <i>International
    Conference for High Performance Computing, Networking, Storage and Analysis, SC</i>,
    Denver, CO, United States, 2019.'
  ista: 'Renggli C, Ashkboos S, Aghagolzadeh M, Alistarh D-A, Hoefler T. 2019. SparCML:
    High-performance sparse communication for machine learning. International Conference
    for High Performance Computing, Networking, Storage and Analysis, SC. SC: Conference
    for High Performance Computing, Networking, Storage and Analysis, a11.'
  mla: 'Renggli, Cedric, et al. “SparCML: High-Performance Sparse Communication for
    Machine Learning.” <i>International Conference for High Performance Computing,
    Networking, Storage and Analysis, SC</i>, a11, ACM, 2019, doi:<a href="https://doi.org/10.1145/3295500.3356222">10.1145/3295500.3356222</a>.'
  short: 'C. Renggli, S. Ashkboos, M. Aghagolzadeh, D.-A. Alistarh, T. Hoefler, in:
    International Conference for High Performance Computing, Networking, Storage and
    Analysis, SC, ACM, 2019.'
conference:
  end_date: 2019-11-19
  location: Denver, CO, United States
  name: 'SC: Conference for High Performance Computing, Networking, Storage and Analysis'
  start_date: 2019-11-17
date_created: 2019-12-22T23:00:42Z
date_published: 2019-11-17T00:00:00Z
date_updated: 2025-07-10T11:54:21Z
day: '17'
department:
- _id: DaAl
doi: 10.1145/3295500.3356222
ec_funded: 1
external_id:
  arxiv:
  - '1802.08021'
  isi:
  - '000545976800011'
isi: 1
language:
- iso: eng
main_file_link:
- open_access: '1'
  url: https://arxiv.org/abs/1802.08021
month: '11'
oa: 1
oa_version: Preprint
project:
- _id: 268A44D6-B435-11E9-9278-68D0E5697425
  call_identifier: H2020
  grant_number: '805223'
  name: Elastic Coordination for Scalable Machine Learning
publication: International Conference for High Performance Computing, Networking,
  Storage and Analysis, SC
publication_identifier:
  eissn:
  - 2167-4337
  isbn:
  - '9781450362290'
  issn:
  - 2167-4329
publication_status: published
publisher: ACM
quality_controlled: '1'
scopus_import: '1'
status: public
title: 'SparCML: High-performance sparse communication for machine learning'
type: conference
user_id: 2DF688A6-F248-11E8-B48F-1D18A9856A87
year: '2019'
...
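
To make the abstract's bottleneck concrete, here is a minimal sketch, not SparCML's actual API, of the data-parallel pattern it describes. Each node holds a mostly-zero local gradient; the dense MPI_Allreduce ships all N values per node, while a sparse protocol would ship only the (index, value) pairs above a threshold. The gradient length N, the threshold EPS, the entry_t struct, and the toy fill pattern are illustrative assumptions; MPI_Allreduce (and MPI_Iallreduce, mentioned below) are standard MPI calls.

#include <mpi.h>
#include <stdio.h>
#include <math.h>

#define N   1048576        /* gradient length (assumed for illustration) */
#define EPS 1e-6f          /* sparsification threshold (assumed) */

typedef struct { int idx; float val; } entry_t;   /* one nonzero entry */

static float grad[N], sum[N];
static entry_t nz[N];

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Toy gradient: mostly zeros, as the abstract observes in practice. */
    for (int i = 0; i < N; i++)
        grad[i] = (i % 1000 == rank % 1000) ? 1.0f : 0.0f;

    /* Dense baseline: the global allreduce that bottlenecks scaling.
     * Every node communicates all N values regardless of sparsity. */
    MPI_Allreduce(grad, sum, N, MPI_FLOAT, MPI_SUM, MPI_COMM_WORLD);

    /* Sparse alternative: pack only the k nonzeros as (index, value).
     * A sparse collective in the abstract's sense merges these packed
     * lists across nodes instead of exchanging dense vectors. */
    int k = 0;
    for (int i = 0; i < N; i++)
        if (fabsf(grad[i]) > EPS)
            nz[k++] = (entry_t){ .idx = i, .val = grad[i] };

    if (rank == 0)
        printf("dense: %d floats/node, sparse: %d entries/node\n", N, k);

    MPI_Finalize();
    return 0;
}

The non-blocking operations the abstract mentions correspond, in the dense case, to MPI-3's MPI_Iallreduce; the point of SparCML-style protocols is to provide the same collective semantics when processes contribute arbitrary sparse input vectors, optionally in low-precision representations.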
