---
OA_place: publisher
OA_type: free access
_id: '19713'
abstract:
- lang: eng
  text: Distributed optimization is the standard way of speeding up machine learning
    training, and most of the research in the area focuses on distributed first-order,
    gradient-based methods. Yet, there are settings where some computationally-bounded
    nodes may not be able to implement first-order, gradient-based optimization, while
    they could still contribute to joint optimization tasks. In this paper, we initiate
    the study of hybrid decentralized optimization, studying settings where nodes
    with zeroth-order and first-order optimization capabilities co-exist in a distributed
    system, and attempt to jointly solve an optimization task over some data distribution.
    We essentially show that, under reasonable parameter settings, such a system can
    not only withstand noisier zeroth-order agents but can even benefit from integrating
    such agents into the optimization process, rather than ignoring their information.
    At the core of our approach is a new analysis of distributed optimization with
    noisy and possibly-biased gradient estimators, which may be of independent interest.
    Our results hold for both convex and non-convex objectives. Experimental results
    on standard optimization tasks confirm our analysis, showing that hybrid first-zeroth
    order optimization can be practical, even when training deep neural networks.
acknowledgement: "This project has received funding from the European Research Council
  (ERC) under the European Union’s Horizon 2020 research and innovation programme
  (grant agreement\r\nNo 805223 ScaleML). The authors would like to acknowledge Eugenia
  Iofinova for useful discussions during the inception of this project."
article_processing_charge: No
article_type: original
arxiv: 1
author:
- first_name: Shayan
  full_name: Talaei, Shayan
  last_name: Talaei
- first_name: Matin
  full_name: Ansaripour, Matin
  last_name: Ansaripour
- first_name: Giorgi
  full_name: Nadiradze, Giorgi
  id: 3279A00C-F248-11E8-B48F-1D18A9856A87
  last_name: Nadiradze
  orcid: 0000-0001-5634-0731
- first_name: Dan-Adrian
  full_name: Alistarh, Dan-Adrian
  id: 4A899BFC-F248-11E8-B48F-1D18A9856A87
  last_name: Alistarh
  orcid: 0000-0003-3650-940X
citation:
  ama: 'Talaei S, Ansaripour M, Nadiradze G, Alistarh D-A. Hybrid decentralized optimization:
    Leveraging both first- and zeroth-order optimizers for faster convergence. <i>Proceedings
    of the 39th AAAI Conference on Artificial Intelligence</i>. 2025;39(19):20778-20786.
    doi:<a href="https://doi.org/10.1609/aaai.v39i19.34290">10.1609/aaai.v39i19.34290</a>'
  apa: 'Talaei, S., Ansaripour, M., Nadiradze, G., &#38; Alistarh, D.-A. (2025). Hybrid
    decentralized optimization: Leveraging both first- and zeroth-order optimizers
    for faster convergence. <i>Proceedings of the 39th AAAI Conference on Artificial
    Intelligence</i>. Association for the Advancement of Artificial Intelligence.
    <a href="https://doi.org/10.1609/aaai.v39i19.34290">https://doi.org/10.1609/aaai.v39i19.34290</a>'
  chicago: 'Talaei, Shayan, Matin Ansaripour, Giorgi Nadiradze, and Dan-Adrian Alistarh.
    “Hybrid Decentralized Optimization: Leveraging Both First- and Zeroth-Order Optimizers
    for Faster Convergence.” <i>Proceedings of the 39th AAAI Conference on Artificial
    Intelligence</i>. Association for the Advancement of Artificial Intelligence,
    2025. <a href="https://doi.org/10.1609/aaai.v39i19.34290">https://doi.org/10.1609/aaai.v39i19.34290</a>.'
  ieee: 'S. Talaei, M. Ansaripour, G. Nadiradze, and D.-A. Alistarh, “Hybrid decentralized
    optimization: Leveraging both first- and zeroth-order optimizers for faster convergence,”
    <i>Proceedings of the 39th AAAI Conference on Artificial Intelligence</i>, vol.
    39, no. 19. Association for the Advancement of Artificial Intelligence, pp. 20778–20786,
    2025.'
  ista: 'Talaei S, Ansaripour M, Nadiradze G, Alistarh D-A. 2025. Hybrid decentralized
    optimization: Leveraging both first- and zeroth-order optimizers for faster convergence.
    Proceedings of the 39th AAAI Conference on Artificial Intelligence. 39(19), 20778–20786.'
  mla: 'Talaei, Shayan, et al. “Hybrid Decentralized Optimization: Leveraging Both
    First- and Zeroth-Order Optimizers for Faster Convergence.” <i>Proceedings of
    the 39th AAAI Conference on Artificial Intelligence</i>, vol. 39, no. 19, Association
    for the Advancement of Artificial Intelligence, 2025, pp. 20778–86, doi:<a href="https://doi.org/10.1609/aaai.v39i19.34290">10.1609/aaai.v39i19.34290</a>.'
  short: S. Talaei, M. Ansaripour, G. Nadiradze, D.-A. Alistarh, Proceedings of the
    39th AAAI Conference on Artificial Intelligence 39 (2025) 20778–20786.
corr_author: '1'
date_created: 2025-05-19T14:15:35Z
date_published: 2025-04-11T00:00:00Z
date_updated: 2026-02-16T12:34:44Z
day: '11'
department:
- _id: DaAl
doi: 10.1609/aaai.v39i19.34290
ec_funded: 1
external_id:
  arxiv:
  - '2210.07703'
intvolume: '        39'
issue: '19'
language:
- iso: eng
main_file_link:
- open_access: '1'
  url: https://doi.org/10.1609/aaai.v39i19.34290
month: '04'
oa: 1
oa_version: Preprint
page: 20778-20786
project:
- _id: 268A44D6-B435-11E9-9278-68D0E5697425
  call_identifier: H2020
  grant_number: '805223'
  name: Elastic Coordination for Scalable Machine Learning
publication: Proceedings of the 39th AAAI Conference on Artificial Intelligence
publication_identifier:
  eissn:
  - 2374-3468
  issn:
  - 2159-5399
publication_status: published
publisher: Association for the Advancement of Artificial Intelligence
quality_controlled: '1'
related_material:
  link:
  - relation: software
    url: https://github.com/ShayanTalaei/HDO
scopus_import: '1'
status: public
title: 'Hybrid decentralized optimization: Leveraging both first- and zeroth-order
  optimizers for faster convergence'
type: journal_article
user_id: 2DF688A6-F248-11E8-B48F-1D18A9856A87
volume: 39
year: '2025'
...
---
OA_place: publisher
OA_type: hybrid
PlanS_conform: '1'
_id: '19969'
abstract:
- lang: eng
  text: "In the stochastic population protocol model, we are given a connected graph
    with n nodes, and in every time step, a scheduler samples an edge of the graph
    uniformly at random and the nodes connected by this edge interact. A fundamental
    task in this model is stable leader election, in which all nodes start in an identical
    state and the aim is to reach a configuration in which (1)\r\nexactly one node
    is elected as leader and (2) this node remains as the unique leader no matter
    what sequence of interactions follows. On cliques, the complexity of this problem
    has recently been settled: time-optimal protocols stabilize in (n log n) expected
    steps using (log log n) states, whereas protocols that use O(1) states require
    (n2) expected steps. In this work, we investigate the complexity of stable leader
    election on graphs. We provide the first non-trivial time lower bounds on general
    graphs, showing that, when moving beyond cliques, the complexity of stable leader
    election can range from O(1) to (n3) expected steps. We describe a protocol that
    is time-optimal on many graph families, but uses polynomially-many states. In
    contrast, we give a near-time-optimal protocol that uses only O(log2 n) states
    that is at most a factor O(log n) slower. Finally, we observe that for many graphs
    the constant-state protocol of Beauquier et al. [OPODIS 2013] is at most a factor
    O(n log n) slower than the fast polynomial-state protocol, and among constant-state
    protocols, this protocol has near-optimal average case complexity on dense random
    graphs."
acknowledgement: We thank all anonymous reviewers for their helpful comments. We would
  also like to thank Jakob Solnerzik and Olivier Stietel for catching some errors
  in the proofs. Open Access funding enabled and organized by Projekt DEAL. We gratefully
  acknowledge funding from the European Research Council (ERC) under the European
  Union’s Horizon 2020 research and innovation programme (grant agreement No 805223
  ScaleML).
article_processing_charge: Yes (via OA deal)
article_type: original
arxiv: 1
author:
- first_name: Dan-Adrian
  full_name: Alistarh, Dan-Adrian
  id: 4A899BFC-F248-11E8-B48F-1D18A9856A87
  last_name: Alistarh
  orcid: 0000-0003-3650-940X
- first_name: Joel
  full_name: Rybicki, Joel
  id: 334EFD2E-F248-11E8-B48F-1D18A9856A87
  last_name: Rybicki
  orcid: 0000-0002-6432-6646
- first_name: Sasha
  full_name: Voitovych, Sasha
  last_name: Voitovych
citation:
  ama: Alistarh D-A, Rybicki J, Voitovych S. Near-optimal leader election in population
    protocols on graphs. <i>Distributed Computing</i>. 2025;38:207-245. doi:<a href="https://doi.org/10.1007/s00446-025-00487-7">10.1007/s00446-025-00487-7</a>
  apa: Alistarh, D.-A., Rybicki, J., &#38; Voitovych, S. (2025). Near-optimal leader
    election in population protocols on graphs. <i>Distributed Computing</i>. Springer
    Nature. <a href="https://doi.org/10.1007/s00446-025-00487-7">https://doi.org/10.1007/s00446-025-00487-7</a>
  chicago: Alistarh, Dan-Adrian, Joel Rybicki, and Sasha Voitovych. “Near-Optimal
    Leader Election in Population Protocols on Graphs.” <i>Distributed Computing</i>.
    Springer Nature, 2025. <a href="https://doi.org/10.1007/s00446-025-00487-7">https://doi.org/10.1007/s00446-025-00487-7</a>.
  ieee: D.-A. Alistarh, J. Rybicki, and S. Voitovych, “Near-optimal leader election
    in population protocols on graphs,” <i>Distributed Computing</i>, vol. 38. Springer
    Nature, pp. 207–245, 2025.
  ista: Alistarh D-A, Rybicki J, Voitovych S. 2025. Near-optimal leader election in
    population protocols on graphs. Distributed Computing. 38, 207–245.
  mla: Alistarh, Dan-Adrian, et al. “Near-Optimal Leader Election in Population Protocols
    on Graphs.” <i>Distributed Computing</i>, vol. 38, Springer Nature, 2025, pp.
    207–45, doi:<a href="https://doi.org/10.1007/s00446-025-00487-7">10.1007/s00446-025-00487-7</a>.
  short: D.-A. Alistarh, J. Rybicki, S. Voitovych, Distributed Computing 38 (2025)
    207–245.
corr_author: '1'
date_created: 2025-07-06T22:01:24Z
date_published: 2025-09-01T00:00:00Z
date_updated: 2025-12-30T09:04:18Z
day: '01'
ddc:
- '510'
department:
- _id: DaAl
doi: 10.1007/s00446-025-00487-7
ec_funded: 1
external_id:
  arxiv:
  - '2205.12597'
  isi:
  - '001518300400001'
file:
- access_level: open_access
  checksum: 2789c0fdfb58f64930f05f6ac2b3ca61
  content_type: application/pdf
  creator: dernst
  date_created: 2025-12-30T09:03:55Z
  date_updated: 2025-12-30T09:03:55Z
  file_id: '20900'
  file_name: 2025_DistributedComp_Alistarh.pdf
  file_size: 770705
  relation: main_file
  success: 1
file_date_updated: 2025-12-30T09:03:55Z
has_accepted_license: '1'
intvolume: '        38'
isi: 1
language:
- iso: eng
month: '09'
oa: 1
oa_version: Published Version
page: 207-245
project:
- _id: 268A44D6-B435-11E9-9278-68D0E5697425
  call_identifier: H2020
  grant_number: '805223'
  name: Elastic Coordination for Scalable Machine Learning
publication: Distributed Computing
publication_identifier:
  eissn:
  - 1432-0452
  issn:
  - 0178-2770
publication_status: published
publisher: Springer Nature
quality_controlled: '1'
related_material:
  record:
  - id: '11844'
    relation: earlier_version
    status: public
scopus_import: '1'
status: public
title: Near-optimal leader election in population protocols on graphs
tmp:
  image: /images/cc_by.png
  legal_code_url: https://creativecommons.org/licenses/by/4.0/legalcode
  name: Creative Commons Attribution 4.0 International Public License (CC-BY 4.0)
  short: CC BY (4.0)
type: journal_article
user_id: 2DF688A6-F248-11E8-B48F-1D18A9856A87
volume: 38
year: '2025'
...
---
OA_place: publisher
_id: '17485'
abstract:
- lang: eng
  text: "Large language models (LLMs) have made tremendous progress in the past few
    years, from being able to generate coherent text to matching or surpassing humans
    in a wide variety of creative, knowledge or reasoning tasks. Much of this can
    be attributed to massively increased scale, both in the size of the model as well
    as the amount of training data, from 100s of millions to 100s of billions, or
    even trillions. This trend is expected to continue, which, although exciting,
    also raises major practical concerns. Already today's 100+ billion parameter LLMs
    require top-of-the-line hardware just to run. Hence, it is clear that sustaining
    these developments will require significant efficiency advances.\r\n\r\nHistorically,
    one of the most practical ways of improving model efficiency has been compression,
    especially in the form of sparsity or quantization. While this has been studied
    extensively in the past, existing accurate methods are all designed for models
    around 100 million parameters; scaling them up to ones literally 1000x larger
    is highly challenging. In this thesis, we introduce a new unified sparsification
    and quantization approach OBC, which through additional algorithmic enhancements
    leads to GPTQ and SparseGPT, the first techniques fast and accurate enough to
    compress 100+ billion parameter models to 4- or even 3-bit precision and 50% weight-sparsity,
    respectively. Additionally, we show how weight-only quantizion does not just bring
    space savings but also up to 4.5x faster generation speed, via custom GPU kernels.\r\n\r\nIn
    fact, we show for the first time that it is possible to develop an FP16 times
    INT4 mixed-precision matrix multiplication kernel, called Marlin, which comes
    close to simultaneously maximizing both memory and compute utilization, making
    weight-only quantization highly practical even for multi-user serving. Further,
    we demonstrate that GPTQ can be scaled to widely overparametrized trillion-parameter
    models, where extreme sub-1-bit compression rates can be achieved without any
    inference slow-down, by co-designing a bespoke entropy coding scheme together
    with an efficient kernel.\r\n\r\nFinally, we also study compression from the perspective
    of someone with access to massive amounts of compute resources for training large
    models completely from scratch. Here the key questions evolve around the joint
    scaling behavior between compression, model size, and amount of training data
    used. Based on extensive experimental results for both vision and text models,
    we introduce the first scaling law which accurately captures the relationship
    between weight-sparsity, number of non-zero weights and data. This further allows
    us to characterize the optimal sparsity, which we find to increase the longer
    a fixed cost model is being trained.\r\n\r\nOverall, this thesis presents contributions
    to three different angles of large model efficiency: affordable but accurate algorithms,
    highly efficient systems implementations, and fundamental scaling laws for compressed
    training."
acknowledged_ssus:
- _id: ScienComp
alternative_title:
- ISTA Thesis
article_processing_charge: No
author:
- first_name: Elias
  full_name: Frantar, Elias
  id: 09a8f98d-ec99-11ea-ae11-c063a7b7fe5f
  last_name: Frantar
citation:
  ama: 'Frantar E. Compressing large neural networks : Algorithms, systems and scaling
    laws. 2024. doi:<a href="https://doi.org/10.15479/at:ista:17485">10.15479/at:ista:17485</a>'
  apa: 'Frantar, E. (2024). <i>Compressing large neural networks : Algorithms, systems
    and scaling laws</i>. Institute of Science and Technology Austria. <a href="https://doi.org/10.15479/at:ista:17485">https://doi.org/10.15479/at:ista:17485</a>'
  chicago: 'Frantar, Elias. “Compressing Large Neural Networks : Algorithms, Systems
    and Scaling Laws.” Institute of Science and Technology Austria, 2024. <a href="https://doi.org/10.15479/at:ista:17485">https://doi.org/10.15479/at:ista:17485</a>.'
  ieee: 'E. Frantar, “Compressing large neural networks : Algorithms, systems and
    scaling laws,” Institute of Science and Technology Austria, 2024.'
  ista: 'Frantar E. 2024. Compressing large neural networks : Algorithms, systems
    and scaling laws. Institute of Science and Technology Austria.'
  mla: 'Frantar, Elias. <i>Compressing Large Neural Networks : Algorithms, Systems
    and Scaling Laws</i>. Institute of Science and Technology Austria, 2024, doi:<a
    href="https://doi.org/10.15479/at:ista:17485">10.15479/at:ista:17485</a>.'
  short: 'E. Frantar, Compressing Large Neural Networks : Algorithms, Systems and
    Scaling Laws, Institute of Science and Technology Austria, 2024.'
corr_author: '1'
date_created: 2024-09-02T11:01:48Z
date_published: 2024-09-05T00:00:00Z
date_updated: 2026-06-18T17:58:39Z
day: '05'
ddc:
- '000'
degree_awarded: PhD
department:
- _id: GradSch
- _id: DaAl
doi: 10.15479/at:ista:17485
ec_funded: 1
file:
- access_level: closed
  checksum: 5d785645805a78c5b4ce7cc3df557b09
  content_type: application/zip
  creator: efrantar
  date_created: 2024-09-05T12:04:11Z
  date_updated: 2024-09-05T12:04:11Z
  file_id: '17570'
  file_name: thesis-final.zip
  file_size: 1615167
  relation: source_file
- access_level: open_access
  checksum: a9dd1c2d23734986924eb44ebb55fd8f
  content_type: application/pdf
  creator: efrantar
  date_created: 2024-09-06T16:24:59Z
  date_updated: 2024-09-06T16:24:59Z
  file_id: '17880'
  file_name: frantar_thesis_final.pdf
  file_size: 2376611
  relation: main_file
  success: 1
file_date_updated: 2024-09-06T16:24:59Z
has_accepted_license: '1'
language:
- iso: eng
month: '09'
oa: 1
oa_version: Published Version
page: '129'
project:
- _id: 268A44D6-B435-11E9-9278-68D0E5697425
  call_identifier: H2020
  grant_number: '805223'
  name: Elastic Coordination for Scalable Machine Learning
publication_identifier:
  issn:
  - 2663-337X
publication_status: published
publisher: Institute of Science and Technology Austria
related_material:
  record:
  - id: '17378'
    relation: part_of_dissertation
    status: public
  - id: '17087'
    relation: part_of_dissertation
    status: public
  - id: '14458'
    relation: part_of_dissertation
    status: public
  - id: '18061'
    relation: part_of_dissertation
    status: public
  - id: '18062'
    relation: part_of_dissertation
    status: public
status: public
supervisor:
- first_name: Dan-Adrian
  full_name: Alistarh, Dan-Adrian
  id: 4A899BFC-F248-11E8-B48F-1D18A9856A87
  last_name: Alistarh
  orcid: 0000-0003-3650-940X
title: 'Compressing large neural networks : Algorithms, systems and scaling laws'
type: dissertation
user_id: ba8df636-2132-11f1-aed0-ed93e2281fdd
year: '2024'
...
---
OA_place: publisher
_id: '17490'
abstract:
- lang: eng
  text: "Deep learning is essential in numerous applications nowadays, with many recent
    advancements made possible by training very large models. Despite their broad
    applicability, training neural networks is often time-intensive, and it is usually
    impractical to manage large models and datasets on a single machine. To address
    these issues, distributed deep learning training has become increasingly important.
    However, distributed training requires synchronization among nodes, and the mini-batch
    stochastic gradient descent algorithm places a significant load on network connections.
    A possible solution to tackle the synchronization bottleneck is to reduce a message
    size by lossy compression.\r\n\r\nIn this thesis, we investigate systems and algorithmic
    approaches to communication compression during training. From the systems perspective,
    we demonstrate that a common approach of expensive hardware overprovisioning can
    be replaced through a thorough system design. We introduce a framework that introduces
    efficient software support for compressed communication in machine learning applications,
    applicable to both multi-GPU single-node training and larger-scale multi-node
    training. Our framework integrates with popular ML frameworks, providing up to
    3x speedups for multi-GPU nodes based on commodity hardware and order-of-magnitude
    improvements in the multi-node setting, with negligible impact on accuracy.\r\n\r\nAlso,
    we consider an application of our framework to different communication schemes,
    such as Fully Sharded Data Parallel. We provide strong convergence guarantees
    for the compression in such a setup. Empirical validation shows that our method
    preserves model accuracy for GPT-family models with up to 1.3 billion parameters,
    while completely removing the communication bottlenecks of non-compressed alternatives,
    providing up to 2.2x speedups end-to-end.\r\n\r\nFrom the algorithmic side, we
    propose a general framework that dynamically adjusts the degree of compression
    across a model's layers during training. This approach enhances overall compression
    and results in significant speedups without compromising accuracy. Our algorithm
    utilizes an adaptive algorithm that automatically selects the optimal compression
    parameters for model layers, ensuring the best compression ratio while adhering
    to an error constraint. Our method is effective across all existing families of
    compression methods. It achieves up to 2.5x faster training and up to a 5x improvement
    in compression compared to efficient implementations of current approaches. Additionally,
    LGreCo can complement existing adaptive algorithms.\r\n"
acknowledged_ssus:
- _id: ScienComp
alternative_title:
- ISTA Thesis
article_processing_charge: No
author:
- first_name: Ilia
  full_name: Markov, Ilia
  id: D0CF4148-C985-11E9-8066-0BDEE5697425
  last_name: Markov
citation:
  ama: 'Markov I. Communication-efficient distributed training of deep neural networks :
    An algorithms and systems perspective. 2024. doi:<a href="https://doi.org/10.15479/at:ista:17490">10.15479/at:ista:17490</a>'
  apa: 'Markov, I. (2024). <i>Communication-efficient distributed training of deep
    neural networks : An algorithms and systems perspective</i>. Institute of Science
    and Technology Austria. <a href="https://doi.org/10.15479/at:ista:17490">https://doi.org/10.15479/at:ista:17490</a>'
  chicago: 'Markov, Ilia. “Communication-Efficient Distributed Training of Deep Neural
    Networks : An Algorithms and Systems Perspective.” Institute of Science and Technology
    Austria, 2024. <a href="https://doi.org/10.15479/at:ista:17490">https://doi.org/10.15479/at:ista:17490</a>.'
  ieee: 'I. Markov, “Communication-efficient distributed training of deep neural networks :
    An algorithms and systems perspective,” Institute of Science and Technology Austria,
    2024.'
  ista: 'Markov I. 2024. Communication-efficient distributed training of deep neural
    networks : An algorithms and systems perspective. Institute of Science and Technology
    Austria.'
  mla: 'Markov, Ilia. <i>Communication-Efficient Distributed Training of Deep Neural
    Networks : An Algorithms and Systems Perspective</i>. Institute of Science and
    Technology Austria, 2024, doi:<a href="https://doi.org/10.15479/at:ista:17490">10.15479/at:ista:17490</a>.'
  short: 'I. Markov, Communication-Efficient Distributed Training of Deep Neural Networks :
    An Algorithms and Systems Perspective, Institute of Science and Technology Austria,
    2024.'
corr_author: '1'
date_created: 2024-09-04T08:51:11Z
date_published: 2024-09-04T00:00:00Z
date_updated: 2026-06-18T17:55:23Z
day: '04'
ddc:
- '000'
degree_awarded: PhD
department:
- _id: GradSch
- _id: DaAl
doi: 10.15479/at:ista:17490
ec_funded: 1
file:
- access_level: closed
  checksum: 77609f4835d2730e46fa0d42d9134ed9
  content_type: application/x-zip-compressed
  creator: imarkov
  date_created: 2024-09-04T08:35:35Z
  date_updated: 2024-09-04T08:35:35Z
  file_id: '17491'
  file_name: Thesis.zip
  file_size: 43327753
  relation: source_file
- access_level: open_access
  checksum: 9e68f7217570f756ceb8f70b980938cd
  content_type: application/pdf
  creator: imarkov
  date_created: 2024-09-04T08:36:06Z
  date_updated: 2024-09-04T08:36:06Z
  file_id: '17492'
  file_name: Thesis_final_version_pdfa2.pdf
  file_size: 2756082
  relation: main_file
  success: 1
file_date_updated: 2024-09-04T08:36:06Z
has_accepted_license: '1'
language:
- iso: eng
month: '09'
oa: 1
oa_version: Published Version
page: '102'
project:
- _id: 268A44D6-B435-11E9-9278-68D0E5697425
  call_identifier: H2020
  grant_number: '805223'
  name: Elastic Coordination for Scalable Machine Learning
publication_identifier:
  issn:
  - 2663-337X
publication_status: published
publisher: Institute of Science and Technology Austria
related_material:
  record:
  - id: '14461'
    relation: part_of_dissertation
    status: public
  - id: '12780'
    relation: part_of_dissertation
    status: public
  - id: '17456'
    relation: part_of_dissertation
    status: public
status: public
supervisor:
- first_name: Dan-Adrian
  full_name: Alistarh, Dan-Adrian
  id: 4A899BFC-F248-11E8-B48F-1D18A9856A87
  last_name: Alistarh
  orcid: 0000-0003-3650-940X
title: 'Communication-efficient distributed training of deep neural networks : An
  algorithms and systems perspective'
tmp:
  image: /images/cc_by_nc_sa.png
  legal_code_url: https://creativecommons.org/licenses/by-nc-sa/4.0/legalcode
  name: Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC
    BY-NC-SA 4.0)
  short: CC BY-NC-SA (4.0)
type: dissertation
user_id: ba8df636-2132-11f1-aed0-ed93e2281fdd
year: '2024'
...
---
_id: '12566'
abstract:
- lang: eng
  text: "Approximate agreement is one of the few variants of consensus that can be
    solved in a wait-free manner in asynchronous systems where processes communicate
    by reading and writing to shared memory. In this work, we consider a natural generalisation
    of approximate agreement on arbitrary undirected connected graphs. Each process
    is given a node of the graph as input and, if non-faulty, must output a node such
    that\r\n– all the outputs are within distance 1 of one another, and\r\n– each
    output value lies on a shortest path between two input values.\r\nFrom prior work,
    it is known that there is no wait-free algorithm among  processes for this problem
    on any cycle of length , by reduction from 2-set agreement (Castañeda et al.,
    2018).\r\n\r\nIn this work, we investigate the solvability of this task on general
    graphs. We give a new, direct proof of the impossibility of approximate agreement
    on cycles of length , via a generalisation of Sperner's Lemma to convex polygons.
    We also extend the reduction from 2-set agreement to a larger class of graphs,
    showing that approximate agreement on these graphs is unsolvable. On the positive
    side, we present a wait-free algorithm for a different class of graphs, which
    properly contains the class of chordal graphs."
acknowledgement: This project has received funding from the European Research Council
  (ERC) under the European Union’s Horizon 2020 research and innovation programme
  (grant agreement No. 805223 ScaleML) and under the Marie Skłodowska-Curie grant
  agreement No. 840605 and from the Natural Sciences and Engineering Research Council
  of Canada grant RGPIN-2020-04178. Part of this work was done while Faith Ellen was
  visiting IST Austria.
article_number: '113733'
article_processing_charge: Yes (via OA deal)
article_type: original
author:
- first_name: Dan-Adrian
  full_name: Alistarh, Dan-Adrian
  id: 4A899BFC-F248-11E8-B48F-1D18A9856A87
  last_name: Alistarh
  orcid: 0000-0003-3650-940X
- first_name: Faith
  full_name: Ellen, Faith
  last_name: Ellen
- first_name: Joel
  full_name: Rybicki, Joel
  id: 334EFD2E-F248-11E8-B48F-1D18A9856A87
  last_name: Rybicki
  orcid: 0000-0002-6432-6646
citation:
  ama: Alistarh D-A, Ellen F, Rybicki J. Wait-free approximate agreement on graphs.
    <i>Theoretical Computer Science</i>. 2023;948(2). doi:<a href="https://doi.org/10.1016/j.tcs.2023.113733">10.1016/j.tcs.2023.113733</a>
  apa: Alistarh, D.-A., Ellen, F., &#38; Rybicki, J. (2023). Wait-free approximate
    agreement on graphs. <i>Theoretical Computer Science</i>. Elsevier. <a href="https://doi.org/10.1016/j.tcs.2023.113733">https://doi.org/10.1016/j.tcs.2023.113733</a>
  chicago: Alistarh, Dan-Adrian, Faith Ellen, and Joel Rybicki. “Wait-Free Approximate
    Agreement on Graphs.” <i>Theoretical Computer Science</i>. Elsevier, 2023. <a
    href="https://doi.org/10.1016/j.tcs.2023.113733">https://doi.org/10.1016/j.tcs.2023.113733</a>.
  ieee: D.-A. Alistarh, F. Ellen, and J. Rybicki, “Wait-free approximate agreement
    on graphs,” <i>Theoretical Computer Science</i>, vol. 948, no. 2. Elsevier, 2023.
  ista: Alistarh D-A, Ellen F, Rybicki J. 2023. Wait-free approximate agreement on
    graphs. Theoretical Computer Science. 948(2), 113733.
  mla: Alistarh, Dan-Adrian, et al. “Wait-Free Approximate Agreement on Graphs.” <i>Theoretical
    Computer Science</i>, vol. 948, no. 2, 113733, Elsevier, 2023, doi:<a href="https://doi.org/10.1016/j.tcs.2023.113733">10.1016/j.tcs.2023.113733</a>.
  short: D.-A. Alistarh, F. Ellen, J. Rybicki, Theoretical Computer Science 948 (2023).
corr_author: '1'
date_created: 2023-02-19T23:00:55Z
date_published: 2023-02-28T00:00:00Z
date_updated: 2025-04-14T07:49:16Z
day: '28'
ddc:
- '000'
department:
- _id: DaAl
doi: 10.1016/j.tcs.2023.113733
ec_funded: 1
external_id:
  isi:
  - '000934262700001'
file:
- access_level: open_access
  checksum: b27c5290f2f1500c403494364ee39c9f
  content_type: application/pdf
  creator: dernst
  date_created: 2023-02-20T07:30:20Z
  date_updated: 2023-02-20T07:30:20Z
  file_id: '12570'
  file_name: 2023_TheoreticalCompScience_Alistarh.pdf
  file_size: 602333
  relation: main_file
  success: 1
file_date_updated: 2023-02-20T07:30:20Z
has_accepted_license: '1'
intvolume: '       948'
isi: 1
issue: '2'
language:
- iso: eng
month: '02'
oa: 1
oa_version: Published Version
project:
- _id: 268A44D6-B435-11E9-9278-68D0E5697425
  call_identifier: H2020
  grant_number: '805223'
  name: Elastic Coordination for Scalable Machine Learning
- _id: 26A5D39A-B435-11E9-9278-68D0E5697425
  call_identifier: H2020
  grant_number: '840605'
  name: Coordination in constrained and natural distributed systems
publication: Theoretical Computer Science
publication_identifier:
  issn:
  - 0304-3975
publication_status: published
publisher: Elsevier
quality_controlled: '1'
scopus_import: '1'
status: public
title: Wait-free approximate agreement on graphs
tmp:
  image: /images/cc_by.png
  legal_code_url: https://creativecommons.org/licenses/by/4.0/legalcode
  name: Creative Commons Attribution 4.0 International Public License (CC-BY 4.0)
  short: CC BY (4.0)
type: journal_article
user_id: 4359f0d1-fa6c-11eb-b949-802e58b17ae8
volume: 948
year: '2023'
...
---
_id: '13053'
abstract:
- lang: eng
  text: 'Deep neural networks (DNNs) often have to be compressed, via pruning and/or
    quantization, before they can be deployed in practical settings. In this work
    we propose a new compression-aware minimizer dubbed CrAM that modifies the optimization
    step in a principled way, in order to produce models whose local loss behavior
    is stable under compression operations such as pruning. Thus, dense models trained
    via CrAM should be compressible post-training, in a single step, without significant
    accuracy loss. Experimental results on standard benchmarks, such as residual networks
    for ImageNet classification and BERT models for language modelling, show that
    CrAM produces dense models that can be more accurate than the standard SGD/Adam-based
    baselines, but which are stable under weight pruning: specifically, we can prune
    models in one-shot to 70-80% sparsity with almost no accuracy loss, and to 90%
    with reasonable (∼1%) accuracy loss, which is competitive with gradual compression
    methods. Additionally, CrAM can produce sparse models which perform well for transfer
    learning, and it also works for semi-structured 2:4 pruning patterns supported
    by GPU hardware. The code for reproducing the results is available at this https
    URL .'
acknowledged_ssus:
- _id: ScienComp
acknowledgement: "AP, EK, DA received funding from the European Research Council (ERC)
  under the European\r\nUnion’s Horizon 2020 research and innovation programme (grant
  agreement No 805223 ScaleML). AV acknowledges the support of the French Agence Nationale
  de la Recherche (ANR), under grant ANR-21-CE48-0016 (project COMCOPT). We further
  acknowledge the support from the Scientific Service Units (SSU) of ISTA through
  resources provided by Scientific Computing (SciComp)."
article_processing_charge: No
arxiv: 1
author:
- first_name: Elena-Alexandra
  full_name: Peste, Elena-Alexandra
  id: 32D78294-F248-11E8-B48F-1D18A9856A87
  last_name: Peste
- first_name: Adrian
  full_name: Vladu, Adrian
  last_name: Vladu
- first_name: Eldar
  full_name: Kurtic, Eldar
  id: 47beb3a5-07b5-11eb-9b87-b108ec578218
  last_name: Kurtic
- first_name: Christoph
  full_name: Lampert, Christoph
  id: 40C20FD2-F248-11E8-B48F-1D18A9856A87
  last_name: Lampert
  orcid: 0000-0001-8622-7887
- first_name: Dan-Adrian
  full_name: Alistarh, Dan-Adrian
  id: 4A899BFC-F248-11E8-B48F-1D18A9856A87
  last_name: Alistarh
  orcid: 0000-0003-3650-940X
citation:
  ama: 'Krumes A, Vladu A, Kurtic E, Lampert C, Alistarh D-A. CrAM: A Compression-Aware
    Minimizer. In: <i>11th International Conference on Learning Representations </i>.
    OpenReview; 2023.'
  apa: 'Krumes, A., Vladu, A., Kurtic, E., Lampert, C., &#38; Alistarh, D.-A. (2023).
    CrAM: A Compression-Aware Minimizer. In <i>11th International Conference on Learning
    Representations </i>. Kigali, Rwanda : OpenReview.'
  chicago: 'Krumes, Alexandra, Adrian Vladu, Eldar Kurtic, Christoph Lampert, and
    Dan-Adrian Alistarh. “CrAM: A Compression-Aware Minimizer.” In <i>11th International
    Conference on Learning Representations </i>. OpenReview, 2023.'
  ieee: 'A. Krumes, A. Vladu, E. Kurtic, C. Lampert, and D.-A. Alistarh, “CrAM: A
    Compression-Aware Minimizer,” in <i>11th International Conference on Learning
    Representations </i>, Kigali, Rwanda , 2023.'
  ista: 'Krumes A, Vladu A, Kurtic E, Lampert C, Alistarh D-A. 2023. CrAM: A Compression-Aware
    Minimizer. 11th International Conference on Learning Representations . ICLR: International
    Conference on Learning Representations.'
  mla: 'Krumes, Alexandra, et al. “CrAM: A Compression-Aware Minimizer.” <i>11th International
    Conference on Learning Representations </i>, OpenReview, 2023.'
  short: A. Krumes, A. Vladu, E. Kurtic, C. Lampert, D.-A. Alistarh, in:, 11th International
    Conference on Learning Representations , OpenReview, 2023.
conference:
  end_date: 2023-05-05
  location: 'Kigali, Rwanda '
  name: 'ICLR: International Conference on Learning Representations'
  start_date: 2023-05-01
corr_author: '1'
date_created: 2023-05-23T11:36:18Z
date_published: 2023-05-01T00:00:00Z
date_updated: 2026-04-07T13:30:19Z
day: '01'
ddc:
- '000'
department:
- _id: GradSch
- _id: DaAl
- _id: ChLa
ec_funded: 1
external_id:
  arxiv:
  - '2207.14200'
file:
- access_level: open_access
  checksum: a6eec897e13a91cdc3eeaf309801752c
  content_type: application/pdf
  creator: dernst
  date_created: 2024-07-22T09:09:45Z
  date_updated: 2024-07-22T09:09:45Z
  file_id: '17294'
  file_name: 2023_ICLR_Peste.pdf
  file_size: 458201
  relation: main_file
  success: 1
file_date_updated: 2024-07-22T09:09:45Z
has_accepted_license: '1'
language:
- iso: eng
main_file_link:
- open_access: '1'
  url: https://openreview.net/pdf?id=_eTZBs-yedr
month: '05'
oa: 1
oa_version: Published Version
project:
- _id: 268A44D6-B435-11E9-9278-68D0E5697425
  call_identifier: H2020
  grant_number: '805223'
  name: Elastic Coordination for Scalable Machine Learning
publication: '11th International Conference on Learning Representations '
publication_status: published
publisher: OpenReview
quality_controlled: '1'
related_material:
  link:
  - relation: software
    url: https://github.com/IST-DASLab/CrAM
  record:
  - id: '13074'
    relation: dissertation_contains
    status: public
status: public
title: 'CrAM: A Compression-Aware Minimizer'
type: conference
user_id: 2DF688A6-F248-11E8-B48F-1D18A9856A87
year: '2023'
...
---
OA_place: publisher
_id: '13074'
abstract:
- lang: eng
  text: "Deep learning has become an integral part of a large number of important
    applications, and many of the recent breakthroughs have been enabled by the ability
    to train very large models, capable to capture complex patterns and relationships
    from the data. At the same time, the massive sizes of modern deep learning models
    have made their deployment to smaller devices more challenging; this is particularly
    important, as in many applications the users rely on accurate deep learning predictions,
    but they only have access to devices with limited memory and compute power. One
    solution to this problem is to prune neural networks, by setting as many of their
    parameters as possible to zero, to obtain accurate sparse models with lower memory
    footprint. Despite the great research progress in obtaining sparse models that
    preserve accuracy, while satisfying memory and computational constraints, there
    are still many challenges associated with efficiently training sparse models,
    as well as understanding their generalization properties.\r\n\r\nThe focus of
    this thesis is to investigate how the training process of sparse models can be
    made more efficient, and to understand the differences between sparse and dense
    models in terms of how well they can generalize to changes in the data distribution.
    We first study a method for co-training sparse and dense models, at a lower cost
    compared to regular training. With our method we can obtain very accurate sparse
    networks, and dense models that can recover the baseline accuracy. Furthermore,
    we are able to more easily analyze the differences, at prediction level, between
    the sparse-dense model pairs. Next, we investigate the generalization properties
    of sparse neural networks in more detail, by studying how well different sparse
    models trained on a larger task can adapt to smaller, more specialized tasks,
    in a transfer learning scenario. Our analysis across multiple pruning methods
    and sparsity levels reveals that sparse models provide features that can transfer
    similarly to or better than the dense baseline. However, the choice of the pruning
    method plays an important role, and can influence the results when the features
    are fixed (linear finetuning), or when they are allowed to adapt to the new task
    (full finetuning). Using sparse models with fixed masks for finetuning on new
    tasks has an important practical advantage, as it enables training neural networks
    on smaller devices. However, one drawback of current pruning methods is that the
    entire training cycle has to be repeated to obtain the initial sparse model, for
    every sparsity target; in consequence, the entire training process is costly and
    also multiple models need to be stored. In the last part of the thesis we propose
    a method that can train accurate dense models that are compressible in a single
    step, to multiple sparsity levels, without additional finetuning. Our method results
    in sparse models that can be competitive with existing pruning methods, and which
    can also successfully generalize to new tasks."
acknowledged_ssus:
- _id: ScienComp
alternative_title:
- ISTA Thesis
article_processing_charge: No
author:
- first_name: Elena-Alexandra
  full_name: Peste, Elena-Alexandra
  id: 32D78294-F248-11E8-B48F-1D18A9856A87
  last_name: Peste
citation:
  ama: Krumes A. Efficiency and generalization of sparse neural networks. 2023. doi:<a
    href="https://doi.org/10.15479/at:ista:13074">10.15479/at:ista:13074</a>
  apa: Krumes, A. (2023). <i>Efficiency and generalization of sparse neural networks</i>.
    Institute of Science and Technology Austria. <a href="https://doi.org/10.15479/at:ista:13074">https://doi.org/10.15479/at:ista:13074</a>
  chicago: Krumes, Alexandra. “Efficiency and Generalization of Sparse Neural Networks.”
    Institute of Science and Technology Austria, 2023. <a href="https://doi.org/10.15479/at:ista:13074">https://doi.org/10.15479/at:ista:13074</a>.
  ieee: A. Krumes, “Efficiency and generalization of sparse neural networks,” Institute
    of Science and Technology Austria, 2023.
  ista: Krumes A. 2023. Efficiency and generalization of sparse neural networks. Institute
    of Science and Technology Austria.
  mla: Krumes, Alexandra. <i>Efficiency and Generalization of Sparse Neural Networks</i>.
    Institute of Science and Technology Austria, 2023, doi:<a href="https://doi.org/10.15479/at:ista:13074">10.15479/at:ista:13074</a>.
  short: A. Krumes, Efficiency and Generalization of Sparse Neural Networks, Institute
    of Science and Technology Austria, 2023.
corr_author: '1'
date_created: 2023-05-23T17:07:53Z
date_published: 2023-05-23T00:00:00Z
date_updated: 2026-06-18T17:18:20Z
day: '23'
ddc:
- '000'
degree_awarded: PhD
department:
- _id: GradSch
- _id: DaAl
- _id: ChLa
doi: 10.15479/at:ista:13074
ec_funded: 1
file:
- access_level: open_access
  checksum: 6b3354968403cb9d48cc5a83611fb571
  content_type: application/pdf
  creator: epeste
  date_created: 2023-05-24T16:11:16Z
  date_updated: 2023-05-24T16:11:16Z
  file_id: '13087'
  file_name: PhD_Thesis_Alexandra_Peste_final.pdf
  file_size: 2152072
  relation: main_file
  success: 1
- access_level: closed
  checksum: 8d0df94bbcf4db72c991f22503b3fd60
  content_type: application/zip
  creator: epeste
  date_created: 2023-05-24T16:12:59Z
  date_updated: 2023-05-24T16:12:59Z
  file_id: '13088'
  file_name: PhD_Thesis_APeste.zip
  file_size: 1658293
  relation: source_file
file_date_updated: 2023-05-24T16:12:59Z
has_accepted_license: '1'
language:
- iso: eng
month: '05'
oa: 1
oa_version: Published Version
page: '147'
project:
- _id: 2564DBCA-B435-11E9-9278-68D0E5697425
  call_identifier: H2020
  grant_number: '665385'
  name: International IST Doctoral Program
- _id: 268A44D6-B435-11E9-9278-68D0E5697425
  call_identifier: H2020
  grant_number: '805223'
  name: Elastic Coordination for Scalable Machine Learning
publication_identifier:
  issn:
  - 2663-337X
publication_status: published
publisher: Institute of Science and Technology Austria
related_material:
  record:
  - id: '13053'
    relation: part_of_dissertation
    status: public
  - id: '12299'
    relation: part_of_dissertation
    status: public
  - id: '11458'
    relation: part_of_dissertation
    status: public
status: public
supervisor:
- first_name: Christoph
  full_name: Lampert, Christoph
  id: 40C20FD2-F248-11E8-B48F-1D18A9856A87
  last_name: Lampert
  orcid: 0000-0001-8622-7887
- first_name: Dan-Adrian
  full_name: Alistarh, Dan-Adrian
  id: 4A899BFC-F248-11E8-B48F-1D18A9856A87
  last_name: Alistarh
  orcid: 0000-0003-3650-940X
title: Efficiency and generalization of sparse neural networks
type: dissertation
user_id: ba8df636-2132-11f1-aed0-ed93e2281fdd
year: '2023'
...
---
_id: '14364'
abstract:
- lang: eng
  text: We introduce extension-based proofs, a class of impossibility proofs that
    includes valency arguments. They are modelled as an interaction between a prover
    and a protocol. Using proofs based on combinatorial topology, it has been shown
    that it is impossible to deterministically solve -set agreement among  processes
    or approximate agreement on a cycle of length 4 among  processes in a wait-free
    manner in asynchronous models where processes communicate using objects that can
    be constructed from shared registers. However, it was unknown whether proofs based
    on simpler techniques were possible. We show that these impossibility results
    cannot be obtained by extension-based proofs in the iterated snapshot model and,
    hence, extension-based proofs are limited in power.
acknowledgement: "We would like to thank Valerie King, Toniann Pitassi, and Michael
  Saks for helpful discussions and Shi Hao Liu for his useful feedback.\r\nThis research
  was supported by the Natural Science and Engineering Research Council of Canada
  under grants RGPIN-2015-05080 and RGPIN-2020-04178, a postgraduate scholarship,
  and a postdoctoral fellowship; a University of Toronto postdoctoral fellowship;
  the National Science Foundation under grants CCF-1217921, CCF-1301926, CCF-1637385,
  CCF-1650596, and IIS-1447786; the U.S. Department of Energy under grant ER26116/DE-SC0008923;
  the European Research Council (ERC) under the European Union’s Horizon 2020 research
  and innovation programme grant agreement 805223 ScaleML; and the Oracle and Intel
  corporations. Some of the work on this paper was done while Faith Ellen was visiting
  IST Austria."
article_processing_charge: No
article_type: original
arxiv: 1
author:
- first_name: Dan-Adrian
  full_name: Alistarh, Dan-Adrian
  id: 4A899BFC-F248-11E8-B48F-1D18A9856A87
  last_name: Alistarh
  orcid: 0000-0003-3650-940X
- first_name: James
  full_name: Aspnes, James
  last_name: Aspnes
- first_name: Faith
  full_name: Ellen, Faith
  last_name: Ellen
- first_name: Rati
  full_name: Gelashvili, Rati
  last_name: Gelashvili
- first_name: Leqi
  full_name: Zhu, Leqi
  id: a2117c59-cee4-11ed-b9d0-874ecf0f8ac5
  last_name: Zhu
citation:
  ama: Alistarh D-A, Aspnes J, Ellen F, Gelashvili R, Zhu L. Why extension-based proofs
    fail. <i>SIAM Journal on Computing</i>. 2023;52(4):913-944. doi:<a href="https://doi.org/10.1137/20M1375851">10.1137/20M1375851</a>
  apa: Alistarh, D.-A., Aspnes, J., Ellen, F., Gelashvili, R., &#38; Zhu, L. (2023).
    Why extension-based proofs fail. <i>SIAM Journal on Computing</i>. Society for
    Industrial and Applied Mathematics. <a href="https://doi.org/10.1137/20M1375851">https://doi.org/10.1137/20M1375851</a>
  chicago: Alistarh, Dan-Adrian, James Aspnes, Faith Ellen, Rati Gelashvili, and Leqi
    Zhu. “Why Extension-Based Proofs Fail.” <i>SIAM Journal on Computing</i>. Society
    for Industrial and Applied Mathematics, 2023. <a href="https://doi.org/10.1137/20M1375851">https://doi.org/10.1137/20M1375851</a>.
  ieee: D.-A. Alistarh, J. Aspnes, F. Ellen, R. Gelashvili, and L. Zhu, “Why extension-based
    proofs fail,” <i>SIAM Journal on Computing</i>, vol. 52, no. 4. Society for Industrial
    and Applied Mathematics, pp. 913–944, 2023.
  ista: Alistarh D-A, Aspnes J, Ellen F, Gelashvili R, Zhu L. 2023. Why extension-based
    proofs fail. SIAM Journal on Computing. 52(4), 913–944.
  mla: Alistarh, Dan-Adrian, et al. “Why Extension-Based Proofs Fail.” <i>SIAM Journal
    on Computing</i>, vol. 52, no. 4, Society for Industrial and Applied Mathematics,
    2023, pp. 913–44, doi:<a href="https://doi.org/10.1137/20M1375851">10.1137/20M1375851</a>.
  short: D.-A. Alistarh, J. Aspnes, F. Ellen, R. Gelashvili, L. Zhu, SIAM Journal
    on Computing 52 (2023) 913–944.
date_created: 2023-09-24T22:01:11Z
date_published: 2023-07-25T00:00:00Z
date_updated: 2025-05-14T11:26:06Z
day: '25'
department:
- _id: DaAl
doi: 10.1137/20M1375851
ec_funded: 1
external_id:
  arxiv:
  - '1811.01421'
  isi:
  - '001082972300004'
intvolume: '        52'
isi: 1
issue: '4'
language:
- iso: eng
main_file_link:
- open_access: '1'
  url: https://arxiv.org/abs/1811.01421
month: '07'
oa: 1
oa_version: Preprint
page: 913-944
project:
- _id: 268A44D6-B435-11E9-9278-68D0E5697425
  call_identifier: H2020
  grant_number: '805223'
  name: Elastic Coordination for Scalable Machine Learning
publication: SIAM Journal on Computing
publication_identifier:
  eissn:
  - 1095-7111
  issn:
  - 0097-5397
publication_status: published
publisher: Society for Industrial and Applied Mathematics
quality_controlled: '1'
related_material:
  record:
  - id: '6676'
    relation: earlier_version
    status: public
scopus_import: '1'
status: public
title: Why extension-based proofs fail
type: journal_article
user_id: 2DF688A6-F248-11E8-B48F-1D18A9856A87
volume: 52
year: '2023'
...
---
_id: '14458'
abstract:
- lang: eng
  text: 'We show for the first time that large-scale generative pretrained transformer
    (GPT) family models can be pruned to at least 50% sparsity in one-shot, without
    any retraining, at minimal loss of accuracy. This is achieved via a new pruning
    method called SparseGPT, specifically designed to work efficiently and accurately
    on massive GPT-family models. We can execute SparseGPT on the largest available
    open-source models, OPT-175B and BLOOM-176B, in under 4.5 hours, and can reach
    60% unstructured sparsity with negligible increase in perplexity: remarkably,
    more than 100 billion weights from these models can be ignored at inference time.
    SparseGPT generalizes to semi-structured (2:4 and 4:8) patterns, and is compatible
    with weight quantization approaches. The code is available at: https://github.com/IST-DASLab/sparsegpt.'
acknowledged_ssus:
- _id: ScienComp
acknowledgement: The authors gratefully acknowledge funding from the European Research
  Council (ERC) under the European Union’s Horizon 2020 programme (grant agreement
  No. 805223 ScaleML), as well as experimental support from Eldar Kurtic, and from
  the IST Austria IT department, in particular Stefano Elefante, Andrei Hornoiu, and
  Alois Schloegl.
alternative_title:
- PMLR
article_processing_charge: No
arxiv: 1
author:
- first_name: Elias
  full_name: Frantar, Elias
  id: 09a8f98d-ec99-11ea-ae11-c063a7b7fe5f
  last_name: Frantar
- first_name: Dan-Adrian
  full_name: Alistarh, Dan-Adrian
  id: 4A899BFC-F248-11E8-B48F-1D18A9856A87
  last_name: Alistarh
  orcid: 0000-0003-3650-940X
citation:
  ama: 'Frantar E, Alistarh D-A. SparseGPT: Massive language models can be accurately
    pruned in one-shot. In: <i>Proceedings of the 40th International Conference on
    Machine Learning</i>. Vol 202. ML Research Press; 2023:10323-10337.'
  apa: 'Frantar, E., &#38; Alistarh, D.-A. (2023). SparseGPT: Massive language models
    can be accurately pruned in one-shot. In <i>Proceedings of the 40th International
    Conference on Machine Learning</i> (Vol. 202, pp. 10323–10337). Honolulu, Hawaii,
    HI, United States: ML Research Press.'
  chicago: 'Frantar, Elias, and Dan-Adrian Alistarh. “SparseGPT: Massive Language
    Models Can Be Accurately Pruned in One-Shot.” In <i>Proceedings of the 40th International
    Conference on Machine Learning</i>, 202:10323–37. ML Research Press, 2023.'
  ieee: 'E. Frantar and D.-A. Alistarh, “SparseGPT: Massive language models can be
    accurately pruned in one-shot,” in <i>Proceedings of the 40th International Conference
    on Machine Learning</i>, Honolulu, Hawaii, HI, United States, 2023, vol. 202,
    pp. 10323–10337.'
  ista: 'Frantar E, Alistarh D-A. 2023. SparseGPT: Massive language models can be
    accurately pruned in one-shot. Proceedings of the 40th International Conference
    on Machine Learning. ICML: International Conference on Machine Learning, PMLR,
    vol. 202, 10323–10337.'
  mla: 'Frantar, Elias, and Dan-Adrian Alistarh. “SparseGPT: Massive Language Models
    Can Be Accurately Pruned in One-Shot.” <i>Proceedings of the 40th International
    Conference on Machine Learning</i>, vol. 202, ML Research Press, 2023, pp. 10323–37.'
  short: E. Frantar, D.-A. Alistarh, in:, Proceedings of the 40th International Conference
    on Machine Learning, ML Research Press, 2023, pp. 10323–10337.
conference:
  end_date: 2023-07-29
  location: Honolulu, Hawaii, HI, United States
  name: 'ICML: International Conference on Machine Learning'
  start_date: 2023-07-23
corr_author: '1'
date_created: 2023-10-29T23:01:16Z
date_published: 2023-07-30T00:00:00Z
date_updated: 2026-04-07T12:43:03Z
day: '30'
department:
- _id: DaAl
ec_funded: 1
external_id:
  arxiv:
  - '2301.00774'
intvolume: '       202'
language:
- iso: eng
main_file_link:
- open_access: '1'
  url: https://doi.org/10.48550/arXiv.2301.00774
month: '07'
oa: 1
oa_version: Preprint
page: 10323-10337
project:
- _id: 268A44D6-B435-11E9-9278-68D0E5697425
  call_identifier: H2020
  grant_number: '805223'
  name: Elastic Coordination for Scalable Machine Learning
publication: Proceedings of the 40th International Conference on Machine Learning
publication_identifier:
  eissn:
  - 2640-3498
publication_status: published
publisher: ML Research Press
quality_controlled: '1'
related_material:
  record:
  - id: '17485'
    relation: dissertation_contains
    status: public
scopus_import: '1'
status: public
title: 'SparseGPT: Massive language models can be accurately pruned in one-shot'
type: conference
user_id: 2DF688A6-F248-11E8-B48F-1D18A9856A87
volume: 202
year: '2023'
...
---
_id: '14460'
abstract:
- lang: eng
  text: We provide an efficient implementation of the backpropagation algorithm, specialized
    to the case where the weights of the neural network being trained are sparse.
    Our algorithm is general, as it applies to arbitrary (unstructured) sparsity and
    common layer types (e.g., convolutional or linear). We provide a fast vectorized
    implementation on commodity CPUs, and show that it can yield speedups in end-to-end
    runtime experiments, both in transfer learning using already-sparsified networks,
    and in training sparse networks from scratch. Thus, our results provide the first
    support for sparse training on commodity hardware.
acknowledgement: 'We would like to thank Elias Frantar for his valuable assistance
  and support at the outset of this project, and the anonymous ICML and SNN reviewers
  for very constructive feedback. EI was supported in part by the FWF DK VGSCO, grant
  agreement number W1260-N35. DA acknowledges generous ERC support, via Starting Grant
  805223 ScaleML. '
alternative_title:
- PMLR
article_processing_charge: No
arxiv: 1
author:
- first_name: Mahdi
  full_name: Nikdan, Mahdi
  id: 66374281-f394-11eb-9cf6-869147deecc0
  last_name: Nikdan
- first_name: Tommaso
  full_name: Pegolotti, Tommaso
  last_name: Pegolotti
- first_name: Eugenia B
  full_name: Iofinova, Eugenia B
  id: f9a17499-f6e0-11ea-865d-fdf9a3f77117
  last_name: Iofinova
  orcid: 0000-0002-7778-3221
- first_name: Eldar
  full_name: Kurtic, Eldar
  id: 47beb3a5-07b5-11eb-9b87-b108ec578218
  last_name: Kurtic
- first_name: Dan-Adrian
  full_name: Alistarh, Dan-Adrian
  id: 4A899BFC-F248-11E8-B48F-1D18A9856A87
  last_name: Alistarh
  orcid: 0000-0003-3650-940X
citation:
  ama: 'Nikdan M, Pegolotti T, Iofinova EB, Kurtic E, Alistarh D-A. SparseProp: Efficient
    sparse backpropagation for faster training of neural networks at the edge. In:
    <i>Proceedings of the 40th International Conference on Machine Learning</i>. Vol
    202. ML Research Press; 2023:26215-26227.'
  apa: 'Nikdan, M., Pegolotti, T., Iofinova, E. B., Kurtic, E., &#38; Alistarh, D.-A.
    (2023). SparseProp: Efficient sparse backpropagation for faster training of neural
    networks at the edge. In <i>Proceedings of the 40th International Conference on
    Machine Learning</i> (Vol. 202, pp. 26215–26227). Honolulu, Hawaii, HI, United
    States: ML Research Press.'
  chicago: 'Nikdan, Mahdi, Tommaso Pegolotti, Eugenia B Iofinova, Eldar Kurtic, and
    Dan-Adrian Alistarh. “SparseProp: Efficient Sparse Backpropagation for Faster
    Training of Neural Networks at the Edge.” In <i>Proceedings of the 40th International
    Conference on Machine Learning</i>, 202:26215–27. ML Research Press, 2023.'
  ieee: 'M. Nikdan, T. Pegolotti, E. B. Iofinova, E. Kurtic, and D.-A. Alistarh, “SparseProp:
    Efficient sparse backpropagation for faster training of neural networks at the
    edge,” in <i>Proceedings of the 40th International Conference on Machine Learning</i>,
    Honolulu, Hawaii, HI, United States, 2023, vol. 202, pp. 26215–26227.'
  ista: 'Nikdan M, Pegolotti T, Iofinova EB, Kurtic E, Alistarh D-A. 2023. SparseProp:
    Efficient sparse backpropagation for faster training of neural networks at the
    edge. Proceedings of the 40th International Conference on Machine Learning. ICML:
    International Conference on Machine Learning, PMLR, vol. 202, 26215–26227.'
  mla: 'Nikdan, Mahdi, et al. “SparseProp: Efficient Sparse Backpropagation for Faster
    Training of Neural Networks at the Edge.” <i>Proceedings of the 40th International
    Conference on Machine Learning</i>, vol. 202, ML Research Press, 2023, pp. 26215–27.'
  short: M. Nikdan, T. Pegolotti, E.B. Iofinova, E. Kurtic, D.-A. Alistarh, in:, Proceedings
    of the 40th International Conference on Machine Learning, ML Research Press, 2023,
    pp. 26215–26227.
conference:
  end_date: 2023-07-29
  location: Honolulu, Hawaii, HI, United States
  name: 'ICML: International Conference on Machine Learning'
  start_date: 2023-07-23
corr_author: '1'
date_created: 2023-10-29T23:01:17Z
date_published: 2023-07-30T00:00:00Z
date_updated: 2025-04-14T07:49:12Z
day: '30'
department:
- _id: DaAl
ec_funded: 1
external_id:
  arxiv:
  - '2302.04852'
intvolume: '       202'
language:
- iso: eng
main_file_link:
- open_access: '1'
  url: https://doi.org/10.48550/arXiv.2302.04852
month: '07'
oa: 1
oa_version: Preprint
page: 26215-26227
project:
- _id: 268A44D6-B435-11E9-9278-68D0E5697425
  call_identifier: H2020
  grant_number: '805223'
  name: Elastic Coordination for Scalable Machine Learning
publication: Proceedings of the 40th International Conference on Machine Learning
publication_identifier:
  eissn:
  - 2640-3498
publication_status: published
publisher: ML Research Press
quality_controlled: '1'
scopus_import: '1'
status: public
title: 'SparseProp: Efficient sparse backpropagation for faster training of neural
  networks at the edge'
type: conference
user_id: 2DF688A6-F248-11E8-B48F-1D18A9856A87
volume: 202
year: '2023'
...
---
_id: '14461'
abstract:
- lang: eng
  text: 'Communication-reduction techniques are a popular way to improve scalability
    in data-parallel training of deep neural networks (DNNs). The recent emergence
    of large language models such as GPT has created the need for new approaches to
    exploit data-parallelism. Among these, fully-sharded data parallel (FSDP) training
    is highly popular, yet it still encounters scalability bottlenecks. One reason
    is that applying compression techniques to FSDP is challenging: as the vast majority
    of the communication involves the model’s weights, direct compression alters convergence
    and leads to accuracy loss. We present QSDP, a variant of FSDP which supports
    both gradient and weight quantization with theoretical guarantees, is simple to
    implement and has essentially no overheads. To derive QSDP we prove that a natural
    modification of SGD achieves convergence even when we only maintain quantized
    weights, and thus the domain over which we train consists of quantized points
    and is, therefore, highly non-convex. We validate this approach by training GPT-family
    models with up to 1.3 billion parameters on a multi-node cluster. Experiments
    show that QSDP preserves model accuracy, while completely removing the communication
    bottlenecks of FSDP, providing end-to-end speedups of up to 2.2x.'
acknowledged_ssus:
- _id: ScienComp
acknowledgement: The authors gratefully acknowledge funding from the European Research
  Council (ERC) under the European Union’s Horizon 2020 research and innovation programme
  (grant agreement No 805223 ScaleML), as well as experimental support from the IST
  Austria IT department, in particular Stefano Elefante, Andrei Hornoiu, and Alois
  Schloegl. AV acknowledges the support of the French Agence Nationale de la Recherche
  (ANR), under grant ANR-21-CE48-0016 (project COMCOPT), the support of Fondation
  Hadamard with a PRMO grant, and the support of CNRS with a CoopIntEER IEA grant
  (project ALFRED).
alternative_title:
- PMLR
article_processing_charge: No
arxiv: 1
author:
- first_name: Ilia
  full_name: Markov, Ilia
  id: D0CF4148-C985-11E9-8066-0BDEE5697425
  last_name: Markov
- first_name: Adrian
  full_name: Vladu, Adrian
  last_name: Vladu
- first_name: Qi
  full_name: Guo, Qi
  last_name: Guo
- first_name: Dan-Adrian
  full_name: Alistarh, Dan-Adrian
  id: 4A899BFC-F248-11E8-B48F-1D18A9856A87
  last_name: Alistarh
  orcid: 0000-0003-3650-940X
citation:
  ama: 'Markov I, Vladu A, Guo Q, Alistarh D-A. Quantized distributed training of
    large models with convergence guarantees. In: <i>Proceedings of the 40th International
    Conference on Machine Learning</i>. Vol 202. ML Research Press; 2023:24020-24044.'
  apa: 'Markov, I., Vladu, A., Guo, Q., &#38; Alistarh, D.-A. (2023). Quantized distributed
    training of large models with convergence guarantees. In <i>Proceedings of the
    40th International Conference on Machine Learning</i> (Vol. 202, pp. 24020–24044).
    Honolulu, Hawaii, HI, United States: ML Research Press.'
  chicago: Markov, Ilia, Adrian Vladu, Qi Guo, and Dan-Adrian Alistarh. “Quantized
    Distributed Training of Large Models with Convergence Guarantees.” In <i>Proceedings
    of the 40th International Conference on Machine Learning</i>, 202:24020–44. ML
    Research Press, 2023.
  ieee: I. Markov, A. Vladu, Q. Guo, and D.-A. Alistarh, “Quantized distributed training
    of large models with convergence guarantees,” in <i>Proceedings of the 40th International
    Conference on Machine Learning</i>, Honolulu, Hawaii, HI, United States, 2023,
    vol. 202, pp. 24020–24044.
  ista: 'Markov I, Vladu A, Guo Q, Alistarh D-A. 2023. Quantized distributed training
    of large models with convergence guarantees. Proceedings of the 40th International
    Conference on Machine Learning. ICML: International Conference on Machine Learning,
    PMLR, vol. 202, 24020–24044.'
  mla: Markov, Ilia, et al. “Quantized Distributed Training of Large Models with Convergence
    Guarantees.” <i>Proceedings of the 40th International Conference on Machine Learning</i>,
    vol. 202, ML Research Press, 2023, pp. 24020–44.
  short: I. Markov, A. Vladu, Q. Guo, D.-A. Alistarh, in:, Proceedings of the 40th
    International Conference on Machine Learning, ML Research Press, 2023, pp. 24020–24044.
conference:
  end_date: 2023-07-29
  location: Honolulu, Hawaii, HI, United States
  name: 'ICML: International Conference on Machine Learning'
  start_date: 2023-07-23
corr_author: '1'
date_created: 2023-10-29T23:01:17Z
date_published: 2023-07-30T00:00:00Z
date_updated: 2026-04-07T13:00:54Z
day: '30'
department:
- _id: DaAl
ec_funded: 1
external_id:
  arxiv:
  - '2302.02390'
intvolume: '       202'
language:
- iso: eng
main_file_link:
- open_access: '1'
  url: https://doi.org/10.48550/arXiv.2302.02390
month: '07'
oa: 1
oa_version: Preprint
page: 24020-24044
project:
- _id: 268A44D6-B435-11E9-9278-68D0E5697425
  call_identifier: H2020
  grant_number: '805223'
  name: Elastic Coordination for Scalable Machine Learning
publication: Proceedings of the 40th International Conference on Machine Learning
publication_identifier:
  eissn:
  - 2640-3498
publication_status: published
publisher: ML Research Press
quality_controlled: '1'
related_material:
  record:
  - id: '17490'
    relation: dissertation_contains
    status: public
scopus_import: '1'
status: public
title: Quantized distributed training of large models with convergence guarantees
type: conference
user_id: 2DF688A6-F248-11E8-B48F-1D18A9856A87
volume: 202
year: '2023'
...
---
_id: '14771'
abstract:
- lang: eng
  text: Pruning—that is, setting a significant subset of the parameters of a neural
    network to zero—is one of the most popular methods of model compression. Yet,
    several recent works have raised the issue that pruning may induce or exacerbate
    bias in the output of the compressed model. Despite existing evidence for this
    phenomenon, the relationship between neural network pruning and induced bias is
    not well-understood. In this work, we systematically investigate and characterize
    this phenomenon in Convolutional Neural Networks for computer vision. First, we
    show that it is in fact possible to obtain highly-sparse models, e.g. with less
    than 10% remaining weights, which do not decrease in accuracy nor substantially
    increase in bias when compared to dense models. At the same time, we also find
    that, at higher sparsities, pruned models exhibit higher uncertainty in their
    outputs, as well as increased correlations, which we directly link to increased
    bias. We propose easy-to-use criteria which, based only on the uncompressed model,
    establish whether bias will increase with pruning, and identify the samples most
    susceptible to biased predictions post-compression. Our code can be found at https://github.com/IST-DASLab/pruned-vision-model-bias.
acknowledgement: The authors would like to sincerely thank Sara Hooker for her feedback
  during the development of this work. EI was supported in part by the FWF DK VGSCO,
  grant agreement number W1260-N35. AP and DA acknowledge generous ERC support, via
  Starting Grant 805223 ScaleML.
article_processing_charge: No
arxiv: 1
author:
- first_name: Eugenia B
  full_name: Iofinova, Eugenia B
  id: f9a17499-f6e0-11ea-865d-fdf9a3f77117
  last_name: Iofinova
  orcid: 0000-0002-7778-3221
- first_name: Elena-Alexandra
  full_name: Peste, Elena-Alexandra
  id: 32D78294-F248-11E8-B48F-1D18A9856A87
  last_name: Peste
- first_name: Dan-Adrian
  full_name: Alistarh, Dan-Adrian
  id: 4A899BFC-F248-11E8-B48F-1D18A9856A87
  last_name: Alistarh
  orcid: 0000-0003-3650-940X
citation:
  ama: 'Iofinova EB, Krumes A, Alistarh D-A. Bias in pruned vision models: In-depth
    analysis and countermeasures. In: <i>2023 IEEE/CVF Conference on Computer Vision
    and Pattern Recognition</i>. IEEE; 2023:24364-24373. doi:<a href="https://doi.org/10.1109/cvpr52729.2023.02334">10.1109/cvpr52729.2023.02334</a>'
  apa: 'Iofinova, E. B., Krumes, A., &#38; Alistarh, D.-A. (2023). Bias in pruned
    vision models: In-depth analysis and countermeasures. In <i>2023 IEEE/CVF Conference
    on Computer Vision and Pattern Recognition</i> (pp. 24364–24373). Vancouver, BC,
    Canada: IEEE. <a href="https://doi.org/10.1109/cvpr52729.2023.02334">https://doi.org/10.1109/cvpr52729.2023.02334</a>'
  chicago: 'Iofinova, Eugenia B, Alexandra Krumes, and Dan-Adrian Alistarh. “Bias
    in Pruned Vision Models: In-Depth Analysis and Countermeasures.” In <i>2023 IEEE/CVF
    Conference on Computer Vision and Pattern Recognition</i>, 24364–73. IEEE, 2023.
    <a href="https://doi.org/10.1109/cvpr52729.2023.02334">https://doi.org/10.1109/cvpr52729.2023.02334</a>.'
  ieee: 'E. B. Iofinova, A. Krumes, and D.-A. Alistarh, “Bias in pruned vision models:
    In-depth analysis and countermeasures,” in <i>2023 IEEE/CVF Conference on Computer
    Vision and Pattern Recognition</i>, Vancouver, BC, Canada, 2023, pp. 24364–24373.'
  ista: 'Iofinova EB, Krumes A, Alistarh D-A. 2023. Bias in pruned vision models:
    In-depth analysis and countermeasures. 2023 IEEE/CVF Conference on Computer Vision
    and Pattern Recognition. CVPR: Conference on Computer Vision and Pattern Recognition,
    24364–24373.'
  mla: 'Iofinova, Eugenia B., et al. “Bias in Pruned Vision Models: In-Depth Analysis
    and Countermeasures.” <i>2023 IEEE/CVF Conference on Computer Vision and Pattern
    Recognition</i>, IEEE, 2023, pp. 24364–73, doi:<a href="https://doi.org/10.1109/cvpr52729.2023.02334">10.1109/cvpr52729.2023.02334</a>.'
  short: E.B. Iofinova, A. Krumes, D.-A. Alistarh, in:, 2023 IEEE/CVF Conference on
    Computer Vision and Pattern Recognition, IEEE, 2023, pp. 24364–24373.
conference:
  end_date: 2023-06-24
  location: Vancouver, BC, Canada
  name: 'CVPR: Conference on Computer Vision and Pattern Recognition'
  start_date: 2023-06-17
corr_author: '1'
date_created: 2024-01-10T08:42:40Z
date_published: 2023-08-22T00:00:00Z
date_updated: 2026-05-19T11:20:27Z
day: '22'
department:
- _id: DaAl
- _id: ChLa
doi: 10.1109/cvpr52729.2023.02334
ec_funded: 1
external_id:
  arxiv:
  - '2304.12622'
  isi:
  - '001062531308068'
isi: 1
language:
- iso: eng
main_file_link:
- open_access: '1'
  url: https://doi.org/10.48550/arXiv.2304.12622
month: '08'
oa: 1
oa_version: Preprint
page: 24364-24373
project:
- _id: 9B9290DE-BA93-11EA-9121-9846C619BF3A
  grant_number: W1260-N35
  name: Vienna Graduate School on Computational Optimization
- _id: 268A44D6-B435-11E9-9278-68D0E5697425
  call_identifier: H2020
  grant_number: '805223'
  name: Elastic Coordination for Scalable Machine Learning
publication: 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition
publication_identifier:
  eisbn:
  - '9798350301298'
  eissn:
  - 2575-7075
publication_status: published
publisher: IEEE
quality_controlled: '1'
related_material:
  link:
  - relation: software
    url: https://github.com/IST-DASLab/pruned-vision-model-bias
  record:
  - id: '21854'
    relation: dissertation_contains
    status: public
status: public
title: 'Bias in pruned vision models: In-depth analysis and countermeasures'
type: conference
user_id: 2DF688A6-F248-11E8-B48F-1D18A9856A87
year: '2023'
...
---
OA_place: repository
OA_type: green
_id: '19983'
abstract:
- lang: eng
  text: 'The sinkless orientation problem plays a key role in understanding the foundations
    of distributed computing. The problem can be used to separate two fundamental
    models of distributed graph algorithms, LOCAL and SLOCAL: the locality of sinkless
    orientation is Ω(log n) in the deterministic LOCAL model and O(log log n) in the
    deterministic SLOCAL model. Both of these results are known by prior work, but
    here we give new simple, self-contained proofs for them.'
acknowledgement: We thank the anonymous reviewers for their helpful feedback on previous
  versions of this work. Parts ofthis work appeared in DISC 2021 as a brief announcement
  [ 21]. This work was supported in part by theEuropean Research Council (ERC) under
  the European Union’s Horizon 2020 research and innovationprogramme (grant agreement
  No 805223 ScaleML), the Academy of Finland (grant agreement No 333837),the Austrian
  Science Fund (FWF) and netIDEE (grant agreement No P 33775-N), and the AustrianScience
  Fund (FWF) project DELTA (grant agreement No I 5025-N).
article_processing_charge: No
arxiv: 1
author:
- first_name: Alkida
  full_name: Balliu, Alkida
  last_name: Balliu
- first_name: Janne
  full_name: Korhonen, Janne
  id: C5402D42-15BC-11E9-A202-CA2BE6697425
  last_name: Korhonen
- first_name: Fabian
  full_name: Kuhn, Fabian
  last_name: Kuhn
- first_name: Henrik
  full_name: Lievonen, Henrik
  last_name: Lievonen
- first_name: Dennis
  full_name: Olivetti, Dennis
  last_name: Olivetti
- first_name: Shreyas
  full_name: Pai, Shreyas
  last_name: Pai
- first_name: Ami
  full_name: Paz, Ami
  last_name: Paz
- first_name: Joel
  full_name: Rybicki, Joel
  id: 334EFD2E-F248-11E8-B48F-1D18A9856A87
  last_name: Rybicki
  orcid: 0000-0002-6432-6646
- first_name: Stefan
  full_name: Schmid, Stefan
  last_name: Schmid
- first_name: Jan
  full_name: Studený, Jan
  last_name: Studený
- first_name: Jukka
  full_name: Suomela, Jukka
  last_name: Suomela
- first_name: Jara
  full_name: Uitto, Jara
  last_name: Uitto
citation:
  ama: 'Balliu A, Korhonen J, Kuhn F, et al. Sinkless Orientation Made Simple. In:
    <i>Symposium on Simplicity in Algorithms</i>. 2023 Society for Industrial and
    Applied Mathematics; 2023:175-191. doi:<a href="https://doi.org/10.1137/1.9781611977585.ch17">10.1137/1.9781611977585.ch17</a>'
  apa: 'Balliu, A., Korhonen, J., Kuhn, F., Lievonen, H., Olivetti, D., Pai, S., …
    Uitto, J. (2023). Sinkless Orientation Made Simple. In <i>Symposium on Simplicity
    in Algorithms</i> (pp. 175–191). Florence, Italy: 2023 Society for Industrial
    and Applied Mathematics. <a href="https://doi.org/10.1137/1.9781611977585.ch17">https://doi.org/10.1137/1.9781611977585.ch17</a>'
  chicago: Balliu, Alkida, Janne Korhonen, Fabian Kuhn, Henrik Lievonen, Dennis Olivetti,
    Shreyas Pai, Ami Paz, et al. “Sinkless Orientation Made Simple.” In <i>Symposium
    on Simplicity in Algorithms</i>, 175–91. 2023 Society for Industrial and Applied
    Mathematics, 2023. <a href="https://doi.org/10.1137/1.9781611977585.ch17">https://doi.org/10.1137/1.9781611977585.ch17</a>.
  ieee: A. Balliu <i>et al.</i>, “Sinkless Orientation Made Simple,” in <i>Symposium
    on Simplicity in Algorithms</i>, 2023 Society for Industrial and Applied Mathematics,
    2023, pp. 175–191.
  ista: 'Balliu A, Korhonen J, Kuhn F, Lievonen H, Olivetti D, Pai S, Paz A, Rybicki
    J, Schmid S, Studený J, Suomela J, Uitto J. 2023.Sinkless Orientation Made Simple.
    In: Symposium on Simplicity in Algorithms. , 175–191.'
  mla: Balliu, Alkida, et al. “Sinkless Orientation Made Simple.” <i>Symposium on
    Simplicity in Algorithms</i>, 2023 Society for Industrial and Applied Mathematics,
    2023, pp. 175–91, doi:<a href="https://doi.org/10.1137/1.9781611977585.ch17">10.1137/1.9781611977585.ch17</a>.
  short: A. Balliu, J. Korhonen, F. Kuhn, H. Lievonen, D. Olivetti, S. Pai, A. Paz,
    J. Rybicki, S. Schmid, J. Studený, J. Suomela, J. Uitto, in:, Symposium on Simplicity
    in Algorithms, 2023 Society for Industrial and Applied Mathematics, 2023, pp.
    175–191.
conference:
  end_date: 2023-01-25
  location: Florence, Italy
  name: 'SOSA: Symposium on Simplicity in Algorithms'
  start_date: 2023-01-23
date_created: 2025-07-10T13:11:47Z
date_published: 2023-01-12T00:00:00Z
date_updated: 2025-09-24T09:24:20Z
day: '12'
department:
- _id: DaAl
doi: 10.1137/1.9781611977585.ch17
ec_funded: 1
external_id:
  arxiv:
  - '2108.02655'
language:
- iso: eng
main_file_link:
- open_access: '1'
  url: https://doi.org/10.48550/arXiv.2108.02655
month: '01'
oa: 1
oa_version: Preprint
page: 175-191
project:
- _id: 268A44D6-B435-11E9-9278-68D0E5697425
  call_identifier: H2020
  grant_number: '805223'
  name: Elastic Coordination for Scalable Machine Learning
- _id: bd9e3a2e-d553-11ed-ba76-8aa684ce17fe
  grant_number: P33775
  name: Fast Algorithms for a Reactive Network Layer
publication: Symposium on Simplicity in Algorithms
publication_identifier:
  eisbn:
  - '9781611977585'
publication_status: published
publisher: 2023 Society for Industrial and Applied Mathematics
quality_controlled: '1'
status: public
title: Sinkless Orientation Made Simple
type: book_chapter
user_id: 2DF688A6-F248-11E8-B48F-1D18A9856A87
year: '2023'
...
---
_id: '17378'
abstract:
- lang: eng
  text: Generative Pre-trained Transformer models, known as GPT or OPT, set themselves
    apart through breakthrough performance across complex language modelling tasks,
    but also by their extremely high computational and storage costs. Specifically,
    due to their massive size, even inference for large, highly-accurate GPT models
    may require multiple performant GPUs, which limits the usability of such models.
    While there is emerging work on relieving this pressure via model compression,
    the applicability and performance of existing compression techniques is limited
    by the scale and complexity of GPT models. In this paper, we address this challenge,
    and propose OPTQ, a new one-shot weight quantization method based on approximate
    second-order information, that is both highly-accurate and highly-efficient. Specifically,
    OPTQ can quantize GPT models with 175 billion parameters in approximately four
    GPU hours, reducing the bitwidth down to 3 or 4 bits per weight, with negligible
    accuracy degradation relative to the uncompressed baseline. Our method more than
    doubles the compression gains relative to previously-proposed one-shot quantization
    methods, preserving accuracy, allowing us for the first time to execute an 175
    billion-parameter model inside a single GPU for generative inference. Moreover,
    we also show that our method can still provide reasonable accuracy in the extreme
    quantization regime, in which weights are quantized to 2-bit or even ternary quantization
    levels. We show experimentally that these improvements can be leveraged for end-to-end
    inference speedups over FP16, of around 3.25x when using high-end GPUs (NVIDIA
    A100) and 4.5x when using more cost-effective ones (NVIDIA A6000). The implementation
    is available at https://github.com/IST-DASLab/gptq.
acknowledged_ssus:
- _id: ScienComp
acknowledgement: Elias Frantar and Dan Alistarh gratefully acknowledge funding from
  the European Research Council (ERC) under the European Union’s Horizon 2020 programme
  (grant agreement No. 805223 ScaleML), as well as experimental support from Eldar
  Kurtic, and from the IST Austria IT department, in particular Stefano Elefante,
  Andrei Hornoiu, and Alois Schloegl. The work of Saleh Ashkboos and Torsten Hoefler
  was supported by the PASC DaCeMI project, received EuroHPC-JU funding under grant
  MAELSTROM, No. 955513. We thank the Swiss National Supercomputing Center (CSCS)
  for supporting us with compute infrastructure.
article_processing_charge: No
author:
- first_name: Elias
  full_name: Frantar, Elias
  id: 09a8f98d-ec99-11ea-ae11-c063a7b7fe5f
  last_name: Frantar
- first_name: Saleh
  full_name: Ashkboos, Saleh
  last_name: Ashkboos
- first_name: Torsten
  full_name: Hoefler, Torsten
  last_name: Hoefler
- first_name: Dan-Adrian
  full_name: Alistarh, Dan-Adrian
  id: 4A899BFC-F248-11E8-B48F-1D18A9856A87
  last_name: Alistarh
  orcid: 0000-0003-3650-940X
citation:
  ama: 'Frantar E, Ashkboos S, Hoefler T, Alistarh D-A. OPTQ: Accurate post-training
    quantization for generative pre-trained transformers. In: <i>11th International
    Conference on Learning Representations </i>. International Conference on Learning
    Representations; 2023.'
  apa: 'Frantar, E., Ashkboos, S., Hoefler, T., &#38; Alistarh, D.-A. (2023). OPTQ:
    Accurate post-training quantization for generative pre-trained transformers. In
    <i>11th International Conference on Learning Representations </i>. Kigali, Rwanda:
    International Conference on Learning Representations.'
  chicago: 'Frantar, Elias, Saleh Ashkboos, Torsten Hoefler, and Dan-Adrian Alistarh.
    “OPTQ: Accurate Post-Training Quantization for Generative Pre-Trained Transformers.”
    In <i>11th International Conference on Learning Representations </i>. International
    Conference on Learning Representations, 2023.'
  ieee: 'E. Frantar, S. Ashkboos, T. Hoefler, and D.-A. Alistarh, “OPTQ: Accurate
    post-training quantization for generative pre-trained transformers,” in <i>11th
    International Conference on Learning Representations </i>, Kigali, Rwanda, 2023.'
  ista: 'Frantar E, Ashkboos S, Hoefler T, Alistarh D-A. 2023. OPTQ: Accurate post-training
    quantization for generative pre-trained transformers. 11th International Conference
    on Learning Representations . ICLR: International Conference on Learning Representations.'
  mla: 'Frantar, Elias, et al. “OPTQ: Accurate Post-Training Quantization for Generative
    Pre-Trained Transformers.” <i>11th International Conference on Learning Representations
    </i>, International Conference on Learning Representations, 2023.'
  short: E. Frantar, S. Ashkboos, T. Hoefler, D.-A. Alistarh, in:, 11th International
    Conference on Learning Representations , International Conference on Learning
    Representations, 2023.
conference:
  end_date: 2023-05-05
  location: Kigali, Rwanda
  name: 'ICLR: International Conference on Learning Representations'
  start_date: 2023-05-01
corr_author: '1'
date_created: 2024-08-04T22:01:22Z
date_published: 2023-05-01T00:00:00Z
date_updated: 2026-04-07T12:43:03Z
day: '01'
ddc:
- '000'
department:
- _id: DaAl
ec_funded: 1
file:
- access_level: open_access
  checksum: aacbf11dbd8b02a3e0bfd942a33e0593
  content_type: application/pdf
  creator: dernst
  date_created: 2024-08-05T07:52:44Z
  date_updated: 2024-08-05T07:52:44Z
  file_id: '17385'
  file_name: 2023_ICLR_Frantar.pdf
  file_size: 437492
  relation: main_file
  success: 1
file_date_updated: 2024-08-05T07:52:44Z
has_accepted_license: '1'
language:
- iso: eng
month: '05'
oa: 1
oa_version: Published Version
project:
- _id: 268A44D6-B435-11E9-9278-68D0E5697425
  call_identifier: H2020
  grant_number: '805223'
  name: Elastic Coordination for Scalable Machine Learning
publication: '11th International Conference on Learning Representations '
publication_status: published
publisher: International Conference on Learning Representations
quality_controlled: '1'
related_material:
  link:
  - relation: software
    url: https://github.com/IST-DASLab/gptq
  record:
  - id: '17485'
    relation: dissertation_contains
    status: public
scopus_import: '1'
status: public
title: 'OPTQ: Accurate post-training quantization for generative pre-trained transformers'
type: conference
user_id: 2DF688A6-F248-11E8-B48F-1D18A9856A87
year: '2023'
...
---
_id: '11180'
abstract:
- lang: eng
  text: "Designing and implementing efficient parallel priority schedulers is an active
    research area. An intriguing proposed design is the Multi-Queue: given n threads
    and m ≥ n distinct priority queues, task insertions are performed uniformly at
    random, while, to delete, a thread picks two queues uniformly at random, and removes
    the observed task of higher priority. This approach scales well, and has probabilistic
    rank guarantees: roughly, the rank of each task removed, relative to remaining
    tasks in all other queues, is O (m) in expectation. Yet, the performance of this
    pattern is below that of well-engineered schedulers, which eschew theoretical
    guarantees for practical efficiency.\r\n\r\nWe investigate whether it is possible
    to design and implement a Multi-Queue-based task scheduler that is both highly-efficient
    and has analytical guarantees. We propose a new variant called the Stealing Multi-Queue
    (SMQ), a cache-efficient variant of the Multi-Queue, which leverages both queue
    affinity---each thread has a local queue, from which tasks are usually removed;
    but, with some probability, threads also attempt to steal higher-priority tasks
    from the other queues---and task batching, that is, the processing of several
    tasks in a single insert / remove step. These ideas are well-known for task scheduling
    without priorities; our theoretical contribution is showing that, despite relaxations,
    this design can still provide rank guarantees, which in turn implies bounds on
    total work performed. We provide a general SMQ implementation which can surpass
    state-of-the-art schedulers such as OBIM and PMOD in terms of performance on popular
    graph-processing benchmarks. Notably, the performance improvement comes mainly
    from the superior rank guarantees provided by our scheduler, confirming that analytically-reasoned
    approaches can still provide performance improvements for priority task scheduling."
acknowledgement: We would like to thank the anonymous reviewers for their useful comments.
  This project has received funding from the European Research Council (ERC) under
  the European Union’s Horizon 2020 research and innovation programme (grant agreement
  No 805223 ScaleML).
article_processing_charge: No
arxiv: 1
author:
- first_name: Anastasiia
  full_name: Postnikova, Anastasiia
  last_name: Postnikova
- first_name: Nikita
  full_name: Koval, Nikita
  id: 2F4DB10C-F248-11E8-B48F-1D18A9856A87
  last_name: Koval
- first_name: Giorgi
  full_name: Nadiradze, Giorgi
  id: 3279A00C-F248-11E8-B48F-1D18A9856A87
  last_name: Nadiradze
  orcid: 0000-0001-5634-0731
- first_name: Dan-Adrian
  full_name: Alistarh, Dan-Adrian
  id: 4A899BFC-F248-11E8-B48F-1D18A9856A87
  last_name: Alistarh
  orcid: 0000-0003-3650-940X
citation:
  ama: 'Postnikova A, Koval N, Nadiradze G, Alistarh D-A. Multi-queues can be state-of-the-art
    priority schedulers. In: <i>Proceedings of the 27th ACM SIGPLAN Symposium on Principles
    and Practice of Parallel Programming</i>. Association for Computing Machinery;
    2022:353-367. doi:<a href="https://doi.org/10.1145/3503221.3508432">10.1145/3503221.3508432</a>'
  apa: 'Postnikova, A., Koval, N., Nadiradze, G., &#38; Alistarh, D.-A. (2022). Multi-queues
    can be state-of-the-art priority schedulers. In <i>Proceedings of the 27th ACM
    SIGPLAN Symposium on Principles and Practice of Parallel Programming</i> (pp.
    353–367). Seoul, Republic of Korea: Association for Computing Machinery. <a href="https://doi.org/10.1145/3503221.3508432">https://doi.org/10.1145/3503221.3508432</a>'
  chicago: Postnikova, Anastasiia, Nikita Koval, Giorgi Nadiradze, and Dan-Adrian
    Alistarh. “Multi-Queues Can Be State-of-the-Art Priority Schedulers.” In <i>Proceedings
    of the 27th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming</i>,
    353–67. Association for Computing Machinery, 2022. <a href="https://doi.org/10.1145/3503221.3508432">https://doi.org/10.1145/3503221.3508432</a>.
  ieee: A. Postnikova, N. Koval, G. Nadiradze, and D.-A. Alistarh, “Multi-queues can
    be state-of-the-art priority schedulers,” in <i>Proceedings of the 27th ACM SIGPLAN
    Symposium on Principles and Practice of Parallel Programming</i>, Seoul, Republic
    of Korea, 2022, pp. 353–367.
  ista: 'Postnikova A, Koval N, Nadiradze G, Alistarh D-A. 2022. Multi-queues can
    be state-of-the-art priority schedulers. Proceedings of the 27th ACM SIGPLAN Symposium
    on Principles and Practice of Parallel Programming. PPoPP: Sympopsium on Principles
    and Practice of Parallel Programming, 353–367.'
  mla: Postnikova, Anastasiia, et al. “Multi-Queues Can Be State-of-the-Art Priority
    Schedulers.” <i>Proceedings of the 27th ACM SIGPLAN Symposium on Principles and
    Practice of Parallel Programming</i>, Association for Computing Machinery, 2022,
    pp. 353–67, doi:<a href="https://doi.org/10.1145/3503221.3508432">10.1145/3503221.3508432</a>.
  short: A. Postnikova, N. Koval, G. Nadiradze, D.-A. Alistarh, in:, Proceedings of
    the 27th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming,
    Association for Computing Machinery, 2022, pp. 353–367.
conference:
  end_date: 2022-04-06
  location: Seoul, Republic of Korea
  name: 'PPoPP: Sympopsium on Principles and Practice of Parallel Programming'
  start_date: 2022-04-02
corr_author: '1'
date_created: 2022-04-17T22:01:46Z
date_published: 2022-04-02T00:00:00Z
date_updated: 2025-04-14T07:49:13Z
day: '02'
department:
- _id: DaAl
doi: 10.1145/3503221.3508432
ec_funded: 1
external_id:
  arxiv:
  - '2109.00657'
  isi:
  - '000883318200025'
isi: 1
language:
- iso: eng
main_file_link:
- open_access: '1'
  url: ' https://doi.org/10.48550/arXiv.2109.00657'
month: '04'
oa: 1
oa_version: Preprint
page: 353-367
project:
- _id: 268A44D6-B435-11E9-9278-68D0E5697425
  call_identifier: H2020
  grant_number: '805223'
  name: Elastic Coordination for Scalable Machine Learning
publication: Proceedings of the 27th ACM SIGPLAN Symposium on Principles and Practice
  of Parallel Programming
publication_identifier:
  isbn:
  - '9781450392044'
publication_status: published
publisher: Association for Computing Machinery
quality_controlled: '1'
related_material:
  record:
  - id: '13076'
    relation: research_data
    status: public
scopus_import: '1'
status: public
title: Multi-queues can be state-of-the-art priority schedulers
type: conference
user_id: 4359f0d1-fa6c-11eb-b949-802e58b17ae8
year: '2022'
...
---
_id: '11183'
abstract:
- lang: eng
  text: "Subgraph detection has recently been one of the most studied problems in
    the CONGEST model of distributed computing. In this work, we study the distributed
    complexity of problems closely related to subgraph detection, mainly focusing
    on induced subgraph detection. The main line of this work presents lower bounds
    and parameterized algorithms w.r.t structural parameters of the input graph:\r\n-
    On general graphs, we give unconditional lower bounds for induced detection of
    cycles and patterns of treewidth 2 in CONGEST. Moreover, by adapting reductions
    from centralized parameterized complexity, we prove lower bounds in CONGEST for
    detecting patterns with a 4-clique, and for induced path detection conditional
    on the hardness of triangle detection in the congested clique.\r\n- On graphs
    of bounded degeneracy, we show that induced paths can be detected fast in CONGEST
    using techniques from parameterized algorithms, while detecting cycles and patterns
    of treewidth 2 is hard.\r\n- On graphs of bounded vertex cover number, we show
    that induced subgraph detection is easy in CONGEST for any pattern graph. More
    specifically, we adapt a centralized parameterized algorithm for a more general
    maximum common induced subgraph detection problem to the distributed setting.
    In addition to these induced subgraph detection results, we study various related
    problems in the CONGEST and congested clique models, including for multicolored
    versions of subgraph-detection-like problems."
acknowledgement: "Amir Nikabadi: Supported by the LABEX MILYON (ANR-10-LABX-0070)
  of Université de Lyon, within the program “Investissements d’Avenir” (ANR-11-IDEX-0007)
  operated by the French National Research Agency (ANR). Janne H. Korhonen: Supported
  by the European Research Council (ERC) under the European Union’s Horizon 2020 research
  and innovation programme (grant agreement No 805223 ScaleML).\r\nWe thank François
  Le Gall and Masayuki Miyamoto for sharing their work on lower bounds for induced
  subgraph detection [36]."
alternative_title:
- LIPIcs
article_number: '15'
article_processing_charge: No
author:
- first_name: Amir
  full_name: Nikabadi, Amir
  last_name: Nikabadi
- first_name: Janne
  full_name: Korhonen, Janne
  id: C5402D42-15BC-11E9-A202-CA2BE6697425
  last_name: Korhonen
citation:
  ama: 'Nikabadi A, Korhonen J. Beyond distributed subgraph detection: Induced subgraphs,
    multicolored problems and graph parameters. In: Bramas Q, Gramoli V, Milani A,
    eds. <i>25th International Conference on Principles of Distributed Systems</i>.
    Vol 217. Schloss Dagstuhl - Leibniz-Zentrum für Informatik; 2022. doi:<a href="https://doi.org/10.4230/LIPIcs.OPODIS.2021.15">10.4230/LIPIcs.OPODIS.2021.15</a>'
  apa: 'Nikabadi, A., &#38; Korhonen, J. (2022). Beyond distributed subgraph detection:
    Induced subgraphs, multicolored problems and graph parameters. In Q. Bramas, V.
    Gramoli, &#38; A. Milani (Eds.), <i>25th International Conference on Principles
    of Distributed Systems</i> (Vol. 217). Strasbourg, France: Schloss Dagstuhl -
    Leibniz-Zentrum für Informatik. <a href="https://doi.org/10.4230/LIPIcs.OPODIS.2021.15">https://doi.org/10.4230/LIPIcs.OPODIS.2021.15</a>'
  chicago: 'Nikabadi, Amir, and Janne Korhonen. “Beyond Distributed Subgraph Detection:
    Induced Subgraphs, Multicolored Problems and Graph Parameters.” In <i>25th International
    Conference on Principles of Distributed Systems</i>, edited by Quentin Bramas,
    Vincent Gramoli, and Alessia Milani, Vol. 217. Schloss Dagstuhl - Leibniz-Zentrum
    für Informatik, 2022. <a href="https://doi.org/10.4230/LIPIcs.OPODIS.2021.15">https://doi.org/10.4230/LIPIcs.OPODIS.2021.15</a>.'
  ieee: 'A. Nikabadi and J. Korhonen, “Beyond distributed subgraph detection: Induced
    subgraphs, multicolored problems and graph parameters,” in <i>25th International
    Conference on Principles of Distributed Systems</i>, Strasbourg, France, 2022,
    vol. 217.'
  ista: 'Nikabadi A, Korhonen J. 2022. Beyond distributed subgraph detection: Induced
    subgraphs, multicolored problems and graph parameters. 25th International Conference
    on Principles of Distributed Systems. OPODIS, LIPIcs, vol. 217, 15.'
  mla: 'Nikabadi, Amir, and Janne Korhonen. “Beyond Distributed Subgraph Detection:
    Induced Subgraphs, Multicolored Problems and Graph Parameters.” <i>25th International
    Conference on Principles of Distributed Systems</i>, edited by Quentin Bramas
    et al., vol. 217, 15, Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2022,
    doi:<a href="https://doi.org/10.4230/LIPIcs.OPODIS.2021.15">10.4230/LIPIcs.OPODIS.2021.15</a>.'
  short: A. Nikabadi, J. Korhonen, in:, Q. Bramas, V. Gramoli, A. Milani (Eds.), 25th
    International Conference on Principles of Distributed Systems, Schloss Dagstuhl
    - Leibniz-Zentrum für Informatik, 2022.
conference:
  end_date: 2021-12-15
  location: Strasbourg, France
  name: OPODIS
  start_date: 2021-12-13
corr_author: '1'
date_created: 2022-04-17T22:01:47Z
date_published: 2022-02-01T00:00:00Z
date_updated: 2025-04-14T07:49:13Z
day: '01'
ddc:
- '510'
department:
- _id: DaAl
doi: 10.4230/LIPIcs.OPODIS.2021.15
ec_funded: 1
editor:
- first_name: Quentin
  full_name: Bramas, Quentin
  last_name: Bramas
- first_name: Vincent
  full_name: Gramoli, Vincent
  last_name: Gramoli
- first_name: Alessia
  full_name: Milani, Alessia
  last_name: Milani
file:
- access_level: open_access
  checksum: 626551c14de5d4091573200ed0535752
  content_type: application/pdf
  creator: dernst
  date_created: 2022-05-02T07:53:00Z
  date_updated: 2022-05-02T07:53:00Z
  file_id: '11345'
  file_name: 2022_LIPICs_Nikabadi.pdf
  file_size: 790396
  relation: main_file
  success: 1
file_date_updated: 2022-05-02T07:53:00Z
has_accepted_license: '1'
intvolume: '       217'
language:
- iso: eng
month: '02'
oa: 1
oa_version: Published Version
project:
- _id: 268A44D6-B435-11E9-9278-68D0E5697425
  call_identifier: H2020
  grant_number: '805223'
  name: Elastic Coordination for Scalable Machine Learning
publication: 25th International Conference on Principles of Distributed Systems
publication_identifier:
  isbn:
  - '9783959772198'
  issn:
  - 1868-8969
publication_status: published
publisher: Schloss Dagstuhl - Leibniz-Zentrum für Informatik
quality_controlled: '1'
scopus_import: '1'
status: public
title: 'Beyond distributed subgraph detection: Induced subgraphs, multicolored problems
  and graph parameters'
tmp:
  image: /images/cc_by.png
  legal_code_url: https://creativecommons.org/licenses/by/4.0/legalcode
  name: Creative Commons Attribution 4.0 International Public License (CC-BY 4.0)
  short: CC BY (4.0)
type: conference
user_id: 2DF688A6-F248-11E8-B48F-1D18A9856A87
volume: 217
year: '2022'
...
---
_id: '11184'
abstract:
- lang: eng
  text: "Let G be a graph on n nodes. In the stochastic population protocol model,
    a collection of n indistinguishable, resource-limited nodes collectively solve
    tasks via pairwise interactions. In each interaction, two randomly chosen neighbors
    first read each other’s states, and then update their local states. A rich line
    of research has established tight upper and lower bounds on the complexity of
    fundamental tasks, such as majority and leader election, in this model, when G
    is a clique. Specifically, in the clique, these tasks can be solved fast, i.e.,
    in n polylog n pairwise interactions, with high probability, using at most polylog
    n states per node.\r\nIn this work, we consider the more general setting where
    G is an arbitrary regular graph, and present a technique for simulating protocols
    designed for fully-connected networks in any connected regular graph. Our main
    result is a simulation that is efficient on many interesting graph families: roughly,
    the simulation overhead is polylogarithmic in the number of nodes, and quadratic
    in the conductance of the graph. As a sample application, we show that, in any
    regular graph with conductance φ, both leader election and exact majority can
    be solved in φ^{-2} ⋅ n polylog n pairwise interactions, with high probability,
    using at most φ^{-2} ⋅ polylog n states per node. This shows that there are fast
    and space-efficient population protocols for leader election and exact majority
    on graphs with good expansion properties. We believe our results will prove generally
    useful, as they allow efficient technology transfer between the well-mixed (clique)
    case, and the under-explored spatial setting."
acknowledgement: "Dan Alistarh: This project has received funding from the European
  Research Council (ERC)\r\nunder the European Union’s Horizon 2020 research and innovation
  programme (grant agreement No.805223 ScaleML).\r\nJoel Rybicki: This project has
  received from the European Union’s Horizon 2020 research and\r\ninnovation programme
  under the Marie Skłodowska-Curie grant agreement No. 840605.\r\nAcknowledgements
  We grateful to Giorgi Nadiradze for pointing out a generalisation of the phase clock
  construction to non-regular graphs. We also thank anonymous reviewers for their
  useful comments on earlier versions of this manuscript."
alternative_title:
- LIPIcs
article_number: '14'
article_processing_charge: No
arxiv: 1
author:
- first_name: Dan-Adrian
  full_name: Alistarh, Dan-Adrian
  id: 4A899BFC-F248-11E8-B48F-1D18A9856A87
  last_name: Alistarh
  orcid: 0000-0003-3650-940X
- first_name: Rati
  full_name: Gelashvili, Rati
  last_name: Gelashvili
- first_name: Joel
  full_name: Rybicki, Joel
  id: 334EFD2E-F248-11E8-B48F-1D18A9856A87
  last_name: Rybicki
  orcid: 0000-0002-6432-6646
citation:
  ama: 'Alistarh D-A, Gelashvili R, Rybicki J. Fast graphical population protocols.
    In: Bramas Q, Gramoli V, Milani A, eds. <i>25th International Conference on Principles
    of Distributed Systems</i>. Vol 217. Schloss Dagstuhl - Leibniz-Zentrum für Informatik;
    2022. doi:<a href="https://doi.org/10.4230/LIPIcs.OPODIS.2021.14">10.4230/LIPIcs.OPODIS.2021.14</a>'
  apa: 'Alistarh, D.-A., Gelashvili, R., &#38; Rybicki, J. (2022). Fast graphical
    population protocols. In Q. Bramas, V. Gramoli, &#38; A. Milani (Eds.), <i>25th
    International Conference on Principles of Distributed Systems</i> (Vol. 217).
    Strasbourg, France: Schloss Dagstuhl - Leibniz-Zentrum für Informatik. <a href="https://doi.org/10.4230/LIPIcs.OPODIS.2021.14">https://doi.org/10.4230/LIPIcs.OPODIS.2021.14</a>'
  chicago: Alistarh, Dan-Adrian, Rati Gelashvili, and Joel Rybicki. “Fast Graphical
    Population Protocols.” In <i>25th International Conference on Principles of Distributed
    Systems</i>, edited by Quentin Bramas, Vincent Gramoli, and Alessia Milani, Vol.
    217. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2022. <a href="https://doi.org/10.4230/LIPIcs.OPODIS.2021.14">https://doi.org/10.4230/LIPIcs.OPODIS.2021.14</a>.
  ieee: D.-A. Alistarh, R. Gelashvili, and J. Rybicki, “Fast graphical population
    protocols,” in <i>25th International Conference on Principles of Distributed Systems</i>,
    Strasbourg, France, 2022, vol. 217.
  ista: Alistarh D-A, Gelashvili R, Rybicki J. 2022. Fast graphical population protocols.
    25th International Conference on Principles of Distributed Systems. OPODIS, LIPIcs,
    vol. 217, 14.
  mla: Alistarh, Dan-Adrian, et al. “Fast Graphical Population Protocols.” <i>25th
    International Conference on Principles of Distributed Systems</i>, edited by Quentin
    Bramas et al., vol. 217, 14, Schloss Dagstuhl - Leibniz-Zentrum für Informatik,
    2022, doi:<a href="https://doi.org/10.4230/LIPIcs.OPODIS.2021.14">10.4230/LIPIcs.OPODIS.2021.14</a>.
  short: D.-A. Alistarh, R. Gelashvili, J. Rybicki, in:, Q. Bramas, V. Gramoli, A.
    Milani (Eds.), 25th International Conference on Principles of Distributed Systems,
    Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2022.
conference:
  end_date: 2021-12-15
  location: Strasbourg, France
  name: OPODIS
  start_date: 2021-12-13
corr_author: '1'
date_created: 2022-04-17T22:01:47Z
date_published: 2022-02-01T00:00:00Z
date_updated: 2025-04-14T07:49:13Z
day: '01'
ddc:
- '510'
department:
- _id: DaAl
doi: 10.4230/LIPIcs.OPODIS.2021.14
ec_funded: 1
editor:
- first_name: Quentin
  full_name: Bramas, Quentin
  last_name: Bramas
- first_name: Vincent
  full_name: Gramoli, Vincent
  last_name: Gramoli
- first_name: Alessia
  full_name: Milani, Alessia
  last_name: Milani
external_id:
  arxiv:
  - '2102.08808'
file:
- access_level: open_access
  checksum: 2c7c982174c6f98c4ca6e92539d15086
  content_type: application/pdf
  creator: dernst
  date_created: 2022-05-02T08:06:33Z
  date_updated: 2022-05-02T08:06:33Z
  file_id: '11346'
  file_name: 2022_LIPICs_Alistarh.pdf
  file_size: 959406
  relation: main_file
  success: 1
file_date_updated: 2022-05-02T08:06:33Z
has_accepted_license: '1'
intvolume: '       217'
language:
- iso: eng
month: '02'
oa: 1
oa_version: Published Version
project:
- _id: 268A44D6-B435-11E9-9278-68D0E5697425
  call_identifier: H2020
  grant_number: '805223'
  name: Elastic Coordination for Scalable Machine Learning
- _id: 26A5D39A-B435-11E9-9278-68D0E5697425
  call_identifier: H2020
  grant_number: '840605'
  name: Coordination in constrained and natural distributed systems
publication: 25th International Conference on Principles of Distributed Systems
publication_identifier:
  isbn:
  - '9783959772198'
  issn:
  - 1868-8969
publication_status: published
publisher: Schloss Dagstuhl - Leibniz-Zentrum für Informatik
quality_controlled: '1'
scopus_import: '1'
status: public
title: Fast graphical population protocols
tmp:
  image: /images/cc_by.png
  legal_code_url: https://creativecommons.org/licenses/by/4.0/legalcode
  name: Creative Commons Attribution 4.0 International Public License (CC-BY 4.0)
  short: CC BY (4.0)
type: conference
user_id: 2DF688A6-F248-11E8-B48F-1D18A9856A87
volume: 217
year: '2022'
...
---
_id: '11844'
abstract:
- lang: eng
  text: "In the stochastic population protocol model, we are given a connected graph
    with n nodes, and in every time step, a scheduler samples an edge of the graph
    uniformly at random and the nodes connected by this edge interact. A fundamental
    task in this model is stable leader election, in which all nodes start in an identical
    state and the aim is to reach a configuration in which (1) exactly one node is
    elected as leader and (2) this node remains as the unique leader no matter what
    sequence of interactions follows. On cliques, the complexity of this problem has
    recently been settled: time-optimal protocols stabilize in Θ(n log n) expected
    steps using Θ(log log n) states, whereas protocols that use O(1) states require
    Θ(n2) expected steps.\r\n\r\nIn this work, we investigate the complexity of stable
    leader election on general graphs. We provide the first non-trivial time lower
    bounds for leader election on general graphs, showing that, when moving beyond
    cliques, the complexity landscape of leader election becomes very diverse: the
    time required to elect a leader can range from O(1) to Θ(n3) expected steps. On
    the upper bound side, we first observe that there exists a protocol that is time-optimal
    on many graph families, but uses polynomially-many states. In contrast, we give
    a near-time-optimal protocol that uses only O(log2n) states that is at most a
    factor log n slower. Finally, we show that the constant-state protocol of Beauquier
    et al. [OPODIS 2013] is at most a factor n log n slower than the fast polynomial-state
    protocol. Moreover, among constant-state protocols, this protocol has near-optimal
    average case complexity on dense random graphs."
acknowledgement: We thank the anonymous reviewers for their helpful comments. We gratefully
  acknowledge funding from the European Research Council (ERC) under the European
  Union’s Horizon 2020 research and innovation programme (grant agreement No 805223
  ScaleML).
article_processing_charge: Yes (via OA deal)
arxiv: 1
author:
- first_name: Dan-Adrian
  full_name: Alistarh, Dan-Adrian
  id: 4A899BFC-F248-11E8-B48F-1D18A9856A87
  last_name: Alistarh
  orcid: 0000-0003-3650-940X
- first_name: Joel
  full_name: Rybicki, Joel
  id: 334EFD2E-F248-11E8-B48F-1D18A9856A87
  last_name: Rybicki
  orcid: 0000-0002-6432-6646
- first_name: Sasha
  full_name: Voitovych, Sasha
  last_name: Voitovych
citation:
  ama: 'Alistarh D-A, Rybicki J, Voitovych S. Near-optimal leader election in population
    protocols on graphs. In: <i>Proceedings of the Annual ACM Symposium on Principles
    of Distributed Computing</i>. Association for Computing Machinery; 2022:246-256.
    doi:<a href="https://doi.org/10.1145/3519270.3538435">10.1145/3519270.3538435</a>'
  apa: 'Alistarh, D.-A., Rybicki, J., &#38; Voitovych, S. (2022). Near-optimal leader
    election in population protocols on graphs. In <i>Proceedings of the Annual ACM
    Symposium on Principles of Distributed Computing</i> (pp. 246–256). Salerno, Italy:
    Association for Computing Machinery. <a href="https://doi.org/10.1145/3519270.3538435">https://doi.org/10.1145/3519270.3538435</a>'
  chicago: Alistarh, Dan-Adrian, Joel Rybicki, and Sasha Voitovych. “Near-Optimal
    Leader Election in Population Protocols on Graphs.” In <i>Proceedings of the Annual
    ACM Symposium on Principles of Distributed Computing</i>, 246–56. Association
    for Computing Machinery, 2022. <a href="https://doi.org/10.1145/3519270.3538435">https://doi.org/10.1145/3519270.3538435</a>.
  ieee: D.-A. Alistarh, J. Rybicki, and S. Voitovych, “Near-optimal leader election
    in population protocols on graphs,” in <i>Proceedings of the Annual ACM Symposium
    on Principles of Distributed Computing</i>, Salerno, Italy, 2022, pp. 246–256.
  ista: 'Alistarh D-A, Rybicki J, Voitovych S. 2022. Near-optimal leader election
    in population protocols on graphs. Proceedings of the Annual ACM Symposium on
    Principles of Distributed Computing. PODC: Symposium on Principles of Distributed
    Computing, 246–256.'
  mla: Alistarh, Dan-Adrian, et al. “Near-Optimal Leader Election in Population Protocols
    on Graphs.” <i>Proceedings of the Annual ACM Symposium on Principles of Distributed
    Computing</i>, Association for Computing Machinery, 2022, pp. 246–56, doi:<a href="https://doi.org/10.1145/3519270.3538435">10.1145/3519270.3538435</a>.
  short: D.-A. Alistarh, J. Rybicki, S. Voitovych, in:, Proceedings of the Annual
    ACM Symposium on Principles of Distributed Computing, Association for Computing
    Machinery, 2022, pp. 246–256.
conference:
  end_date: 2022-07-29
  location: Salerno, Italy
  name: 'PODC: Symposium on Principles of Distributed Computing'
  start_date: 2022-07-25
corr_author: '1'
date_created: 2022-08-14T22:01:46Z
date_published: 2022-07-21T00:00:00Z
date_updated: 2025-12-30T09:04:17Z
day: '21'
ddc:
- '000'
department:
- _id: DaAl
doi: 10.1145/3519270.3538435
ec_funded: 1
external_id:
  arxiv:
  - '2205.12597'
  isi:
  - '001031439100030'
file:
- access_level: open_access
  checksum: 4c6b29172b8e355b4fbc364a2e0827b2
  content_type: application/pdf
  creator: cchlebak
  date_created: 2022-08-16T08:05:15Z
  date_updated: 2022-08-16T08:05:15Z
  file_id: '11854'
  file_name: 2022_PODC_Alistarh.pdf
  file_size: 1593474
  relation: main_file
  success: 1
file_date_updated: 2022-08-16T08:05:15Z
has_accepted_license: '1'
isi: 1
language:
- iso: eng
month: '07'
oa: 1
oa_version: Published Version
page: 246-256
project:
- _id: 268A44D6-B435-11E9-9278-68D0E5697425
  call_identifier: H2020
  grant_number: '805223'
  name: Elastic Coordination for Scalable Machine Learning
publication: Proceedings of the Annual ACM Symposium on Principles of Distributed
  Computing
publication_identifier:
  isbn:
  - '9781450392624'
publication_status: published
publisher: Association for Computing Machinery
quality_controlled: '1'
related_material:
  record:
  - id: '19969'
    relation: later_version
    status: public
scopus_import: '1'
status: public
title: Near-optimal leader election in population protocols on graphs
tmp:
  image: /images/cc_by.png
  legal_code_url: https://creativecommons.org/licenses/by/4.0/legalcode
  name: Creative Commons Attribution 4.0 International Public License (CC-BY 4.0)
  short: CC BY (4.0)
type: conference
user_id: 317138e5-6ab7-11ef-aa6d-ffef3953e345
year: '2022'
...
---
_id: '12299'
abstract:
- lang: eng
  text: 'Transfer learning is a classic paradigm by which models pretrained on large
    “upstream” datasets are adapted to yield good results on “downstream” specialized
    datasets. Generally, more accurate models on the “upstream” dataset tend to provide
    better transfer accuracy “downstream”. In this work, we perform an in-depth investigation
    of this phenomenon in the context of convolutional neural networks (CNNs) trained
    on the ImageNet dataset, which have been pruned-that is, compressed by sparsifiying
    their connections. We consider transfer using unstructured pruned models obtained
    by applying several state-of-the-art pruning methods, including magnitude-based,
    second-order, regrowth, lottery-ticket, and regularization approaches, in the
    context of twelve standard transfer tasks. In a nutshell, our study shows that
    sparse models can match or even outperform the transfer performance of dense models,
    even at high sparsities, and, while doing so, can lead to significant inference
    and even training speedups. At the same time, we observe and analyze significant
    differences in the behaviour of different pruning methods. The code is available
    at: https://github.com/IST-DASLab/sparse-imagenet-transfer.'
acknowledgement: he authors would like to sincerely thank Christoph Lampert and Nir
  Shavit for fruitful discussions during the development of this work, and Eldar Kurtic
  for experimental support. EI was supported in part by the FWF DK VGSCO, grant agreement
  number W1260-N35, while AP and DA acknowledge generous support by the ERC, via Starting
  Grant 805223 ScaleML.
article_processing_charge: No
arxiv: 1
author:
- first_name: Eugenia B
  full_name: Iofinova, Eugenia B
  id: f9a17499-f6e0-11ea-865d-fdf9a3f77117
  last_name: Iofinova
  orcid: 0000-0002-7778-3221
- first_name: Elena-Alexandra
  full_name: Peste, Elena-Alexandra
  id: 32D78294-F248-11E8-B48F-1D18A9856A87
  last_name: Peste
- first_name: Mark
  full_name: Kurtz, Mark
  last_name: Kurtz
- first_name: Dan-Adrian
  full_name: Alistarh, Dan-Adrian
  id: 4A899BFC-F248-11E8-B48F-1D18A9856A87
  last_name: Alistarh
  orcid: 0000-0003-3650-940X
citation:
  ama: 'Iofinova EB, Krumes A, Kurtz M, Alistarh D-A. How well do sparse ImageNet
    models transfer? In: <i>2022 IEEE/CVF Conference on Computer Vision and Pattern
    Recognition</i>. Institute of Electrical and Electronics Engineers; 2022:12256-12266.
    doi:<a href="https://doi.org/10.1109/cvpr52688.2022.01195">10.1109/cvpr52688.2022.01195</a>'
  apa: 'Iofinova, E. B., Krumes, A., Kurtz, M., &#38; Alistarh, D.-A. (2022). How
    well do sparse ImageNet models transfer? In <i>2022 IEEE/CVF Conference on Computer
    Vision and Pattern Recognition</i> (pp. 12256–12266). New Orleans, LA, United
    States: Institute of Electrical and Electronics Engineers. <a href="https://doi.org/10.1109/cvpr52688.2022.01195">https://doi.org/10.1109/cvpr52688.2022.01195</a>'
  chicago: Iofinova, Eugenia B, Alexandra Krumes, Mark Kurtz, and Dan-Adrian Alistarh.
    “How Well Do Sparse ImageNet Models Transfer?” In <i>2022 IEEE/CVF Conference
    on Computer Vision and Pattern Recognition</i>, 12256–66. Institute of Electrical
    and Electronics Engineers, 2022. <a href="https://doi.org/10.1109/cvpr52688.2022.01195">https://doi.org/10.1109/cvpr52688.2022.01195</a>.
  ieee: E. B. Iofinova, A. Krumes, M. Kurtz, and D.-A. Alistarh, “How well do sparse
    ImageNet models transfer?,” in <i>2022 IEEE/CVF Conference on Computer Vision
    and Pattern Recognition</i>, New Orleans, LA, United States, 2022, pp. 12256–12266.
  ista: 'Iofinova EB, Krumes A, Kurtz M, Alistarh D-A. 2022. How well do sparse ImageNet
    models transfer? 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
    CVPR: Computer Vision and Pattern Recognition, 12256–12266.'
  mla: Iofinova, Eugenia B., et al. “How Well Do Sparse ImageNet Models Transfer?”
    <i>2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition</i>, Institute
    of Electrical and Electronics Engineers, 2022, pp. 12256–66, doi:<a href="https://doi.org/10.1109/cvpr52688.2022.01195">10.1109/cvpr52688.2022.01195</a>.
  short: E.B. Iofinova, A. Krumes, M. Kurtz, D.-A. Alistarh, in:, 2022 IEEE/CVF Conference
    on Computer Vision and Pattern Recognition, Institute of Electrical and Electronics
    Engineers, 2022, pp. 12256–12266.
conference:
  end_date: 2022-06-24
  location: New Orleans, LA, United States
  name: 'CVPR: Computer Vision and Pattern Recognition'
  start_date: 2022-06-18
corr_author: '1'
date_created: 2023-01-16T10:06:00Z
date_published: 2022-09-27T00:00:00Z
date_updated: 2026-04-07T13:30:19Z
day: '27'
department:
- _id: DaAl
- _id: ChLa
doi: 10.1109/cvpr52688.2022.01195
ec_funded: 1
external_id:
  arxiv:
  - '2111.13445'
  isi:
  - '000870759105034'
isi: 1
language:
- iso: eng
main_file_link:
- open_access: '1'
  url: https://doi.org/10.48550/arXiv.2111.13445
month: '09'
oa: 1
oa_version: Preprint
page: 12256-12266
project:
- _id: 9B9290DE-BA93-11EA-9121-9846C619BF3A
  grant_number: W1260-N35
  name: Vienna Graduate School on Computational Optimization
- _id: 268A44D6-B435-11E9-9278-68D0E5697425
  call_identifier: H2020
  grant_number: '805223'
  name: Elastic Coordination for Scalable Machine Learning
publication: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition
publication_identifier:
  eissn:
  - 2575-7075
publication_status: published
publisher: Institute of Electrical and Electronics Engineers
quality_controlled: '1'
related_material:
  record:
  - id: '13074'
    relation: dissertation_contains
    status: public
scopus_import: '1'
status: public
title: How well do sparse ImageNet models transfer?
type: conference
user_id: 4359f0d1-fa6c-11eb-b949-802e58b17ae8
year: '2022'
...
---
_id: '17059'
abstract:
- lang: eng
  text: The recent focus on the efficiency of deep neural networks (DNNs) has led
    to significant work on model compression approaches, of which weight pruning is
    one of the most popular. At the same time, there is rapidly-growing computational
    support for efficiently executing the unstructured-sparse models obtained via
    pruning. Yet, most existing pruning methods minimize just the number of remaining
    weights, i.e. the size of the model, rather than optimizing for inference time.
    We address this gap by introducing SPDY, a new compression method which automatically
    determines layer-wise sparsity targets achieving a desired inference speedup on
    a given system, while minimizing accuracy loss. SPDY is the composition of two
    new techniques. The first is an efficient and general dynamic programming algorithm
    for solving constrained layer-wise compression problems, given a set of layer-wise
    error scores. The second technique is a local search procedure for automatically
    determining such scores in an accurate and robust manner. Experiments across popular
    vision and language models show that SPDY guarantees speedups while recovering
    higher accuracy relative to existing strategies, both for one-shot and gradual
    pruning scenarios, and is compatible with most existing pruning approaches. We
    also extend our approach to the recently-proposed task of pruning with very little
    data, where we achieve the best known accuracy recovery when pruning to the GPU-supported
    2:4 sparsity pattern.
acknowledgement: "We gratefully acknowledge funding from the European Research Council
  (ERC) under the European Union’s Horizon 2020 programme (grant agreement No 805223
  ScaleML),\r\nas well as computational support from AWS EC2. We thank Eldar Kurtic
  for code and hyper-parameters for BERT pruning, and the Neural Magic Team, notably
  Michael Goin and\r\nMark Kurtz, for support with their software."
alternative_title:
- PMLR
article_processing_charge: Yes
author:
- first_name: Elias
  full_name: Frantar, Elias
  id: 09a8f98d-ec99-11ea-ae11-c063a7b7fe5f
  last_name: Frantar
- first_name: Dan-Adrian
  full_name: Alistarh, Dan-Adrian
  id: 4A899BFC-F248-11E8-B48F-1D18A9856A87
  last_name: Alistarh
  orcid: 0000-0003-3650-940X
citation:
  ama: 'Frantar E, Alistarh D-A. SPDY: Accurate pruning with speedup guarantees. In:
    <i>39th International Conference on Machine Learning</i>. Vol 162. ML Research
    Press; 2022:6726-6743.'
  apa: 'Frantar, E., &#38; Alistarh, D.-A. (2022). SPDY: Accurate pruning with speedup
    guarantees. In <i>39th International Conference on Machine Learning</i> (Vol.
    162, pp. 6726–6743). Baltimore, MD, United States: ML Research Press.'
  chicago: 'Frantar, Elias, and Dan-Adrian Alistarh. “SPDY: Accurate Pruning with
    Speedup Guarantees.” In <i>39th International Conference on Machine Learning</i>,
    162:6726–43. ML Research Press, 2022.'
  ieee: 'E. Frantar and D.-A. Alistarh, “SPDY: Accurate pruning with speedup guarantees,”
    in <i>39th International Conference on Machine Learning</i>, Baltimore, MD, United
    States, 2022, vol. 162, pp. 6726–6743.'
  ista: 'Frantar E, Alistarh D-A. 2022. SPDY: Accurate pruning with speedup guarantees.
    39th International Conference on Machine Learning. ICML: International Conference
    on Machine Learning, PMLR, vol. 162, 6726–6743.'
  mla: 'Frantar, Elias, and Dan-Adrian Alistarh. “SPDY: Accurate Pruning with Speedup
    Guarantees.” <i>39th International Conference on Machine Learning</i>, vol. 162,
    ML Research Press, 2022, pp. 6726–43.'
  short: E. Frantar, D.-A. Alistarh, in:, 39th International Conference on Machine
    Learning, ML Research Press, 2022, pp. 6726–6743.
conference:
  end_date: 2022-07-23
  location: Baltimore, MD, United States
  name: 'ICML: International Conference on Machine Learning'
  start_date: 2022-07-17
corr_author: '1'
date_created: 2024-05-28T13:45:20Z
date_published: 2022-07-20T00:00:00Z
date_updated: 2025-04-14T07:49:14Z
day: '20'
ddc:
- '000'
department:
- _id: DaAl
ec_funded: 1
external_id:
  isi:
  - '000922378801029'
file:
- access_level: open_access
  checksum: 5179a1e4dfc0fbfab6674907299e414a
  content_type: application/pdf
  creator: dernst
  date_created: 2024-08-19T06:54:41Z
  date_updated: 2024-08-19T06:54:41Z
  file_id: '17440'
  file_name: 2022_PMLR_Frantar.pdf
  file_size: 615916
  relation: main_file
  success: 1
file_date_updated: 2024-08-19T06:54:41Z
has_accepted_license: '1'
intvolume: '       162'
isi: 1
language:
- iso: eng
month: '07'
oa: 1
oa_version: Published Version
page: 6726-6743
project:
- _id: 268A44D6-B435-11E9-9278-68D0E5697425
  call_identifier: H2020
  grant_number: '805223'
  name: Elastic Coordination for Scalable Machine Learning
publication: 39th International Conference on Machine Learning
publication_status: published
publisher: ML Research Press
quality_controlled: '1'
scopus_import: '1'
status: public
title: 'SPDY: Accurate pruning with speedup guarantees'
tmp:
  image: /images/cc_by.png
  legal_code_url: https://creativecommons.org/licenses/by/4.0/legalcode
  name: Creative Commons Attribution 4.0 International Public License (CC-BY 4.0)
  short: CC BY (4.0)
type: conference
user_id: 2DF688A6-F248-11E8-B48F-1D18A9856A87
volume: 162
year: '2022'
...