[{"publisher":"Springer Nature","_id":"19969","page":"207-245","title":"Near-optimal leader election in population protocols on graphs","scopus_import":"1","OA_place":"publisher","status":"public","related_material":{"record":[{"relation":"earlier_version","id":"11844","status":"public"}]},"abstract":[{"lang":"eng","text":"In the stochastic population protocol model, we are given a connected graph with n nodes, and in every time step, a scheduler samples an edge of the graph uniformly at random and the nodes connected by this edge interact. A fundamental task in this model is stable leader election, in which all nodes start in an identical state and the aim is to reach a configuration in which (1)\r\nexactly one node is elected as leader and (2) this node remains as the unique leader no matter what sequence of interactions follows. On cliques, the complexity of this problem has recently been settled: time-optimal protocols stabilize in (n log n) expected steps using (log log n) states, whereas protocols that use O(1) states require (n2) expected steps. In this work, we investigate the complexity of stable leader election on graphs. We provide the first non-trivial time lower bounds on general graphs, showing that, when moving beyond cliques, the complexity of stable leader election can range from O(1) to (n3) expected steps. We describe a protocol that is time-optimal on many graph families, but uses polynomially-many states. In contrast, we give a near-time-optimal protocol that uses only O(log2 n) states that is at most a factor O(log n) slower. Finally, we observe that for many graphs the constant-state protocol of Beauquier et al. [OPODIS 2013] is at most a factor O(n log n) slower than the fast polynomial-state protocol, and among constant-state protocols, this protocol has near-optimal average case complexity on dense random graphs."}],"doi":"10.1007/s00446-025-00487-7","acknowledgement":"We thank all anonymous reviewers for their helpful comments. We would also like to thank Jakob Solnerzik and Olivier Stietel for catching some errors in the proofs. Open Access funding enabled and organized by Projekt DEAL. 
We gratefully acknowledge funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (grant agreement No 805223 ScaleML).","type":"journal_article","file_date_updated":"2025-12-30T09:03:55Z","publication_identifier":{"eissn":["1432-0452"],"issn":["0178-2770"]},"publication_status":"published","corr_author":"1","volume":38,"date_updated":"2025-12-30T09:04:18Z","project":[{"call_identifier":"H2020","name":"Elastic Coordination for Scalable Machine Learning","_id":"268A44D6-B435-11E9-9278-68D0E5697425","grant_number":"805223"}],"oa":1,"file":[{"file_id":"20900","checksum":"2789c0fdfb58f64930f05f6ac2b3ca61","creator":"dernst","relation":"main_file","file_size":770705,"content_type":"application/pdf","success":1,"access_level":"open_access","date_created":"2025-12-30T09:03:55Z","file_name":"2025_DistributedComp_Alistarh.pdf","date_updated":"2025-12-30T09:03:55Z"}],"department":[{"_id":"DaAl"}],"oa_version":"Published Version","PlanS_conform":"1","article_processing_charge":"Yes (via OA deal)","day":"01","isi":1,"user_id":"2DF688A6-F248-11E8-B48F-1D18A9856A87","tmp":{"legal_code_url":"https://creativecommons.org/licenses/by/4.0/legalcode","short":"CC BY (4.0)","name":"Creative Commons Attribution 4.0 International Public License (CC-BY 4.0)","image":"/images/cc_by.png"},"author":[{"last_name":"Alistarh","orcid":"0000-0003-3650-940X","id":"4A899BFC-F248-11E8-B48F-1D18A9856A87","full_name":"Alistarh, Dan-Adrian","first_name":"Dan-Adrian"},{"id":"334EFD2E-F248-11E8-B48F-1D18A9856A87","orcid":"0000-0002-6432-6646","last_name":"Rybicki","first_name":"Joel","full_name":"Rybicki, Joel"},{"last_name":"Voitovych","first_name":"Sasha","full_name":"Voitovych, Sasha"}],"license":"https://creativecommons.org/licenses/by/4.0/","month":"09","year":"2025","publication":"Distributed Computing","OA_type":"hybrid","quality_controlled":"1","language":[{"iso":"eng"}],"ec_funded":1,"external_id":{"arxiv":["2205.12597"],"isi":["001518300400001"]},"arxiv":1,"article_type":"original","intvolume":"        38","has_accepted_license":"1","citation":{"chicago":"Alistarh, Dan-Adrian, Joel Rybicki, and Sasha Voitovych. “Near-Optimal Leader Election in Population Protocols on Graphs.” <i>Distributed Computing</i>. Springer Nature, 2025. <a href=\"https://doi.org/10.1007/s00446-025-00487-7\">https://doi.org/10.1007/s00446-025-00487-7</a>.","short":"D.-A. Alistarh, J. Rybicki, S. Voitovych, Distributed Computing 38 (2025) 207–245.","mla":"Alistarh, Dan-Adrian, et al. “Near-Optimal Leader Election in Population Protocols on Graphs.” <i>Distributed Computing</i>, vol. 38, Springer Nature, 2025, pp. 207–45, doi:<a href=\"https://doi.org/10.1007/s00446-025-00487-7\">10.1007/s00446-025-00487-7</a>.","apa":"Alistarh, D.-A., Rybicki, J., &#38; Voitovych, S. (2025). Near-optimal leader election in population protocols on graphs. <i>Distributed Computing</i>. Springer Nature. <a href=\"https://doi.org/10.1007/s00446-025-00487-7\">https://doi.org/10.1007/s00446-025-00487-7</a>","ama":"Alistarh D-A, Rybicki J, Voitovych S. Near-optimal leader election in population protocols on graphs. <i>Distributed Computing</i>. 2025;38:207-245. doi:<a href=\"https://doi.org/10.1007/s00446-025-00487-7\">10.1007/s00446-025-00487-7</a>","ista":"Alistarh D-A, Rybicki J, Voitovych S. 2025. Near-optimal leader election in population protocols on graphs. Distributed Computing. 38, 207–245.","ieee":"D.-A. Alistarh, J. Rybicki, and S. 
Voitovych, “Near-optimal leader election in population protocols on graphs,” <i>Distributed Computing</i>, vol. 38. Springer Nature, pp. 207–245, 2025."},"date_created":"2025-07-06T22:01:24Z","date_published":"2025-09-01T00:00:00Z","ddc":["510"]},{"issue":"19","day":"11","article_processing_charge":"No","user_id":"2DF688A6-F248-11E8-B48F-1D18A9856A87","author":[{"last_name":"Talaei","first_name":"Shayan","full_name":"Talaei, Shayan"},{"full_name":"Ansaripour, Matin","first_name":"Matin","last_name":"Ansaripour"},{"first_name":"Giorgi","full_name":"Nadiradze, Giorgi","last_name":"Nadiradze","orcid":"0000-0001-5634-0731","id":"3279A00C-F248-11E8-B48F-1D18A9856A87"},{"first_name":"Dan-Adrian","full_name":"Alistarh, Dan-Adrian","last_name":"Alistarh","orcid":"0000-0003-3650-940X","id":"4A899BFC-F248-11E8-B48F-1D18A9856A87"}],"month":"04","project":[{"name":"Elastic Coordination for Scalable Machine Learning","call_identifier":"H2020","grant_number":"805223","_id":"268A44D6-B435-11E9-9278-68D0E5697425"}],"oa":1,"department":[{"_id":"DaAl"}],"oa_version":"Preprint","external_id":{"arxiv":["2210.07703"]},"arxiv":1,"intvolume":"        39","article_type":"original","citation":{"chicago":"Talaei, Shayan, Matin Ansaripour, Giorgi Nadiradze, and Dan-Adrian Alistarh. “Hybrid Decentralized Optimization: Leveraging Both First- and Zeroth-Order Optimizers for Faster Convergence.” <i>Proceedings of the 39th AAAI Conference on Artificial Intelligence</i>. Association for the Advancement of Artificial Intelligence, 2025. <a href=\"https://doi.org/10.1609/aaai.v39i19.34290\">https://doi.org/10.1609/aaai.v39i19.34290</a>.","short":"S. Talaei, M. Ansaripour, G. Nadiradze, D.-A. Alistarh, Proceedings of the 39th AAAI Conference on Artificial Intelligence 39 (2025) 20778–20786.","mla":"Talaei, Shayan, et al. “Hybrid Decentralized Optimization: Leveraging Both First- and Zeroth-Order Optimizers for Faster Convergence.” <i>Proceedings of the 39th AAAI Conference on Artificial Intelligence</i>, vol. 39, no. 19, Association for the Advancement of Artificial Intelligence, 2025, pp. 20778–86, doi:<a href=\"https://doi.org/10.1609/aaai.v39i19.34290\">10.1609/aaai.v39i19.34290</a>.","apa":"Talaei, S., Ansaripour, M., Nadiradze, G., &#38; Alistarh, D.-A. (2025). Hybrid decentralized optimization: Leveraging both first- and zeroth-order optimizers for faster convergence. <i>Proceedings of the 39th AAAI Conference on Artificial Intelligence</i>. Association for the Advancement of Artificial Intelligence. <a href=\"https://doi.org/10.1609/aaai.v39i19.34290\">https://doi.org/10.1609/aaai.v39i19.34290</a>","ama":"Talaei S, Ansaripour M, Nadiradze G, Alistarh D-A. Hybrid decentralized optimization: Leveraging both first- and zeroth-order optimizers for faster convergence. <i>Proceedings of the 39th AAAI Conference on Artificial Intelligence</i>. 2025;39(19):20778-20786. doi:<a href=\"https://doi.org/10.1609/aaai.v39i19.34290\">10.1609/aaai.v39i19.34290</a>","ista":"Talaei S, Ansaripour M, Nadiradze G, Alistarh D-A. 2025. Hybrid decentralized optimization: Leveraging both first- and zeroth-order optimizers for faster convergence. Proceedings of the 39th AAAI Conference on Artificial Intelligence. 39(19), 20778–20786.","ieee":"S. Talaei, M. Ansaripour, G. Nadiradze, and D.-A. Alistarh, “Hybrid decentralized optimization: Leveraging both first- and zeroth-order optimizers for faster convergence,” <i>Proceedings of the 39th AAAI Conference on Artificial Intelligence</i>, vol. 39, no. 19. 
Association for the Advancement of Artificial Intelligence, pp. 20778–20786, 2025."},"date_created":"2025-05-19T14:15:35Z","date_published":"2025-04-11T00:00:00Z","year":"2025","publication":"Proceedings of the 39th AAAI Conference on Artificial Intelligence","OA_type":"free access","quality_controlled":"1","language":[{"iso":"eng"}],"ec_funded":1,"status":"public","related_material":{"link":[{"relation":"software","url":"https://github.com/ShayanTalaei/HDO"}]},"publisher":"Association for the Advancement of Artificial Intelligence","page":"20778-20786","_id":"19713","title":"Hybrid decentralized optimization: Leveraging both first- and zeroth-order optimizers for faster convergence","scopus_import":"1","OA_place":"publisher","main_file_link":[{"open_access":"1","url":"https://doi.org/10.1609/aaai.v39i19.34290"}],"acknowledgement":"This project has received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (grant agreement\r\nNo 805223 ScaleML). The authors would like to acknowledge Eugenia Iofinova for useful discussions during the inception of this project.","type":"journal_article","publication_identifier":{"eissn":["2374-3468"],"issn":["2159-5399"]},"publication_status":"published","corr_author":"1","volume":39,"date_updated":"2026-02-16T12:34:44Z","abstract":[{"text":"Distributed optimization is the standard way of speeding up machine learning training, and most of the research in the area focuses on distributed first-order, gradient-based methods. Yet, there are settings where some computationally-bounded nodes may not be able to implement first-order, gradient-based optimization, while they could still contribute to joint optimization tasks. In this paper, we initiate the study of hybrid decentralized optimization, studying settings where nodes with zeroth-order and first-order optimization capabilities co-exist in a distributed system, and attempt to jointly solve an optimization task over some data distribution. We essentially show that, under reasonable parameter settings, such a system can not only withstand noisier zeroth-order agents but can even benefit from integrating such agents into the optimization process, rather than ignoring their information. At the core of our approach is a new analysis of distributed optimization with noisy and possibly-biased gradient estimators, which may be of independent interest. Our results hold for both convex and non-convex objectives. Experimental results on standard optimization tasks confirm our analysis, showing that hybrid first-zeroth order optimization can be practical, even when training deep neural networks.","lang":"eng"}],"doi":"10.1609/aaai.v39i19.34290"},{"abstract":[{"text":"Large language models (LLMs) have made tremendous progress in the past few years, from being able to generate coherent text to matching or surpassing humans in a wide variety of creative, knowledge or reasoning tasks. Much of this can be attributed to massively increased scale, both in the size of the model as well as the amount of training data, from 100s of millions to 100s of billions, or even trillions. This trend is expected to continue, which, although exciting, also raises major practical concerns. Already today's 100+ billion parameter LLMs require top-of-the-line hardware just to run. 
Hence, it is clear that sustaining these developments will require significant efficiency advances.\r\n\r\nHistorically, one of the most practical ways of improving model efficiency has been compression, especially in the form of sparsity or quantization. While this has been studied extensively in the past, existing accurate methods are all designed for models around 100 million parameters; scaling them up to ones literally 1000x larger is highly challenging. In this thesis, we introduce a new unified sparsification and quantization approach OBC, which through additional algorithmic enhancements leads to GPTQ and SparseGPT, the first techniques fast and accurate enough to compress 100+ billion parameter models to 4- or even 3-bit precision and 50% weight-sparsity, respectively. Additionally, we show how weight-only quantization does not just bring space savings but also up to 4.5x faster generation speed, via custom GPU kernels.\r\n\r\nIn fact, we show for the first time that it is possible to develop an FP16×INT4 mixed-precision matrix multiplication kernel, called Marlin, which comes close to simultaneously maximizing both memory and compute utilization, making weight-only quantization highly practical even for multi-user serving. Further, we demonstrate that GPTQ can be scaled to widely overparametrized trillion-parameter models, where extreme sub-1-bit compression rates can be achieved without any inference slow-down, by co-designing a bespoke entropy coding scheme together with an efficient kernel.\r\n\r\nFinally, we also study compression from the perspective of someone with access to massive amounts of compute resources for training large models completely from scratch. Here the key questions revolve around the joint scaling behavior between compression, model size, and amount of training data used. Based on extensive experimental results for both vision and text models, we introduce the first scaling law which accurately captures the relationship between weight-sparsity, number of non-zero weights and data. 
This further allows us to characterize the optimal sparsity, which we find to increase the longer a fixed cost model is being trained.\r\n\r\nOverall, this thesis presents contributions to three different angles of large model efficiency: affordable but accurate algorithms, highly efficient systems implementations, and fundamental scaling laws for compressed training.","lang":"eng"}],"doi":"10.15479/at:ista:17485","publication_status":"published","corr_author":"1","date_updated":"2025-04-14T13:52:01Z","alternative_title":["ISTA Thesis"],"acknowledged_ssus":[{"_id":"ScienComp"}],"type":"dissertation","file_date_updated":"2024-09-06T16:24:59Z","publication_identifier":{"issn":["2663-337X"]},"publisher":"Institute of Science and Technology Austria","degree_awarded":"PhD","_id":"17485","page":"129","title":"Compressing large neural networks : Algorithms, systems and scaling laws","related_material":{"record":[{"id":"18062","relation":"part_of_dissertation","status":"public"},{"relation":"part_of_dissertation","id":"18061","status":"public"},{"status":"public","relation":"part_of_dissertation","id":"17378"},{"id":"17087","relation":"part_of_dissertation","status":"public"},{"status":"public","relation":"part_of_dissertation","id":"14458"}]},"status":"public","supervisor":[{"full_name":"Alistarh, Dan-Adrian","first_name":"Dan-Adrian","id":"4A899BFC-F248-11E8-B48F-1D18A9856A87","orcid":"0000-0003-3650-940X","last_name":"Alistarh"}],"language":[{"iso":"eng"}],"ec_funded":1,"year":"2024","has_accepted_license":"1","citation":{"chicago":"Frantar, Elias. “Compressing Large Neural Networks : Algorithms, Systems and Scaling Laws.” Institute of Science and Technology Austria, 2024. <a href=\"https://doi.org/10.15479/at:ista:17485\">https://doi.org/10.15479/at:ista:17485</a>.","short":"E. Frantar, Compressing Large Neural Networks : Algorithms, Systems and Scaling Laws, Institute of Science and Technology Austria, 2024.","mla":"Frantar, Elias. <i>Compressing Large Neural Networks : Algorithms, Systems and Scaling Laws</i>. Institute of Science and Technology Austria, 2024, doi:<a href=\"https://doi.org/10.15479/at:ista:17485\">10.15479/at:ista:17485</a>.","apa":"Frantar, E. (2024). <i>Compressing large neural networks : Algorithms, systems and scaling laws</i>. Institute of Science and Technology Austria. <a href=\"https://doi.org/10.15479/at:ista:17485\">https://doi.org/10.15479/at:ista:17485</a>","ama":"Frantar E. Compressing large neural networks : Algorithms, systems and scaling laws. 2024. doi:<a href=\"https://doi.org/10.15479/at:ista:17485\">10.15479/at:ista:17485</a>","ista":"Frantar E. 2024. Compressing large neural networks : Algorithms, systems and scaling laws. Institute of Science and Technology Austria.","ieee":"E. 
Frantar, “Compressing large neural networks : Algorithms, systems and scaling laws,” Institute of Science and Technology Austria, 2024."},"date_created":"2024-09-02T11:01:48Z","ddc":["000"],"date_published":"2024-09-05T00:00:00Z","oa_version":"Published Version","oa":1,"project":[{"grant_number":"805223","_id":"268A44D6-B435-11E9-9278-68D0E5697425","name":"Elastic Coordination for Scalable Machine Learning","call_identifier":"H2020"}],"file":[{"relation":"source_file","file_id":"17570","checksum":"5d785645805a78c5b4ce7cc3df557b09","creator":"efrantar","content_type":"application/zip","access_level":"closed","date_created":"2024-09-05T12:04:11Z","file_name":"thesis-final.zip","date_updated":"2024-09-05T12:04:11Z","file_size":1615167},{"file_size":2376611,"content_type":"application/pdf","access_level":"open_access","success":1,"date_created":"2024-09-06T16:24:59Z","file_name":"frantar_thesis_final.pdf","date_updated":"2024-09-06T16:24:59Z","file_id":"17880","checksum":"a9dd1c2d23734986924eb44ebb55fd8f","creator":"efrantar","relation":"main_file"}],"department":[{"_id":"GradSch"},{"_id":"DaAl"}],"user_id":"8b945eb4-e2f2-11eb-945a-df72226e66a9","author":[{"first_name":"Elias","full_name":"Frantar, Elias","id":"09a8f98d-ec99-11ea-ae11-c063a7b7fe5f","last_name":"Frantar"}],"month":"09","day":"05","article_processing_charge":"No"},{"abstract":[{"text":"Deep learning is essential in numerous applications nowadays, with many recent advancements made possible by training very large models. Despite their broad applicability, training neural networks is often time-intensive, and it is usually impractical to manage large models and datasets on a single machine. To address these issues, distributed deep learning training has become increasingly important. However, distributed training requires synchronization among nodes, and the mini-batch stochastic gradient descent algorithm places a significant load on network connections. A possible solution to tackle the synchronization bottleneck is to reduce message sizes via lossy compression.\r\n\r\nIn this thesis, we investigate systems and algorithmic approaches to communication compression during training. From the systems perspective, we demonstrate that a common approach of expensive hardware overprovisioning can be replaced through a thorough system design. We introduce a framework that provides efficient software support for compressed communication in machine learning applications, applicable to both multi-GPU single-node training and larger-scale multi-node training. Our framework integrates with popular ML frameworks, providing up to 3x speedups for multi-GPU nodes based on commodity hardware and order-of-magnitude improvements in the multi-node setting, with negligible impact on accuracy.\r\n\r\nAlso, we consider an application of our framework to different communication schemes, such as Fully Sharded Data Parallel. We provide strong convergence guarantees for the compression in such a setup. Empirical validation shows that our method preserves model accuracy for GPT-family models with up to 1.3 billion parameters, while completely removing the communication bottlenecks of non-compressed alternatives, providing up to 2.2x speedups end-to-end.\r\n\r\nFrom the algorithmic side, we propose a general framework that dynamically adjusts the degree of compression across a model's layers during training. This approach enhances overall compression and results in significant speedups without compromising accuracy. 
Our framework, LGreCo, utilizes an adaptive algorithm that automatically selects the optimal compression parameters for model layers, ensuring the best compression ratio while adhering to an error constraint. Our method is effective across all existing families of compression methods. It achieves up to 2.5x faster training and up to a 5x improvement in compression compared to efficient implementations of current approaches. Additionally, LGreCo can complement existing adaptive algorithms.\r\n","lang":"eng"}],"doi":"10.15479/at:ista:17490","corr_author":"1","publication_status":"published","date_updated":"2025-09-10T09:54:21Z","type":"dissertation","acknowledged_ssus":[{"_id":"ScienComp"}],"alternative_title":["ISTA Thesis"],"publication_identifier":{"issn":["2663-337X"]},"file_date_updated":"2024-09-04T08:36:06Z","_id":"17490","degree_awarded":"PhD","page":"102","publisher":"Institute of Science and Technology Austria","title":"Communication-efficient distributed training of deep neural networks : An algorithms and systems perspective","related_material":{"record":[{"status":"public","id":"17456","relation":"part_of_dissertation"},{"relation":"part_of_dissertation","id":"14461","status":"public"},{"status":"public","id":"12780","relation":"part_of_dissertation"}]},"status":"public","language":[{"iso":"eng"}],"supervisor":[{"full_name":"Alistarh, Dan-Adrian","first_name":"Dan-Adrian","id":"4A899BFC-F248-11E8-B48F-1D18A9856A87","last_name":"Alistarh","orcid":"0000-0003-3650-940X"}],"ec_funded":1,"year":"2024","ddc":["000"],"date_published":"2024-09-04T00:00:00Z","has_accepted_license":"1","citation":{"ieee":"I. Markov, “Communication-efficient distributed training of deep neural networks : An algorithms and systems perspective,” Institute of Science and Technology Austria, 2024.","ista":"Markov I. 2024. Communication-efficient distributed training of deep neural networks : An algorithms and systems perspective. Institute of Science and Technology Austria.","ama":"Markov I. Communication-efficient distributed training of deep neural networks : An algorithms and systems perspective. 2024. doi:<a href=\"https://doi.org/10.15479/at:ista:17490\">10.15479/at:ista:17490</a>","apa":"Markov, I. (2024). <i>Communication-efficient distributed training of deep neural networks : An algorithms and systems perspective</i>. Institute of Science and Technology Austria. <a href=\"https://doi.org/10.15479/at:ista:17490\">https://doi.org/10.15479/at:ista:17490</a>","mla":"Markov, Ilia. <i>Communication-Efficient Distributed Training of Deep Neural Networks : An Algorithms and Systems Perspective</i>. Institute of Science and Technology Austria, 2024, doi:<a href=\"https://doi.org/10.15479/at:ista:17490\">10.15479/at:ista:17490</a>.","short":"I. Markov, Communication-Efficient Distributed Training of Deep Neural Networks : An Algorithms and Systems Perspective, Institute of Science and Technology Austria, 2024.","chicago":"Markov, Ilia. “Communication-Efficient Distributed Training of Deep Neural Networks : An Algorithms and Systems Perspective.” Institute of Science and Technology Austria, 2024. 
<a href=\"https://doi.org/10.15479/at:ista:17490\">https://doi.org/10.15479/at:ista:17490</a>."},"date_created":"2024-09-04T08:51:11Z","oa_version":"Published Version","oa":1,"project":[{"call_identifier":"H2020","name":"Elastic Coordination for Scalable Machine Learning","grant_number":"805223","_id":"268A44D6-B435-11E9-9278-68D0E5697425"}],"department":[{"_id":"GradSch"},{"_id":"DaAl"}],"file":[{"relation":"source_file","file_id":"17491","checksum":"77609f4835d2730e46fa0d42d9134ed9","creator":"imarkov","content_type":"application/x-zip-compressed","access_level":"closed","date_created":"2024-09-04T08:35:35Z","file_name":"Thesis.zip","date_updated":"2024-09-04T08:35:35Z","file_size":43327753},{"relation":"main_file","checksum":"9e68f7217570f756ceb8f70b980938cd","creator":"imarkov","file_id":"17492","date_created":"2024-09-04T08:36:06Z","date_updated":"2024-09-04T08:36:06Z","file_name":"Thesis_final_version_pdfa2.pdf","content_type":"application/pdf","access_level":"open_access","success":1,"file_size":2756082}],"tmp":{"legal_code_url":"https://creativecommons.org/licenses/by-nc-sa/4.0/legalcode","short":"CC BY-NC-SA (4.0)","name":"Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0)","image":"/images/cc_by_nc_sa.png"},"author":[{"first_name":"Ilia","full_name":"Markov, Ilia","id":"D0CF4148-C985-11E9-8066-0BDEE5697425","last_name":"Markov"}],"user_id":"8b945eb4-e2f2-11eb-945a-df72226e66a9","month":"09","license":"https://creativecommons.org/licenses/by-nc-sa/4.0/","article_processing_charge":"No","day":"04"},{"day":"25","article_processing_charge":"No","isi":1,"issue":"4","month":"07","user_id":"2DF688A6-F248-11E8-B48F-1D18A9856A87","author":[{"full_name":"Alistarh, Dan-Adrian","first_name":"Dan-Adrian","last_name":"Alistarh","orcid":"0000-0003-3650-940X","id":"4A899BFC-F248-11E8-B48F-1D18A9856A87"},{"last_name":"Aspnes","full_name":"Aspnes, James","first_name":"James"},{"last_name":"Ellen","full_name":"Ellen, Faith","first_name":"Faith"},{"full_name":"Gelashvili, Rati","first_name":"Rati","last_name":"Gelashvili"},{"full_name":"Zhu, Leqi","first_name":"Leqi","id":"a2117c59-cee4-11ed-b9d0-874ecf0f8ac5","last_name":"Zhu"}],"department":[{"_id":"DaAl"}],"oa":1,"project":[{"call_identifier":"H2020","name":"Elastic Coordination for Scalable Machine Learning","_id":"268A44D6-B435-11E9-9278-68D0E5697425","grant_number":"805223"}],"oa_version":"Preprint","arxiv":1,"external_id":{"isi":["001082972300004"],"arxiv":["1811.01421"]},"citation":{"ieee":"D.-A. Alistarh, J. Aspnes, F. Ellen, R. Gelashvili, and L. Zhu, “Why extension-based proofs fail,” <i>SIAM Journal on Computing</i>, vol. 52, no. 4. Society for Industrial and Applied Mathematics, pp. 913–944, 2023.","ista":"Alistarh D-A, Aspnes J, Ellen F, Gelashvili R, Zhu L. 2023. Why extension-based proofs fail. SIAM Journal on Computing. 52(4), 913–944.","ama":"Alistarh D-A, Aspnes J, Ellen F, Gelashvili R, Zhu L. Why extension-based proofs fail. <i>SIAM Journal on Computing</i>. 2023;52(4):913-944. doi:<a href=\"https://doi.org/10.1137/20M1375851\">10.1137/20M1375851</a>","apa":"Alistarh, D.-A., Aspnes, J., Ellen, F., Gelashvili, R., &#38; Zhu, L. (2023). Why extension-based proofs fail. <i>SIAM Journal on Computing</i>. Society for Industrial and Applied Mathematics. <a href=\"https://doi.org/10.1137/20M1375851\">https://doi.org/10.1137/20M1375851</a>","mla":"Alistarh, Dan-Adrian, et al. “Why Extension-Based Proofs Fail.” <i>SIAM Journal on Computing</i>, vol. 52, no. 
4, Society for Industrial and Applied Mathematics, 2023, pp. 913–44, doi:<a href=\"https://doi.org/10.1137/20M1375851\">10.1137/20M1375851</a>.","short":"D.-A. Alistarh, J. Aspnes, F. Ellen, R. Gelashvili, L. Zhu, SIAM Journal on Computing 52 (2023) 913–944.","chicago":"Alistarh, Dan-Adrian, James Aspnes, Faith Ellen, Rati Gelashvili, and Leqi Zhu. “Why Extension-Based Proofs Fail.” <i>SIAM Journal on Computing</i>. Society for Industrial and Applied Mathematics, 2023. <a href=\"https://doi.org/10.1137/20M1375851\">https://doi.org/10.1137/20M1375851</a>."},"date_created":"2023-09-24T22:01:11Z","date_published":"2023-07-25T00:00:00Z","intvolume":"        52","article_type":"original","publication":"SIAM Journal on Computing","year":"2023","ec_funded":1,"quality_controlled":"1","language":[{"iso":"eng"}],"status":"public","related_material":{"record":[{"relation":"earlier_version","id":"6676","status":"public"}]},"scopus_import":"1","title":"Why extension-based proofs fail","publisher":"Society for Industrial and Applied Mathematics","page":"913-944","_id":"14364","main_file_link":[{"url":"https://arxiv.org/abs/1811.01421","open_access":"1"}],"publication_identifier":{"issn":["0097-5397"],"eissn":["1095-7111"]},"acknowledgement":"We would like to thank Valerie King, Toniann Pitassi, and Michael Saks for helpful discussions and Shi Hao Liu for his useful feedback.\r\nThis research was supported by the Natural Science and Engineering Research Council of Canada under grants RGPIN-2015-05080 and RGPIN-2020-04178, a postgraduate scholarship, and a postdoctoral fellowship; a University of Toronto postdoctoral fellowship; the National Science Foundation under grants CCF-1217921, CCF-1301926, CCF-1637385, CCF-1650596, and IIS-1447786; the U.S. Department of Energy under grant ER26116/DE-SC0008923; the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme grant agreement 805223 ScaleML; and the Oracle and Intel corporations. Some of the work on this paper was done while Faith Ellen was visiting IST Austria.","type":"journal_article","volume":52,"date_updated":"2025-05-14T11:26:06Z","publication_status":"published","doi":"10.1137/20M1375851","abstract":[{"lang":"eng","text":"We introduce extension-based proofs, a class of impossibility proofs that includes valency arguments. They are modelled as an interaction between a prover and a protocol. Using proofs based on combinatorial topology, it has been shown that it is impossible to deterministically solve k-set agreement among n > k ≥ 2 processes or approximate agreement on a cycle of length 4 among n ≥ 3 processes in a wait-free manner in asynchronous models where processes communicate using objects that can be constructed from shared registers. However, it was unknown whether proofs based on simpler techniques were possible. We show that these impossibility results cannot be obtained by extension-based proofs in the iterated snapshot model and, hence, extension-based proofs are limited in power."}]},{"publication_identifier":{"eissn":["2640-3498"]},"alternative_title":["PMLR"],"acknowledgement":"The authors gratefully acknowledge funding from the European Research Council (ERC) under the European Union’s Horizon 2020 programme (grant agreement No. 
805223 ScaleML), as well as experimental support from Eldar Kurtic, and from the IST Austria IT department, in particular Stefano Elefante, Andrei Hornoiu, and Alois Schloegl.","acknowledged_ssus":[{"_id":"ScienComp"}],"type":"conference","volume":202,"date_updated":"2025-04-14T13:52:01Z","publication_status":"published","corr_author":"1","abstract":[{"text":"We show for the first time that large-scale generative pretrained transformer (GPT) family models can be pruned to at least 50% sparsity in one-shot, without any retraining, at minimal loss of accuracy. This is achieved via a new pruning method called SparseGPT, specifically designed to work efficiently and accurately on massive GPT-family models. We can execute SparseGPT on the largest available open-source models, OPT-175B and BLOOM-176B, in under 4.5 hours, and can reach 60% unstructured sparsity with negligible increase in perplexity: remarkably, more than 100 billion weights from these models can be ignored at inference time. SparseGPT generalizes to semi-structured (2:4 and 4:8) patterns, and is compatible with weight quantization approaches. The code is available at: https://github.com/IST-DASLab/sparsegpt.","lang":"eng"}],"status":"public","related_material":{"record":[{"status":"public","relation":"dissertation_contains","id":"17485"}]},"title":"SparseGPT: Massive language models can be accurately pruned in one-shot","scopus_import":"1","conference":{"end_date":"2023-07-29","start_date":"2023-07-23","location":"Honolulu, Hawaii, HI, United States","name":"ICML: International Conference on Machine Learning"},"publisher":"ML Research Press","page":"10323-10337","_id":"14458","main_file_link":[{"open_access":"1","url":"https://doi.org/10.48550/arXiv.2301.00774"}],"arxiv":1,"external_id":{"arxiv":["2301.00774"]},"citation":{"ista":"Frantar E, Alistarh D-A. 2023. SparseGPT: Massive language models can be accurately pruned in one-shot. Proceedings of the 40th International Conference on Machine Learning. ICML: International Conference on Machine Learning, PMLR, vol. 202, 10323–10337.","ama":"Frantar E, Alistarh D-A. SparseGPT: Massive language models can be accurately pruned in one-shot. In: <i>Proceedings of the 40th International Conference on Machine Learning</i>. Vol 202. ML Research Press; 2023:10323-10337.","ieee":"E. Frantar and D.-A. Alistarh, “SparseGPT: Massive language models can be accurately pruned in one-shot,” in <i>Proceedings of the 40th International Conference on Machine Learning</i>, Honolulu, Hawaii, HI, United States, 2023, vol. 202, pp. 10323–10337.","short":"E. Frantar, D.-A. Alistarh, in:, Proceedings of the 40th International Conference on Machine Learning, ML Research Press, 2023, pp. 10323–10337.","chicago":"Frantar, Elias, and Dan-Adrian Alistarh. “SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot.” In <i>Proceedings of the 40th International Conference on Machine Learning</i>, 202:10323–37. ML Research Press, 2023.","apa":"Frantar, E., &#38; Alistarh, D.-A. (2023). SparseGPT: Massive language models can be accurately pruned in one-shot. In <i>Proceedings of the 40th International Conference on Machine Learning</i> (Vol. 202, pp. 10323–10337). Honolulu, Hawaii, HI, United States: ML Research Press.","mla":"Frantar, Elias, and Dan-Adrian Alistarh. “SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot.” <i>Proceedings of the 40th International Conference on Machine Learning</i>, vol. 202, ML Research Press, 2023, pp. 
10323–37."},"date_created":"2023-10-29T23:01:16Z","date_published":"2023-07-30T00:00:00Z","intvolume":"       202","publication":"Proceedings of the 40th International Conference on Machine Learning","year":"2023","ec_funded":1,"quality_controlled":"1","language":[{"iso":"eng"}],"day":"30","article_processing_charge":"No","month":"07","user_id":"2DF688A6-F248-11E8-B48F-1D18A9856A87","author":[{"last_name":"Frantar","id":"09a8f98d-ec99-11ea-ae11-c063a7b7fe5f","full_name":"Frantar, Elias","first_name":"Elias"},{"orcid":"0000-0003-3650-940X","last_name":"Alistarh","id":"4A899BFC-F248-11E8-B48F-1D18A9856A87","full_name":"Alistarh, Dan-Adrian","first_name":"Dan-Adrian"}],"department":[{"_id":"DaAl"}],"project":[{"name":"Elastic Coordination for Scalable Machine Learning","call_identifier":"H2020","_id":"268A44D6-B435-11E9-9278-68D0E5697425","grant_number":"805223"}],"oa":1,"oa_version":"Preprint"},{"status":"public","title":"SparseProp: Efficient sparse backpropagation for faster training of neural networks at the edge","scopus_import":"1","page":"26215-26227","_id":"14460","publisher":"ML Research Press","conference":{"end_date":"2023-07-29","start_date":"2023-07-23","location":"Honolulu, Hawaii, HI, United States","name":"ICML: International Conference on Machine Learning"},"main_file_link":[{"url":"https://doi.org/10.48550/arXiv.2302.04852","open_access":"1"}],"publication_identifier":{"eissn":["2640-3498"]},"type":"conference","alternative_title":["PMLR"],"acknowledgement":"We would like to thank Elias Frantar for his valuable assistance and support at the outset of this project, and the anonymous ICML and SNN reviewers for very constructive feedback. EI was supported in part by the FWF DK VGSCO, grant agreement number W1260-N35. DA acknowledges generous ERC support, via Starting Grant 805223 ScaleML. ","date_updated":"2025-04-14T07:49:12Z","volume":202,"corr_author":"1","publication_status":"published","abstract":[{"lang":"eng","text":"We provide an efficient implementation of the backpropagation algorithm, specialized to the case where the weights of the neural network being trained are sparse. Our algorithm is general, as it applies to arbitrary (unstructured) sparsity and common layer types (e.g., convolutional or linear). We provide a fast vectorized implementation on commodity CPUs, and show that it can yield speedups in end-to-end runtime experiments, both in transfer learning using already-sparsified networks, and in training sparse networks from scratch. 
Thus, our results provide the first support for sparse training on commodity hardware."}],"article_processing_charge":"No","day":"30","month":"07","author":[{"full_name":"Nikdan, Mahdi","first_name":"Mahdi","id":"66374281-f394-11eb-9cf6-869147deecc0","last_name":"Nikdan"},{"last_name":"Pegolotti","first_name":"Tommaso","full_name":"Pegolotti, Tommaso"},{"full_name":"Iofinova, Eugenia B","first_name":"Eugenia B","orcid":"0000-0002-7778-3221","last_name":"Iofinova","id":"f9a17499-f6e0-11ea-865d-fdf9a3f77117"},{"last_name":"Kurtic","id":"47beb3a5-07b5-11eb-9b87-b108ec578218","first_name":"Eldar","full_name":"Kurtic, Eldar"},{"orcid":"0000-0003-3650-940X","last_name":"Alistarh","id":"4A899BFC-F248-11E8-B48F-1D18A9856A87","first_name":"Dan-Adrian","full_name":"Alistarh, Dan-Adrian"}],"user_id":"2DF688A6-F248-11E8-B48F-1D18A9856A87","department":[{"_id":"DaAl"}],"oa":1,"project":[{"call_identifier":"H2020","name":"Elastic Coordination for Scalable Machine Learning","_id":"268A44D6-B435-11E9-9278-68D0E5697425","grant_number":"805223"}],"oa_version":"Preprint","arxiv":1,"external_id":{"arxiv":["2302.04852"]},"date_published":"2023-07-30T00:00:00Z","date_created":"2023-10-29T23:01:17Z","citation":{"chicago":"Nikdan, Mahdi, Tommaso Pegolotti, Eugenia B Iofinova, Eldar Kurtic, and Dan-Adrian Alistarh. “SparseProp: Efficient Sparse Backpropagation for Faster Training of Neural Networks at the Edge.” In <i>Proceedings of the 40th International Conference on Machine Learning</i>, 202:26215–27. ML Research Press, 2023.","short":"M. Nikdan, T. Pegolotti, E.B. Iofinova, E. Kurtic, D.-A. Alistarh, in:, Proceedings of the 40th International Conference on Machine Learning, ML Research Press, 2023, pp. 26215–26227.","mla":"Nikdan, Mahdi, et al. “SparseProp: Efficient Sparse Backpropagation for Faster Training of Neural Networks at the Edge.” <i>Proceedings of the 40th International Conference on Machine Learning</i>, vol. 202, ML Research Press, 2023, pp. 26215–27.","apa":"Nikdan, M., Pegolotti, T., Iofinova, E. B., Kurtic, E., &#38; Alistarh, D.-A. (2023). SparseProp: Efficient sparse backpropagation for faster training of neural networks at the edge. In <i>Proceedings of the 40th International Conference on Machine Learning</i> (Vol. 202, pp. 26215–26227). Honolulu, Hawaii, HI, United States: ML Research Press.","ama":"Nikdan M, Pegolotti T, Iofinova EB, Kurtic E, Alistarh D-A. SparseProp: Efficient sparse backpropagation for faster training of neural networks at the edge. In: <i>Proceedings of the 40th International Conference on Machine Learning</i>. Vol 202. ML Research Press; 2023:26215-26227.","ista":"Nikdan M, Pegolotti T, Iofinova EB, Kurtic E, Alistarh D-A. 2023. SparseProp: Efficient sparse backpropagation for faster training of neural networks at the edge. Proceedings of the 40th International Conference on Machine Learning. ICML: International Conference on Machine Learning, PMLR, vol. 202, 26215–26227.","ieee":"M. Nikdan, T. Pegolotti, E. B. Iofinova, E. Kurtic, and D.-A. Alistarh, “SparseProp: Efficient sparse backpropagation for faster training of neural networks at the edge,” in <i>Proceedings of the 40th International Conference on Machine Learning</i>, Honolulu, Hawaii, HI, United States, 2023, vol. 202, pp. 
26215–26227."},"intvolume":"       202","publication":"Proceedings of the 40th International Conference on Machine Learning","year":"2023","ec_funded":1,"language":[{"iso":"eng"}],"quality_controlled":"1"},{"corr_author":"1","publication_status":"published","date_updated":"2025-05-08T13:20:20Z","volume":202,"acknowledged_ssus":[{"_id":"ScienComp"}],"type":"conference","acknowledgement":"The authors gratefully acknowledge funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (grant agreement No 805223 ScaleML), as well as experimental support from the IST Austria IT department, in particular Stefano Elefante, Andrei Hornoiu, and Alois Schloegl. AV acknowledges the support of the French Agence Nationale de la Recherche (ANR), under grant ANR-21-CE48-0016 (project COMCOPT), the support of Fondation Hadamard with a PRMO grant, and the support of CNRS with a CoopIntEER IEA grant (project ALFRED).","alternative_title":["PMLR"],"publication_identifier":{"eissn":["2640-3498"]},"abstract":[{"text":"Communication-reduction techniques are a popular way to improve scalability in data-parallel training of deep neural networks (DNNs). The recent emergence of large language models such as GPT has created the need for new approaches to exploit data-parallelism. Among these, fully-sharded data parallel (FSDP) training is highly popular, yet it still encounters scalability bottlenecks. One reason is that applying compression techniques to FSDP is challenging: as the vast majority of the communication involves the model’s weights, direct compression alters convergence and leads to accuracy loss. We present QSDP, a variant of FSDP which supports both gradient and weight quantization with theoretical guarantees, is simple to implement and has essentially no overheads. To derive QSDP we prove that a natural modification of SGD achieves convergence even when we only maintain quantized weights, and thus the domain over which we train consists of quantized points and is, therefore, highly non-convex. We validate this approach by training GPT-family models with up to 1.3 billion parameters on a multi-node cluster. Experiments show that QSDP preserves model accuracy, while completely removing the communication bottlenecks of FSDP, providing end-to-end speedups of up to 2.2x.","lang":"eng"}],"related_material":{"record":[{"relation":"dissertation_contains","id":"17490","status":"public"}]},"status":"public","main_file_link":[{"url":"https://doi.org/10.48550/arXiv.2302.02390","open_access":"1"}],"_id":"14461","page":"24020-24044","conference":{"end_date":"2023-07-29","start_date":"2023-07-23","location":"Honolulu, Hawaii, HI, United States","name":"ICML: International Conference on Machine Learning"},"publisher":"ML Research Press","scopus_import":"1","title":"Quantized distributed training of large models with convergence guarantees","intvolume":"       202","date_published":"2023-07-30T00:00:00Z","date_created":"2023-10-29T23:01:17Z","citation":{"ama":"Markov I, Vladu A, Guo Q, Alistarh D-A. Quantized distributed training of large models with convergence guarantees. In: <i>Proceedings of the 40th International Conference on Machine Learning</i>. Vol 202. ML Research Press; 2023:24020-24044.","ista":"Markov I, Vladu A, Guo Q, Alistarh D-A. 2023. Quantized distributed training of large models with convergence guarantees. Proceedings of the 40th International Conference on Machine Learning. 
ICML: International Conference on Machine Learning, PMLR, vol. 202, 24020–24044.","ieee":"I. Markov, A. Vladu, Q. Guo, and D.-A. Alistarh, “Quantized distributed training of large models with convergence guarantees,” in <i>Proceedings of the 40th International Conference on Machine Learning</i>, Honolulu, Hawaii, HI, United States, 2023, vol. 202, pp. 24020–24044.","chicago":"Markov, Ilia, Adrian Vladu, Qi Guo, and Dan-Adrian Alistarh. “Quantized Distributed Training of Large Models with Convergence Guarantees.” In <i>Proceedings of the 40th International Conference on Machine Learning</i>, 202:24020–44. ML Research Press, 2023.","short":"I. Markov, A. Vladu, Q. Guo, D.-A. Alistarh, in:, Proceedings of the 40th International Conference on Machine Learning, ML Research Press, 2023, pp. 24020–24044.","mla":"Markov, Ilia, et al. “Quantized Distributed Training of Large Models with Convergence Guarantees.” <i>Proceedings of the 40th International Conference on Machine Learning</i>, vol. 202, ML Research Press, 2023, pp. 24020–44.","apa":"Markov, I., Vladu, A., Guo, Q., &#38; Alistarh, D.-A. (2023). Quantized distributed training of large models with convergence guarantees. In <i>Proceedings of the 40th International Conference on Machine Learning</i> (Vol. 202, pp. 24020–24044). Honolulu, Hawaii, HI, United States: ML Research Press."},"external_id":{"arxiv":["2302.02390"]},"arxiv":1,"language":[{"iso":"eng"}],"quality_controlled":"1","ec_funded":1,"year":"2023","publication":"Proceedings of the 40th International Conference on Machine Learning","author":[{"id":"D0CF4148-C985-11E9-8066-0BDEE5697425","last_name":"Markov","full_name":"Markov, Ilia","first_name":"Ilia"},{"first_name":"Adrian","full_name":"Vladu, Adrian","last_name":"Vladu"},{"last_name":"Guo","full_name":"Guo, Qi","first_name":"Qi"},{"last_name":"Alistarh","orcid":"0000-0003-3650-940X","id":"4A899BFC-F248-11E8-B48F-1D18A9856A87","first_name":"Dan-Adrian","full_name":"Alistarh, Dan-Adrian"}],"user_id":"2DF688A6-F248-11E8-B48F-1D18A9856A87","month":"07","day":"30","article_processing_charge":"No","oa_version":"Preprint","oa":1,"project":[{"call_identifier":"H2020","name":"Elastic Coordination for Scalable Machine Learning","_id":"268A44D6-B435-11E9-9278-68D0E5697425","grant_number":"805223"}],"department":[{"_id":"DaAl"}]},{"external_id":{"isi":["000934262700001"]},"intvolume":"       948","article_type":"original","ddc":["000"],"date_published":"2023-02-28T00:00:00Z","citation":{"ista":"Alistarh D-A, Ellen F, Rybicki J. 2023. Wait-free approximate agreement on graphs. Theoretical Computer Science. 948(2), 113733.","ama":"Alistarh D-A, Ellen F, Rybicki J. Wait-free approximate agreement on graphs. <i>Theoretical Computer Science</i>. 2023;948(2). doi:<a href=\"https://doi.org/10.1016/j.tcs.2023.113733\">10.1016/j.tcs.2023.113733</a>","ieee":"D.-A. Alistarh, F. Ellen, and J. Rybicki, “Wait-free approximate agreement on graphs,” <i>Theoretical Computer Science</i>, vol. 948, no. 2. Elsevier, 2023.","short":"D.-A. Alistarh, F. Ellen, J. Rybicki, Theoretical Computer Science 948 (2023).","chicago":"Alistarh, Dan-Adrian, Faith Ellen, and Joel Rybicki. “Wait-Free Approximate Agreement on Graphs.” <i>Theoretical Computer Science</i>. Elsevier, 2023. <a href=\"https://doi.org/10.1016/j.tcs.2023.113733\">https://doi.org/10.1016/j.tcs.2023.113733</a>.","apa":"Alistarh, D.-A., Ellen, F., &#38; Rybicki, J. (2023). Wait-free approximate agreement on graphs. <i>Theoretical Computer Science</i>. Elsevier. 
<a href=\"https://doi.org/10.1016/j.tcs.2023.113733\">https://doi.org/10.1016/j.tcs.2023.113733</a>","mla":"Alistarh, Dan-Adrian, et al. “Wait-Free Approximate Agreement on Graphs.” <i>Theoretical Computer Science</i>, vol. 948, no. 2, 113733, Elsevier, 2023, doi:<a href=\"https://doi.org/10.1016/j.tcs.2023.113733\">10.1016/j.tcs.2023.113733</a>."},"date_created":"2023-02-19T23:00:55Z","has_accepted_license":"1","year":"2023","publication":"Theoretical Computer Science","language":[{"iso":"eng"}],"quality_controlled":"1","ec_funded":1,"issue":"2","isi":1,"article_processing_charge":"Yes (via OA deal)","day":"28","author":[{"last_name":"Alistarh","orcid":"0000-0003-3650-940X","id":"4A899BFC-F248-11E8-B48F-1D18A9856A87","full_name":"Alistarh, Dan-Adrian","first_name":"Dan-Adrian"},{"first_name":"Faith","full_name":"Ellen, Faith","last_name":"Ellen"},{"id":"334EFD2E-F248-11E8-B48F-1D18A9856A87","last_name":"Rybicki","orcid":"0000-0002-6432-6646","first_name":"Joel","full_name":"Rybicki, Joel"}],"tmp":{"legal_code_url":"https://creativecommons.org/licenses/by/4.0/legalcode","short":"CC BY (4.0)","name":"Creative Commons Attribution 4.0 International Public License (CC-BY 4.0)","image":"/images/cc_by.png"},"user_id":"4359f0d1-fa6c-11eb-b949-802e58b17ae8","month":"02","oa":1,"project":[{"_id":"268A44D6-B435-11E9-9278-68D0E5697425","grant_number":"805223","call_identifier":"H2020","name":"Elastic Coordination for Scalable Machine Learning"},{"name":"Coordination in constrained and natural distributed systems","call_identifier":"H2020","grant_number":"840605","_id":"26A5D39A-B435-11E9-9278-68D0E5697425"}],"article_number":"113733","file":[{"file_size":602333,"date_created":"2023-02-20T07:30:20Z","file_name":"2023_TheoreticalCompScience_Alistarh.pdf","date_updated":"2023-02-20T07:30:20Z","content_type":"application/pdf","success":1,"access_level":"open_access","checksum":"b27c5290f2f1500c403494364ee39c9f","creator":"dernst","file_id":"12570","relation":"main_file"}],"department":[{"_id":"DaAl"}],"oa_version":"Published Version","type":"journal_article","acknowledgement":"This project has received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (grant agreement No. 805223 ScaleML) and under the Marie Skłodowska-Curie grant agreement No. 840605 and from the Natural Sciences and Engineering Research Council of Canada grant RGPIN-2020-04178. Part of this work was done while Faith Ellen was visiting IST Austria.","publication_identifier":{"issn":["0304-3975"]},"file_date_updated":"2023-02-20T07:30:20Z","corr_author":"1","publication_status":"published","date_updated":"2025-04-14T07:49:16Z","volume":948,"abstract":[{"text":"Approximate agreement is one of the few variants of consensus that can be solved in a wait-free manner in asynchronous systems where processes communicate by reading and writing to shared memory. In this work, we consider a natural generalisation of approximate agreement on arbitrary undirected connected graphs. Each process is given a node of the graph as input and, if non-faulty, must output a node such that\r\n– all the outputs are within distance 1 of one another, and\r\n– each output value lies on a shortest path between two input values.\r\nFrom prior work, it is known that there is no wait-free algorithm among  processes for this problem on any cycle of length , by reduction from 2-set agreement (Castañeda et al., 2018).\r\n\r\nIn this work, we investigate the solvability of this task on general graphs. 
We give a new, direct proof of the impossibility of approximate agreement on cycles of length c ≥ 4, via a generalisation of Sperner's Lemma to convex polygons. We also extend the reduction from 2-set agreement to a larger class of graphs, showing that approximate agreement on these graphs is unsolvable. On the positive side, we present a wait-free algorithm for a different class of graphs, which properly contains the class of chordal graphs.","lang":"eng"}],"doi":"10.1016/j.tcs.2023.113733","status":"public","_id":"12566","publisher":"Elsevier","scopus_import":"1","title":"Wait-free approximate agreement on graphs"},{"status":"public","related_material":{"record":[{"status":"public","id":"13074","relation":"dissertation_contains"}],"link":[{"relation":"software","url":"https://github.com/IST-DASLab/CrAM"}]},"_id":"13053","conference":{"start_date":"2023-05-01","end_date":"2023-05-05","name":"ICLR: International Conference on Learning Representations","location":"Kigali, Rwanda "},"publisher":"OpenReview","title":"CrAM: A Compression-Aware Minimizer","main_file_link":[{"url":"https://openreview.net/pdf?id=_eTZBs-yedr","open_access":"1"}],"type":"conference","acknowledged_ssus":[{"_id":"ScienComp"}],"acknowledgement":"AP, EK, DA received funding from the European Research Council (ERC) under the European\r\nUnion’s Horizon 2020 research and innovation programme (grant agreement No 805223 ScaleML). AV acknowledges the support of the French Agence Nationale de la Recherche (ANR), under grant ANR-21-CE48-0016 (project COMCOPT). We further acknowledge the support from the Scientific Service Units (SSU) of ISTA through resources provided by Scientific Computing (SciComp).","file_date_updated":"2024-07-22T09:09:45Z","corr_author":"1","publication_status":"published","date_updated":"2025-04-15T06:49:21Z","abstract":[{"text":"Deep neural networks (DNNs) often have to be compressed, via pruning and/or quantization, before they can be deployed in practical settings. In this work we propose a new compression-aware minimizer dubbed CrAM that modifies the optimization step in a principled way, in order to produce models whose local loss behavior is stable under compression operations such as pruning. Thus, dense models trained via CrAM should be compressible post-training, in a single step, without significant accuracy loss. Experimental results on standard benchmarks, such as residual networks for ImageNet classification and BERT models for language modelling, show that CrAM produces dense models that can be more accurate than the standard SGD/Adam-based baselines, but which are stable under weight pruning: specifically, we can prune models in one-shot to 70-80% sparsity with almost no accuracy loss, and to 90% with reasonable (∼1%) accuracy loss, which is competitive with gradual compression methods. Additionally, CrAM can produce sparse models which perform well for transfer learning, and it also works for semi-structured 2:4 pruning patterns supported by GPU hardware. 
The code for reproducing the results is available at https://github.com/IST-DASLab/CrAM.","lang":"eng"}],"day":"01","article_processing_charge":"No","author":[{"full_name":"Peste, Elena-Alexandra","first_name":"Elena-Alexandra","id":"32D78294-F248-11E8-B48F-1D18A9856A87","last_name":"Peste"},{"first_name":"Adrian","full_name":"Vladu, Adrian","last_name":"Vladu"},{"first_name":"Eldar","full_name":"Kurtic, Eldar","last_name":"Kurtic","id":"47beb3a5-07b5-11eb-9b87-b108ec578218"},{"id":"40C20FD2-F248-11E8-B48F-1D18A9856A87","orcid":"0000-0001-8622-7887","last_name":"Lampert","first_name":"Christoph","full_name":"Lampert, Christoph"},{"last_name":"Alistarh","orcid":"0000-0003-3650-940X","id":"4A899BFC-F248-11E8-B48F-1D18A9856A87","first_name":"Dan-Adrian","full_name":"Alistarh, Dan-Adrian"}],"user_id":"2DF688A6-F248-11E8-B48F-1D18A9856A87","month":"05","oa":1,"project":[{"grant_number":"805223","_id":"268A44D6-B435-11E9-9278-68D0E5697425","call_identifier":"H2020","name":"Elastic Coordination for Scalable Machine Learning"}],"department":[{"_id":"GradSch"},{"_id":"DaAl"},{"_id":"ChLa"}],"file":[{"file_size":458201,"access_level":"open_access","success":1,"content_type":"application/pdf","file_name":"2023_ICLR_Peste.pdf","date_updated":"2024-07-22T09:09:45Z","date_created":"2024-07-22T09:09:45Z","file_id":"17294","creator":"dernst","checksum":"a6eec897e13a91cdc3eeaf309801752c","relation":"main_file"}],"oa_version":"Published Version","external_id":{"arxiv":["2207.14200"]},"arxiv":1,"date_published":"2023-05-01T00:00:00Z","ddc":["000"],"date_created":"2023-05-23T11:36:18Z","has_accepted_license":"1","citation":{"mla":"Krumes, Alexandra, et al. “CrAM: A Compression-Aware Minimizer.” <i>11th International Conference on Learning Representations </i>, OpenReview, 2023.","apa":"Krumes, A., Vladu, A., Kurtic, E., Lampert, C., &#38; Alistarh, D.-A. (2023). CrAM: A Compression-Aware Minimizer. In <i>11th International Conference on Learning Representations </i>. Kigali, Rwanda : OpenReview.","chicago":"Krumes, Alexandra, Adrian Vladu, Eldar Kurtic, Christoph Lampert, and Dan-Adrian Alistarh. “CrAM: A Compression-Aware Minimizer.” In <i>11th International Conference on Learning Representations </i>. OpenReview, 2023.","short":"A. Krumes, A. Vladu, E. Kurtic, C. Lampert, D.-A. Alistarh, in:, 11th International Conference on Learning Representations , OpenReview, 2023.","ieee":"A. Krumes, A. Vladu, E. Kurtic, C. Lampert, and D.-A. Alistarh, “CrAM: A Compression-Aware Minimizer,” in <i>11th International Conference on Learning Representations </i>, Kigali, Rwanda , 2023.","ama":"Krumes A, Vladu A, Kurtic E, Lampert C, Alistarh D-A. CrAM: A Compression-Aware Minimizer. In: <i>11th International Conference on Learning Representations </i>. OpenReview; 2023.","ista":"Krumes A, Vladu A, Kurtic E, Lampert C, Alistarh D-A. 2023. CrAM: A Compression-Aware Minimizer. 11th International Conference on Learning Representations . 
ICLR: International Conference on Learning Representations."},"year":"2023","publication":"11th International Conference on Learning Representations ","language":[{"iso":"eng"}],"quality_controlled":"1","ec_funded":1},{"project":[{"name":"International IST Doctoral Program","call_identifier":"H2020","grant_number":"665385","_id":"2564DBCA-B435-11E9-9278-68D0E5697425"},{"name":"Elastic Coordination for Scalable Machine Learning","call_identifier":"H2020","_id":"268A44D6-B435-11E9-9278-68D0E5697425","grant_number":"805223"}],"oa":1,"file":[{"file_id":"13087","checksum":"6b3354968403cb9d48cc5a83611fb571","creator":"epeste","relation":"main_file","file_size":2152072,"content_type":"application/pdf","access_level":"open_access","success":1,"date_created":"2023-05-24T16:11:16Z","date_updated":"2023-05-24T16:11:16Z","file_name":"PhD_Thesis_Alexandra_Peste_final.pdf"},{"file_size":1658293,"file_name":"PhD_Thesis_APeste.zip","date_updated":"2023-05-24T16:12:59Z","date_created":"2023-05-24T16:12:59Z","access_level":"closed","content_type":"application/zip","creator":"epeste","checksum":"8d0df94bbcf4db72c991f22503b3fd60","file_id":"13088","relation":"source_file"}],"department":[{"_id":"GradSch"},{"_id":"DaAl"},{"_id":"ChLa"}],"oa_version":"Published Version","day":"23","article_processing_charge":"No","author":[{"first_name":"Elena-Alexandra","full_name":"Peste, Elena-Alexandra","last_name":"Peste","id":"32D78294-F248-11E8-B48F-1D18A9856A87"}],"user_id":"8b945eb4-e2f2-11eb-945a-df72226e66a9","month":"05","year":"2023","language":[{"iso":"eng"}],"supervisor":[{"full_name":"Lampert, Christoph","first_name":"Christoph","last_name":"Lampert","orcid":"0000-0001-8622-7887","id":"40C20FD2-F248-11E8-B48F-1D18A9856A87"},{"first_name":"Dan-Adrian","full_name":"Alistarh, Dan-Adrian","last_name":"Alistarh","orcid":"0000-0003-3650-940X","id":"4A899BFC-F248-11E8-B48F-1D18A9856A87"}],"ec_funded":1,"ddc":["000"],"date_published":"2023-05-23T00:00:00Z","has_accepted_license":"1","date_created":"2023-05-23T17:07:53Z","citation":{"ieee":"A. Krumes, “Efficiency and generalization of sparse neural networks,” Institute of Science and Technology Austria, 2023.","ista":"Krumes A. 2023. Efficiency and generalization of sparse neural networks. Institute of Science and Technology Austria.","ama":"Krumes A. Efficiency and generalization of sparse neural networks. 2023. doi:<a href=\"https://doi.org/10.15479/at:ista:13074\">10.15479/at:ista:13074</a>","apa":"Krumes, A. (2023). <i>Efficiency and generalization of sparse neural networks</i>. Institute of Science and Technology Austria. <a href=\"https://doi.org/10.15479/at:ista:13074\">https://doi.org/10.15479/at:ista:13074</a>","mla":"Krumes, Alexandra. <i>Efficiency and Generalization of Sparse Neural Networks</i>. Institute of Science and Technology Austria, 2023, doi:<a href=\"https://doi.org/10.15479/at:ista:13074\">10.15479/at:ista:13074</a>.","short":"A. Krumes, Efficiency and Generalization of Sparse Neural Networks, Institute of Science and Technology Austria, 2023.","chicago":"Krumes, Alexandra. “Efficiency and Generalization of Sparse Neural Networks.” Institute of Science and Technology Austria, 2023. 
<a href=\"https://doi.org/10.15479/at:ista:13074\">https://doi.org/10.15479/at:ista:13074</a>."},"_id":"13074","degree_awarded":"PhD","page":"147","publisher":"Institute of Science and Technology Austria","title":"Efficiency and generalization of sparse neural networks","status":"public","related_material":{"record":[{"id":"13053","relation":"part_of_dissertation","status":"public"},{"id":"11458","relation":"part_of_dissertation","status":"public"},{"status":"public","id":"12299","relation":"part_of_dissertation"}]},"abstract":[{"text":"Deep learning has become an integral part of a large number of important applications, and many of the recent breakthroughs have been enabled by the ability to train very large models, capable to capture complex patterns and relationships from the data. At the same time, the massive sizes of modern deep learning models have made their deployment to smaller devices more challenging; this is particularly important, as in many applications the users rely on accurate deep learning predictions, but they only have access to devices with limited memory and compute power. One solution to this problem is to prune neural networks, by setting as many of their parameters as possible to zero, to obtain accurate sparse models with lower memory footprint. Despite the great research progress in obtaining sparse models that preserve accuracy, while satisfying memory and computational constraints, there are still many challenges associated with efficiently training sparse models, as well as understanding their generalization properties.\r\n\r\nThe focus of this thesis is to investigate how the training process of sparse models can be made more efficient, and to understand the differences between sparse and dense models in terms of how well they can generalize to changes in the data distribution. We first study a method for co-training sparse and dense models, at a lower cost compared to regular training. With our method we can obtain very accurate sparse networks, and dense models that can recover the baseline accuracy. Furthermore, we are able to more easily analyze the differences, at prediction level, between the sparse-dense model pairs. Next, we investigate the generalization properties of sparse neural networks in more detail, by studying how well different sparse models trained on a larger task can adapt to smaller, more specialized tasks, in a transfer learning scenario. Our analysis across multiple pruning methods and sparsity levels reveals that sparse models provide features that can transfer similarly to or better than the dense baseline. However, the choice of the pruning method plays an important role, and can influence the results when the features are fixed (linear finetuning), or when they are allowed to adapt to the new task (full finetuning). Using sparse models with fixed masks for finetuning on new tasks has an important practical advantage, as it enables training neural networks on smaller devices. However, one drawback of current pruning methods is that the entire training cycle has to be repeated to obtain the initial sparse model, for every sparsity target; in consequence, the entire training process is costly and also multiple models need to be stored. In the last part of the thesis we propose a method that can train accurate dense models that are compressible in a single step, to multiple sparsity levels, without additional finetuning. 
Our method results in sparse models that can be competitive with existing pruning methods, and which can also successfully generalize to new tasks.","lang":"eng"}],"doi":"10.15479/at:ista:13074","type":"dissertation","acknowledged_ssus":[{"_id":"ScienComp"}],"alternative_title":["ISTA Thesis"],"publication_identifier":{"issn":["2663-337X"]},"file_date_updated":"2023-05-24T16:12:59Z","corr_author":"1","publication_status":"published","date_updated":"2025-12-30T11:05:21Z"},{"publication_identifier":{"eisbn":["9781611977585"]},"acknowledgement":"We thank the anonymous reviewers for their helpful feedback on previous versions of this work. Parts of this work appeared in DISC 2021 as a brief announcement [21]. This work was supported in part by the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (grant agreement No 805223 ScaleML), the Academy of Finland (grant agreement No 333837), the Austrian Science Fund (FWF) and netIDEE (grant agreement No P 33775-N), and the Austrian Science Fund (FWF) project DELTA (grant agreement No I 5025-N).","type":"book_chapter","date_updated":"2025-09-24T09:24:20Z","publication_status":"published","doi":"10.1137/1.9781611977585.ch17","abstract":[{"lang":"eng","text":"The sinkless orientation problem plays a key role in understanding the foundations of distributed computing. The problem can be used to separate two fundamental models of distributed graph algorithms, LOCAL and SLOCAL: the locality of sinkless orientation is Ω(log n) in the deterministic LOCAL model and O(log log n) in the deterministic SLOCAL model. Both of these results are known by prior work, but here we give new simple, self-contained proofs for them."}],"status":"public","title":"Sinkless Orientation Made Simple","OA_place":"repository","publisher":"2023 Society for Industrial and Applied Mathematics","conference":{"end_date":"2023-01-25","start_date":"2023-01-23","location":"Florence, Italy","name":"SOSA: Symposium on Simplicity in Algorithms"},"page":"175-191","_id":"19983","main_file_link":[{"open_access":"1","url":"https://doi.org/10.48550/arXiv.2108.02655"}],"arxiv":1,"external_id":{"arxiv":["2108.02655"]},"citation":{"short":"A. Balliu, J. Korhonen, F. Kuhn, H. Lievonen, D. Olivetti, S. Pai, A. Paz, J. Rybicki, S. Schmid, J. Studený, J. Suomela, J. Uitto, in:, Symposium on Simplicity in Algorithms, 2023 Society for Industrial and Applied Mathematics, 2023, pp. 175–191.","chicago":"Balliu, Alkida, Janne Korhonen, Fabian Kuhn, Henrik Lievonen, Dennis Olivetti, Shreyas Pai, Ami Paz, et al. “Sinkless Orientation Made Simple.” In <i>Symposium on Simplicity in Algorithms</i>, 175–91. 2023 Society for Industrial and Applied Mathematics, 2023. <a href=\"https://doi.org/10.1137/1.9781611977585.ch17\">https://doi.org/10.1137/1.9781611977585.ch17</a>.","apa":"Balliu, A., Korhonen, J., Kuhn, F., Lievonen, H., Olivetti, D., Pai, S., … Uitto, J. (2023). Sinkless Orientation Made Simple. In <i>Symposium on Simplicity in Algorithms</i> (pp. 175–191). Florence, Italy: 2023 Society for Industrial and Applied Mathematics. <a href=\"https://doi.org/10.1137/1.9781611977585.ch17\">https://doi.org/10.1137/1.9781611977585.ch17</a>","mla":"Balliu, Alkida, et al. “Sinkless Orientation Made Simple.” <i>Symposium on Simplicity in Algorithms</i>, 2023 Society for Industrial and Applied Mathematics, 2023, pp.
175–91, doi:<a href=\"https://doi.org/10.1137/1.9781611977585.ch17\">10.1137/1.9781611977585.ch17</a>.","ista":"Balliu A, Korhonen J, Kuhn F, Lievonen H, Olivetti D, Pai S, Paz A, Rybicki J, Schmid S, Studený J, Suomela J, Uitto J. 2023. Sinkless Orientation Made Simple. In: Symposium on Simplicity in Algorithms, 175–191.","ama":"Balliu A, Korhonen J, Kuhn F, et al. Sinkless Orientation Made Simple. In: <i>Symposium on Simplicity in Algorithms</i>. 2023 Society for Industrial and Applied Mathematics; 2023:175-191. doi:<a href=\"https://doi.org/10.1137/1.9781611977585.ch17\">10.1137/1.9781611977585.ch17</a>","ieee":"A. Balliu <i>et al.</i>, “Sinkless Orientation Made Simple,” in <i>Symposium on Simplicity in Algorithms</i>, 2023 Society for Industrial and Applied Mathematics, 2023, pp. 175–191."},"date_created":"2025-07-10T13:11:47Z","date_published":"2023-01-12T00:00:00Z","publication":"Symposium on Simplicity in Algorithms","year":"2023","ec_funded":1,"quality_controlled":"1","OA_type":"green","language":[{"iso":"eng"}],"day":"12","article_processing_charge":"No","month":"01","user_id":"2DF688A6-F248-11E8-B48F-1D18A9856A87","author":[{"full_name":"Balliu, Alkida","first_name":"Alkida","last_name":"Balliu"},{"first_name":"Janne","full_name":"Korhonen, Janne","id":"C5402D42-15BC-11E9-A202-CA2BE6697425","last_name":"Korhonen"},{"first_name":"Fabian","full_name":"Kuhn, Fabian","last_name":"Kuhn"},{"first_name":"Henrik","full_name":"Lievonen, Henrik","last_name":"Lievonen"},{"last_name":"Olivetti","first_name":"Dennis","full_name":"Olivetti, Dennis"},{"last_name":"Pai","full_name":"Pai, Shreyas","first_name":"Shreyas"},{"first_name":"Ami","full_name":"Paz, Ami","last_name":"Paz"},{"id":"334EFD2E-F248-11E8-B48F-1D18A9856A87","last_name":"Rybicki","orcid":"0000-0002-6432-6646","first_name":"Joel","full_name":"Rybicki, Joel"},{"last_name":"Schmid","full_name":"Schmid, Stefan","first_name":"Stefan"},{"full_name":"Studený, Jan","first_name":"Jan","last_name":"Studený"},{"first_name":"Jukka","full_name":"Suomela, Jukka","last_name":"Suomela"},{"full_name":"Uitto, Jara","first_name":"Jara","last_name":"Uitto"}],"department":[{"_id":"DaAl"}],"oa":1,"project":[{"call_identifier":"H2020","name":"Elastic Coordination for Scalable Machine Learning","_id":"268A44D6-B435-11E9-9278-68D0E5697425","grant_number":"805223"},{"_id":"bd9e3a2e-d553-11ed-ba76-8aa684ce17fe","grant_number":"P33775","name":"Fast Algorithms for a Reactive Network Layer"}],"oa_version":"Preprint"},{"doi":"10.1109/cvpr52729.2023.02334","abstract":[{"lang":"eng","text":"Pruning—that is, setting a significant subset of the parameters of a neural network to zero—is one of the most popular methods of model compression. Yet, several recent works have raised the issue that pruning may induce or exacerbate bias in the output of the compressed model. Despite existing evidence for this phenomenon, the relationship between neural network pruning and induced bias is not well-understood. In this work, we systematically investigate and characterize this phenomenon in Convolutional Neural Networks for computer vision. First, we show that it is in fact possible to obtain highly-sparse models, e.g. with less than 10% remaining weights, which do not decrease in accuracy nor substantially increase in bias when compared to dense models. At the same time, we also find that, at higher sparsities, pruned models exhibit higher uncertainty in their outputs, as well as increased correlations, which we directly link to increased bias.
We propose easy-to-use criteria which, based only on the uncompressed model, establish whether bias will increase with pruning, and identify the samples most susceptible to biased predictions post-compression. Our code can be found at https://github.com/IST-DASLab/pruned-vision-model-bias."}],"publication_identifier":{"eissn":["2575-7075"],"eisbn":["9798350301298"]},"acknowledgement":"The authors would like to sincerely thank Sara Hooker for her feedback during the development of this work. EI was supported in part by the FWF DK VGSCO, grant agreement number W1260-N35. AP and DA acknowledge generous ERC support, via Starting Grant 805223 ScaleML.","type":"conference","date_updated":"2025-04-25T10:32:05Z","publication_status":"published","corr_author":"1","title":"Bias in pruned vision models: In-depth analysis and countermeasures","conference":{"location":"Vancouver, BC, Canada","name":"CVPR: Conference on Computer Vision and Pattern Recognition","end_date":"2023-06-24","start_date":"2023-06-17"},"publisher":"IEEE","page":"24364-24373","_id":"14771","main_file_link":[{"url":"https://doi.org/10.48550/arXiv.2304.12622","open_access":"1"}],"status":"public","related_material":{"link":[{"relation":"software","url":"https://github.com/IST-DASLab/pruned-vision-model-bias"}]},"publication":"2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition","year":"2023","ec_funded":1,"quality_controlled":"1","language":[{"iso":"eng"}],"arxiv":1,"external_id":{"isi":["001062531308068"],"arxiv":["2304.12622"]},"citation":{"mla":"Iofinova, Eugenia B., et al. “Bias in Pruned Vision Models: In-Depth Analysis and Countermeasures.” <i>2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition</i>, IEEE, 2023, pp. 24364–73, doi:<a href=\"https://doi.org/10.1109/cvpr52729.2023.02334\">10.1109/cvpr52729.2023.02334</a>.","apa":"Iofinova, E. B., Krumes, A., &#38; Alistarh, D.-A. (2023). Bias in pruned vision models: In-depth analysis and countermeasures. In <i>2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition</i> (pp. 24364–24373). Vancouver, BC, Canada: IEEE. <a href=\"https://doi.org/10.1109/cvpr52729.2023.02334\">https://doi.org/10.1109/cvpr52729.2023.02334</a>","chicago":"Iofinova, Eugenia B, Alexandra Krumes, and Dan-Adrian Alistarh. “Bias in Pruned Vision Models: In-Depth Analysis and Countermeasures.” In <i>2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition</i>, 24364–73. IEEE, 2023. <a href=\"https://doi.org/10.1109/cvpr52729.2023.02334\">https://doi.org/10.1109/cvpr52729.2023.02334</a>.","short":"E.B. Iofinova, A. Krumes, D.-A. Alistarh, in:, 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition, IEEE, 2023, pp. 24364–24373.","ieee":"E. B. Iofinova, A. Krumes, and D.-A. Alistarh, “Bias in pruned vision models: In-depth analysis and countermeasures,” in <i>2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition</i>, Vancouver, BC, Canada, 2023, pp. 24364–24373.","ama":"Iofinova EB, Krumes A, Alistarh D-A. Bias in pruned vision models: In-depth analysis and countermeasures. In: <i>2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition</i>. IEEE; 2023:24364-24373. doi:<a href=\"https://doi.org/10.1109/cvpr52729.2023.02334\">10.1109/cvpr52729.2023.02334</a>","ista":"Iofinova EB, Krumes A, Alistarh D-A. 2023. Bias in pruned vision models: In-depth analysis and countermeasures. 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition. 
CVPR: Conference on Computer Vision and Pattern Recognition, 24364–24373."},"date_created":"2024-01-10T08:42:40Z","date_published":"2023-08-22T00:00:00Z","department":[{"_id":"DaAl"},{"_id":"ChLa"}],"project":[{"name":"Vienna Graduate School on Computational Optimization","grant_number":"W1260-N35","_id":"9B9290DE-BA93-11EA-9121-9846C619BF3A"},{"name":"Elastic Coordination for Scalable Machine Learning","call_identifier":"H2020","grant_number":"805223","_id":"268A44D6-B435-11E9-9278-68D0E5697425"}],"oa":1,"oa_version":"Preprint","day":"22","article_processing_charge":"No","isi":1,"month":"08","user_id":"2DF688A6-F248-11E8-B48F-1D18A9856A87","author":[{"first_name":"Eugenia B","full_name":"Iofinova, Eugenia B","orcid":"0000-0002-7778-3221","last_name":"Iofinova","id":"f9a17499-f6e0-11ea-865d-fdf9a3f77117"},{"full_name":"Peste, Elena-Alexandra","first_name":"Elena-Alexandra","id":"32D78294-F248-11E8-B48F-1D18A9856A87","last_name":"Peste"},{"last_name":"Alistarh","orcid":"0000-0003-3650-940X","id":"4A899BFC-F248-11E8-B48F-1D18A9856A87","first_name":"Dan-Adrian","full_name":"Alistarh, Dan-Adrian"}]},{"month":"05","author":[{"id":"09a8f98d-ec99-11ea-ae11-c063a7b7fe5f","last_name":"Frantar","first_name":"Elias","full_name":"Frantar, Elias"},{"last_name":"Ashkboos","full_name":"Ashkboos, Saleh","first_name":"Saleh"},{"full_name":"Hoefler, Torsten","first_name":"Torsten","last_name":"Hoefler"},{"first_name":"Dan-Adrian","full_name":"Alistarh, Dan-Adrian","id":"4A899BFC-F248-11E8-B48F-1D18A9856A87","orcid":"0000-0003-3650-940X","last_name":"Alistarh"}],"user_id":"2DF688A6-F248-11E8-B48F-1D18A9856A87","day":"01","article_processing_charge":"No","oa_version":"Published Version","file":[{"success":1,"access_level":"open_access","content_type":"application/pdf","date_updated":"2024-08-05T07:52:44Z","file_name":"2023_ICLR_Frantar.pdf","date_created":"2024-08-05T07:52:44Z","file_size":437492,"relation":"main_file","file_id":"17385","creator":"dernst","checksum":"aacbf11dbd8b02a3e0bfd942a33e0593"}],"department":[{"_id":"DaAl"}],"project":[{"call_identifier":"H2020","name":"Elastic Coordination for Scalable Machine Learning","grant_number":"805223","_id":"268A44D6-B435-11E9-9278-68D0E5697425"}],"oa":1,"ddc":["000"],"date_published":"2023-05-01T00:00:00Z","citation":{"mla":"Frantar, Elias, et al. “OPTQ: Accurate Post-Training Quantization for Generative Pre-Trained Transformers.” <i>11th International Conference on Learning Representations </i>, International Conference on Learning Representations, 2023.","apa":"Frantar, E., Ashkboos, S., Hoefler, T., &#38; Alistarh, D.-A. (2023). OPTQ: Accurate post-training quantization for generative pre-trained transformers. In <i>11th International Conference on Learning Representations </i>. Kigali, Rwanda: International Conference on Learning Representations.","chicago":"Frantar, Elias, Saleh Ashkboos, Torsten Hoefler, and Dan-Adrian Alistarh. “OPTQ: Accurate Post-Training Quantization for Generative Pre-Trained Transformers.” In <i>11th International Conference on Learning Representations </i>. International Conference on Learning Representations, 2023.","short":"E. Frantar, S. Ashkboos, T. Hoefler, D.-A. Alistarh, in:, 11th International Conference on Learning Representations , International Conference on Learning Representations, 2023.","ieee":"E. Frantar, S. Ashkboos, T. Hoefler, and D.-A. 
Alistarh, “OPTQ: Accurate post-training quantization for generative pre-trained transformers,” in <i>11th International Conference on Learning Representations </i>, Kigali, Rwanda, 2023.","ama":"Frantar E, Ashkboos S, Hoefler T, Alistarh D-A. OPTQ: Accurate post-training quantization for generative pre-trained transformers. In: <i>11th International Conference on Learning Representations </i>. International Conference on Learning Representations; 2023.","ista":"Frantar E, Ashkboos S, Hoefler T, Alistarh D-A. 2023. OPTQ: Accurate post-training quantization for generative pre-trained transformers. 11th International Conference on Learning Representations . ICLR: International Conference on Learning Representations."},"date_created":"2024-08-04T22:01:22Z","has_accepted_license":"1","ec_funded":1,"language":[{"iso":"eng"}],"quality_controlled":"1","publication":"11th International Conference on Learning Representations ","year":"2023","related_material":{"link":[{"url":"https://github.com/IST-DASLab/gptq","relation":"software"}],"record":[{"status":"public","id":"17485","relation":"dissertation_contains"}]},"status":"public","scopus_import":"1","title":"OPTQ: Accurate post-training quantization for generative pre-trained transformers","_id":"17378","publisher":"International Conference on Learning Representations","conference":{"end_date":"2023-05-05","start_date":"2023-05-01","name":"ICLR: International Conference on Learning Representations","location":"Kigali, Rwanda"},"date_updated":"2025-04-14T13:52:00Z","corr_author":"1","publication_status":"published","file_date_updated":"2024-08-05T07:52:44Z","acknowledged_ssus":[{"_id":"ScienComp"}],"type":"conference","acknowledgement":"Elias Frantar and Dan Alistarh gratefully acknowledge funding from the European Research Council (ERC) under the European Union’s Horizon 2020 programme (grant agreement No. 805223 ScaleML), as well as experimental support from Eldar Kurtic, and from the IST Austria IT department, in particular Stefano Elefante, Andrei Hornoiu, and Alois Schloegl. The work of Saleh Ashkboos and Torsten Hoefler was supported by the PASC DaCeMI project, received EuroHPC-JU funding under grant MAELSTROM, No. 955513. We thank the Swiss National Supercomputing Center (CSCS) for supporting us with compute infrastructure.","abstract":[{"text":"Generative Pre-trained Transformer models, known as GPT or OPT, set themselves apart through breakthrough performance across complex language modelling tasks, but also by their extremely high computational and storage costs. Specifically, due to their massive size, even inference for large, highly-accurate GPT models may require multiple performant GPUs, which limits the usability of such models. While there is emerging work on relieving this pressure via model compression, the applicability and performance of existing compression techniques is limited by the scale and complexity of GPT models. In this paper, we address this challenge, and propose OPTQ, a new one-shot weight quantization method based on approximate second-order information, that is both highly-accurate and highly-efficient. Specifically, OPTQ can quantize GPT models with 175 billion parameters in approximately four GPU hours, reducing the bitwidth down to 3 or 4 bits per weight, with negligible accuracy degradation relative to the uncompressed baseline. 
Our method more than doubles the compression gains relative to previously-proposed one-shot quantization methods, preserving accuracy, allowing us for the first time to execute a 175 billion-parameter model inside a single GPU for generative inference. Moreover, we show that our method can still provide reasonable accuracy in the extreme quantization regime, in which weights are quantized to 2-bit or even ternary quantization levels. We show experimentally that these improvements can be leveraged for end-to-end inference speedups over FP16, of around 3.25x when using high-end GPUs (NVIDIA A100) and 4.5x when using more cost-effective ones (NVIDIA A6000). The implementation is available at https://github.com/IST-DASLab/gptq.","lang":"eng"}]},{"oa_version":"Published Version","oa":1,"project":[{"call_identifier":"H2020","name":"Elastic Coordination for Scalable Machine Learning","_id":"268A44D6-B435-11E9-9278-68D0E5697425","grant_number":"805223"}],"file":[{"content_type":"application/pdf","access_level":"open_access","success":1,"date_created":"2022-08-16T08:05:15Z","date_updated":"2022-08-16T08:05:15Z","file_name":"2022_PODC_Alistarh.pdf","file_size":1593474,"relation":"main_file","file_id":"11854","checksum":"4c6b29172b8e355b4fbc364a2e0827b2","creator":"cchlebak"}],"department":[{"_id":"DaAl"}],"user_id":"317138e5-6ab7-11ef-aa6d-ffef3953e345","tmp":{"legal_code_url":"https://creativecommons.org/licenses/by/4.0/legalcode","short":"CC BY (4.0)","name":"Creative Commons Attribution 4.0 International Public License (CC-BY 4.0)","image":"/images/cc_by.png"},"author":[{"first_name":"Dan-Adrian","full_name":"Alistarh, Dan-Adrian","last_name":"Alistarh","orcid":"0000-0003-3650-940X","id":"4A899BFC-F248-11E8-B48F-1D18A9856A87"},{"full_name":"Rybicki, Joel","first_name":"Joel","id":"334EFD2E-F248-11E8-B48F-1D18A9856A87","last_name":"Rybicki","orcid":"0000-0002-6432-6646"},{"first_name":"Sasha","full_name":"Voitovych, Sasha","last_name":"Voitovych"}],"month":"07","day":"21","article_processing_charge":"Yes (via OA deal)","isi":1,"quality_controlled":"1","language":[{"iso":"eng"}],"ec_funded":1,"year":"2022","publication":"Proceedings of the Annual ACM Symposium on Principles of Distributed Computing","date_created":"2022-08-14T22:01:46Z","has_accepted_license":"1","citation":{"ista":"Alistarh D-A, Rybicki J, Voitovych S. 2022. Near-optimal leader election in population protocols on graphs. Proceedings of the Annual ACM Symposium on Principles of Distributed Computing. PODC: Symposium on Principles of Distributed Computing, 246–256.","ama":"Alistarh D-A, Rybicki J, Voitovych S. Near-optimal leader election in population protocols on graphs. In: <i>Proceedings of the Annual ACM Symposium on Principles of Distributed Computing</i>. Association for Computing Machinery; 2022:246-256. doi:<a href=\"https://doi.org/10.1145/3519270.3538435\">10.1145/3519270.3538435</a>","ieee":"D.-A. Alistarh, J. Rybicki, and S. Voitovych, “Near-optimal leader election in population protocols on graphs,” in <i>Proceedings of the Annual ACM Symposium on Principles of Distributed Computing</i>, Salerno, Italy, 2022, pp. 246–256.","short":"D.-A. Alistarh, J. Rybicki, S. Voitovych, in:, Proceedings of the Annual ACM Symposium on Principles of Distributed Computing, Association for Computing Machinery, 2022, pp. 246–256.","chicago":"Alistarh, Dan-Adrian, Joel Rybicki, and Sasha Voitovych.
“Near-Optimal Leader Election in Population Protocols on Graphs.” In <i>Proceedings of the Annual ACM Symposium on Principles of Distributed Computing</i>, 246–56. Association for Computing Machinery, 2022. <a href=\"https://doi.org/10.1145/3519270.3538435\">https://doi.org/10.1145/3519270.3538435</a>.","apa":"Alistarh, D.-A., Rybicki, J., &#38; Voitovych, S. (2022). Near-optimal leader election in population protocols on graphs. In <i>Proceedings of the Annual ACM Symposium on Principles of Distributed Computing</i> (pp. 246–256). Salerno, Italy: Association for Computing Machinery. <a href=\"https://doi.org/10.1145/3519270.3538435\">https://doi.org/10.1145/3519270.3538435</a>","mla":"Alistarh, Dan-Adrian, et al. “Near-Optimal Leader Election in Population Protocols on Graphs.” <i>Proceedings of the Annual ACM Symposium on Principles of Distributed Computing</i>, Association for Computing Machinery, 2022, pp. 246–56, doi:<a href=\"https://doi.org/10.1145/3519270.3538435\">10.1145/3519270.3538435</a>."},"ddc":["000"],"date_published":"2022-07-21T00:00:00Z","external_id":{"arxiv":["2205.12597"],"isi":["001031439100030"]},"arxiv":1,"conference":{"name":"PODC: Symposium on Principles of Distributed Computing","location":"Salerno, Italy","end_date":"2022-07-29","start_date":"2022-07-25"},"publisher":"Association for Computing Machinery","_id":"11844","page":"246-256","title":"Near-optimal leader election in population protocols on graphs","scopus_import":"1","related_material":{"record":[{"status":"public","id":"19969","relation":"later_version"}]},"status":"public","abstract":[{"text":"In the stochastic population protocol model, we are given a connected graph with n nodes, and in every time step, a scheduler samples an edge of the graph uniformly at random and the nodes connected by this edge interact. A fundamental task in this model is stable leader election, in which all nodes start in an identical state and the aim is to reach a configuration in which (1) exactly one node is elected as leader and (2) this node remains as the unique leader no matter what sequence of interactions follows. On cliques, the complexity of this problem has recently been settled: time-optimal protocols stabilize in Θ(n log n) expected steps using Θ(log log n) states, whereas protocols that use O(1) states require Θ(n2) expected steps.\r\n\r\nIn this work, we investigate the complexity of stable leader election on general graphs. We provide the first non-trivial time lower bounds for leader election on general graphs, showing that, when moving beyond cliques, the complexity landscape of leader election becomes very diverse: the time required to elect a leader can range from O(1) to Θ(n3) expected steps. On the upper bound side, we first observe that there exists a protocol that is time-optimal on many graph families, but uses polynomially-many states. In contrast, we give a near-time-optimal protocol that uses only O(log2n) states that is at most a factor log n slower. Finally, we show that the constant-state protocol of Beauquier et al. [OPODIS 2013] is at most a factor n log n slower than the fast polynomial-state protocol. Moreover, among constant-state protocols, this protocol has near-optimal average case complexity on dense random graphs.","lang":"eng"}],"doi":"10.1145/3519270.3538435","publication_status":"published","corr_author":"1","date_updated":"2025-12-30T09:04:17Z","acknowledgement":"We thank the anonymous reviewers for their helpful comments. 
We gratefully acknowledge funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (grant agreement No 805223 ScaleML).","type":"conference","file_date_updated":"2022-08-16T08:05:15Z","publication_identifier":{"isbn":["9781450392624"]}},{"user_id":"4359f0d1-fa6c-11eb-b949-802e58b17ae8","author":[{"full_name":"Iofinova, Eugenia B","first_name":"Eugenia B","id":"f9a17499-f6e0-11ea-865d-fdf9a3f77117","last_name":"Iofinova","orcid":"0000-0002-7778-3221"},{"last_name":"Peste","id":"32D78294-F248-11E8-B48F-1D18A9856A87","full_name":"Peste, Elena-Alexandra","first_name":"Elena-Alexandra"},{"full_name":"Kurtz, Mark","first_name":"Mark","last_name":"Kurtz"},{"last_name":"Alistarh","orcid":"0000-0003-3650-940X","id":"4A899BFC-F248-11E8-B48F-1D18A9856A87","first_name":"Dan-Adrian","full_name":"Alistarh, Dan-Adrian"}],"month":"09","article_processing_charge":"No","day":"27","isi":1,"oa_version":"Preprint","oa":1,"project":[{"name":"Vienna Graduate School on Computational Optimization","_id":"9B9290DE-BA93-11EA-9121-9846C619BF3A","grant_number":"W1260-N35"},{"_id":"268A44D6-B435-11E9-9278-68D0E5697425","grant_number":"805223","call_identifier":"H2020","name":"Elastic Coordination for Scalable Machine Learning"}],"department":[{"_id":"DaAl"},{"_id":"ChLa"}],"citation":{"short":"E.B. Iofinova, A. Krumes, M. Kurtz, D.-A. Alistarh, in:, 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Institute of Electrical and Electronics Engineers, 2022, pp. 12256–12266.","chicago":"Iofinova, Eugenia B, Alexandra Krumes, Mark Kurtz, and Dan-Adrian Alistarh. “How Well Do Sparse ImageNet Models Transfer?” In <i>2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition</i>, 12256–66. Institute of Electrical and Electronics Engineers, 2022. <a href=\"https://doi.org/10.1109/cvpr52688.2022.01195\">https://doi.org/10.1109/cvpr52688.2022.01195</a>.","apa":"Iofinova, E. B., Krumes, A., Kurtz, M., &#38; Alistarh, D.-A. (2022). How well do sparse ImageNet models transfer? In <i>2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition</i> (pp. 12256–12266). New Orleans, LA, United States: Institute of Electrical and Electronics Engineers. <a href=\"https://doi.org/10.1109/cvpr52688.2022.01195\">https://doi.org/10.1109/cvpr52688.2022.01195</a>","mla":"Iofinova, Eugenia B., et al. “How Well Do Sparse ImageNet Models Transfer?” <i>2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition</i>, Institute of Electrical and Electronics Engineers, 2022, pp. 12256–66, doi:<a href=\"https://doi.org/10.1109/cvpr52688.2022.01195\">10.1109/cvpr52688.2022.01195</a>.","ista":"Iofinova EB, Krumes A, Kurtz M, Alistarh D-A. 2022. How well do sparse ImageNet models transfer? 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. CVPR: Computer Vision and Pattern Recognition, 12256–12266.","ama":"Iofinova EB, Krumes A, Kurtz M, Alistarh D-A. How well do sparse ImageNet models transfer? In: <i>2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition</i>. Institute of Electrical and Electronics Engineers; 2022:12256-12266. doi:<a href=\"https://doi.org/10.1109/cvpr52688.2022.01195\">10.1109/cvpr52688.2022.01195</a>","ieee":"E. B. Iofinova, A. Krumes, M. Kurtz, and D.-A. Alistarh, “How well do sparse ImageNet models transfer?,” in <i>2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition</i>, New Orleans, LA, United States, 2022, pp. 
12256–12266."},"date_created":"2023-01-16T10:06:00Z","date_published":"2022-09-27T00:00:00Z","external_id":{"arxiv":["2111.13445"],"isi":["000870759105034"]},"arxiv":1,"quality_controlled":"1","language":[{"iso":"eng"}],"ec_funded":1,"year":"2022","publication":"2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition","related_material":{"record":[{"id":"13074","relation":"dissertation_contains","status":"public"}]},"status":"public","main_file_link":[{"url":"https://doi.org/10.48550/arXiv.2111.13445","open_access":"1"}],"conference":{"location":"New Orleans, LA, United States","name":"CVPR: Computer Vision and Pattern Recognition","end_date":"2022-06-24","start_date":"2022-06-18"},"publisher":"Institute of Electrical and Electronics Engineers","page":"12256-12266","_id":"12299","scopus_import":"1","title":"How well do sparse ImageNet models transfer?","publication_status":"published","corr_author":"1","date_updated":"2025-04-25T10:32:05Z","acknowledgement":"The authors would like to sincerely thank Christoph Lampert and Nir Shavit for fruitful discussions during the development of this work, and Eldar Kurtic for experimental support. EI was supported in part by the FWF DK VGSCO, grant agreement number W1260-N35, while AP and DA acknowledge generous support by the ERC, via Starting Grant 805223 ScaleML.","type":"conference","publication_identifier":{"eissn":["2575-7075"]},"abstract":[{"lang":"eng","text":"Transfer learning is a classic paradigm by which models pretrained on large “upstream” datasets are adapted to yield good results on “downstream” specialized datasets. Generally, more accurate models on the “upstream” dataset tend to provide better transfer accuracy “downstream”. In this work, we perform an in-depth investigation of this phenomenon in the context of convolutional neural networks (CNNs) trained on the ImageNet dataset, which have been pruned, that is, compressed by sparsifying their connections. We consider transfer using unstructured pruned models obtained by applying several state-of-the-art pruning methods, including magnitude-based, second-order, regrowth, lottery-ticket, and regularization approaches, in the context of twelve standard transfer tasks. In a nutshell, our study shows that sparse models can match or even outperform the transfer performance of dense models, even at high sparsities, and, while doing so, can lead to significant inference and even training speedups. At the same time, we observe and analyze significant differences in the behaviour of different pruning methods.
The code is available at: https://github.com/IST-DASLab/sparse-imagenet-transfer."}],"doi":"10.1109/cvpr52688.2022.01195"},{"user_id":"4359f0d1-fa6c-11eb-b949-802e58b17ae8","author":[{"last_name":"Postnikova","full_name":"Postnikova, Anastasiia","first_name":"Anastasiia"},{"full_name":"Koval, Nikita","first_name":"Nikita","last_name":"Koval","id":"2F4DB10C-F248-11E8-B48F-1D18A9856A87"},{"orcid":"0000-0001-5634-0731","last_name":"Nadiradze","id":"3279A00C-F248-11E8-B48F-1D18A9856A87","full_name":"Nadiradze, Giorgi","first_name":"Giorgi"},{"id":"4A899BFC-F248-11E8-B48F-1D18A9856A87","orcid":"0000-0003-3650-940X","last_name":"Alistarh","first_name":"Dan-Adrian","full_name":"Alistarh, Dan-Adrian"}],"month":"04","article_processing_charge":"No","day":"02","isi":1,"oa_version":"Preprint","oa":1,"project":[{"_id":"268A44D6-B435-11E9-9278-68D0E5697425","grant_number":"805223","call_identifier":"H2020","name":"Elastic Coordination for Scalable Machine Learning"}],"department":[{"_id":"DaAl"}],"citation":{"mla":"Postnikova, Anastasiia, et al. “Multi-Queues Can Be State-of-the-Art Priority Schedulers.” <i>Proceedings of the 27th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming</i>, Association for Computing Machinery, 2022, pp. 353–67, doi:<a href=\"https://doi.org/10.1145/3503221.3508432\">10.1145/3503221.3508432</a>.","apa":"Postnikova, A., Koval, N., Nadiradze, G., &#38; Alistarh, D.-A. (2022). Multi-queues can be state-of-the-art priority schedulers. In <i>Proceedings of the 27th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming</i> (pp. 353–367). Seoul, Republic of Korea: Association for Computing Machinery. <a href=\"https://doi.org/10.1145/3503221.3508432\">https://doi.org/10.1145/3503221.3508432</a>","chicago":"Postnikova, Anastasiia, Nikita Koval, Giorgi Nadiradze, and Dan-Adrian Alistarh. “Multi-Queues Can Be State-of-the-Art Priority Schedulers.” In <i>Proceedings of the 27th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming</i>, 353–67. Association for Computing Machinery, 2022. <a href=\"https://doi.org/10.1145/3503221.3508432\">https://doi.org/10.1145/3503221.3508432</a>.","short":"A. Postnikova, N. Koval, G. Nadiradze, D.-A. Alistarh, in:, Proceedings of the 27th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, Association for Computing Machinery, 2022, pp. 353–367.","ieee":"A. Postnikova, N. Koval, G. Nadiradze, and D.-A. Alistarh, “Multi-queues can be state-of-the-art priority schedulers,” in <i>Proceedings of the 27th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming</i>, Seoul, Republic of Korea, 2022, pp. 353–367.","ama":"Postnikova A, Koval N, Nadiradze G, Alistarh D-A. Multi-queues can be state-of-the-art priority schedulers. In: <i>Proceedings of the 27th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming</i>. Association for Computing Machinery; 2022:353-367. doi:<a href=\"https://doi.org/10.1145/3503221.3508432\">10.1145/3503221.3508432</a>","ista":"Postnikova A, Koval N, Nadiradze G, Alistarh D-A. 2022. Multi-queues can be state-of-the-art priority schedulers. Proceedings of the 27th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming. 
PPoPP: Symposium on Principles and Practice of Parallel Programming, 353–367."},"date_created":"2022-04-17T22:01:46Z","date_published":"2022-04-02T00:00:00Z","external_id":{"isi":["000883318200025"],"arxiv":["2109.00657"]},"arxiv":1,"quality_controlled":"1","language":[{"iso":"eng"}],"ec_funded":1,"year":"2022","publication":"Proceedings of the 27th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming","related_material":{"record":[{"relation":"research_data","id":"13076","status":"public"}]},"status":"public","main_file_link":[{"url":"https://doi.org/10.48550/arXiv.2109.00657","open_access":"1"}],"conference":{"start_date":"2022-04-02","end_date":"2022-04-06","location":"Seoul, Republic of Korea","name":"PPoPP: Symposium on Principles and Practice of Parallel Programming"},"publisher":"Association for Computing Machinery","page":"353-367","_id":"11180","title":"Multi-queues can be state-of-the-art priority schedulers","scopus_import":"1","publication_status":"published","corr_author":"1","date_updated":"2025-04-14T07:49:13Z","acknowledgement":"We would like to thank the anonymous reviewers for their useful comments. This project has received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (grant agreement No 805223 ScaleML).","type":"conference","publication_identifier":{"isbn":["9781450392044"]},"abstract":[{"text":"Designing and implementing efficient parallel priority schedulers is an active research area. An intriguing proposed design is the Multi-Queue: given n threads and m ≥ n distinct priority queues, task insertions are performed uniformly at random, while, to delete, a thread picks two queues uniformly at random, and removes the observed task of higher priority. This approach scales well, and has probabilistic rank guarantees: roughly, the rank of each task removed, relative to remaining tasks in all other queues, is O(m) in expectation. Yet, the performance of this pattern is below that of well-engineered schedulers, which eschew theoretical guarantees for practical efficiency.\r\n\r\nWe investigate whether it is possible to design and implement a Multi-Queue-based task scheduler that is both highly-efficient and has analytical guarantees. We propose a new variant called the Stealing Multi-Queue (SMQ), a cache-efficient variant of the Multi-Queue, which leverages both queue affinity – each thread has a local queue, from which tasks are usually removed; but, with some probability, threads also attempt to steal higher-priority tasks from the other queues – and task batching, that is, the processing of several tasks in a single insert / remove step. These ideas are well-known for task scheduling without priorities; our theoretical contribution is showing that, despite relaxations, this design can still provide rank guarantees, which in turn implies bounds on total work performed. We provide a general SMQ implementation which can surpass state-of-the-art schedulers such as OBIM and PMOD in terms of performance on popular graph-processing benchmarks. Notably, the performance improvement comes mainly from the superior rank guarantees provided by our scheduler, confirming that analytically-reasoned approaches can still provide performance improvements for priority task scheduling.","lang":"eng"}],"doi":"10.1145/3503221.3508432"},{"abstract":[{"text":"Subgraph detection has recently been one of the most studied problems in the CONGEST model of distributed computing.
In this work, we study the distributed complexity of problems closely related to subgraph detection, mainly focusing on induced subgraph detection. The main line of this work presents lower bounds and parameterized algorithms w.r.t structural parameters of the input graph:\r\n- On general graphs, we give unconditional lower bounds for induced detection of cycles and patterns of treewidth 2 in CONGEST. Moreover, by adapting reductions from centralized parameterized complexity, we prove lower bounds in CONGEST for detecting patterns with a 4-clique, and for induced path detection conditional on the hardness of triangle detection in the congested clique.\r\n- On graphs of bounded degeneracy, we show that induced paths can be detected fast in CONGEST using techniques from parameterized algorithms, while detecting cycles and patterns of treewidth 2 is hard.\r\n- On graphs of bounded vertex cover number, we show that induced subgraph detection is easy in CONGEST for any pattern graph. More specifically, we adapt a centralized parameterized algorithm for a more general maximum common induced subgraph detection problem to the distributed setting. In addition to these induced subgraph detection results, we study various related problems in the CONGEST and congested clique models, including for multicolored versions of subgraph-detection-like problems.","lang":"eng"}],"doi":"10.4230/LIPIcs.OPODIS.2021.15","corr_author":"1","publication_status":"published","date_updated":"2025-04-14T07:49:13Z","volume":217,"type":"conference","alternative_title":["LIPIcs"],"acknowledgement":"Amir Nikabadi: Supported by the LABEX MILYON (ANR-10-LABX-0070) of Université de Lyon, within the program “Investissements d’Avenir” (ANR-11-IDEX-0007) operated by the French National Research Agency (ANR). Janne H. Korhonen: Supported by the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (grant agreement No 805223 ScaleML).\r\nWe thank François Le Gall and Masayuki Miyamoto for sharing their work on lower bounds for induced subgraph detection [36].","publication_identifier":{"issn":["1868-8969"],"isbn":["9783959772198"]},"file_date_updated":"2022-05-02T07:53:00Z","_id":"11183","conference":{"name":"OPODIS","location":"Strasbourg, France","end_date":"2021-12-15","start_date":"2021-12-13"},"publisher":"Schloss Dagstuhl - Leibniz-Zentrum für Informatik","title":"Beyond distributed subgraph detection: Induced subgraphs, multicolored problems and graph parameters","scopus_import":"1","status":"public","language":[{"iso":"eng"}],"quality_controlled":"1","ec_funded":1,"year":"2022","publication":"25th International Conference on Principles of Distributed Systems","intvolume":"       217","ddc":["510"],"date_published":"2022-02-01T00:00:00Z","date_created":"2022-04-17T22:01:47Z","citation":{"chicago":"Nikabadi, Amir, and Janne Korhonen. “Beyond Distributed Subgraph Detection: Induced Subgraphs, Multicolored Problems and Graph Parameters.” In <i>25th International Conference on Principles of Distributed Systems</i>, edited by Quentin Bramas, Vincent Gramoli, and Alessia Milani, Vol. 217. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2022. <a href=\"https://doi.org/10.4230/LIPIcs.OPODIS.2021.15\">https://doi.org/10.4230/LIPIcs.OPODIS.2021.15</a>.","short":"A. Nikabadi, J. Korhonen, in:, Q. Bramas, V. Gramoli, A. 
Milani (Eds.), 25th International Conference on Principles of Distributed Systems, Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2022.","mla":"Nikabadi, Amir, and Janne Korhonen. “Beyond Distributed Subgraph Detection: Induced Subgraphs, Multicolored Problems and Graph Parameters.” <i>25th International Conference on Principles of Distributed Systems</i>, edited by Quentin Bramas et al., vol. 217, 15, Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2022, doi:<a href=\"https://doi.org/10.4230/LIPIcs.OPODIS.2021.15\">10.4230/LIPIcs.OPODIS.2021.15</a>.","apa":"Nikabadi, A., &#38; Korhonen, J. (2022). Beyond distributed subgraph detection: Induced subgraphs, multicolored problems and graph parameters. In Q. Bramas, V. Gramoli, &#38; A. Milani (Eds.), <i>25th International Conference on Principles of Distributed Systems</i> (Vol. 217). Strasbourg, France: Schloss Dagstuhl - Leibniz-Zentrum für Informatik. <a href=\"https://doi.org/10.4230/LIPIcs.OPODIS.2021.15\">https://doi.org/10.4230/LIPIcs.OPODIS.2021.15</a>","ama":"Nikabadi A, Korhonen J. Beyond distributed subgraph detection: Induced subgraphs, multicolored problems and graph parameters. In: Bramas Q, Gramoli V, Milani A, eds. <i>25th International Conference on Principles of Distributed Systems</i>. Vol 217. Schloss Dagstuhl - Leibniz-Zentrum für Informatik; 2022. doi:<a href=\"https://doi.org/10.4230/LIPIcs.OPODIS.2021.15\">10.4230/LIPIcs.OPODIS.2021.15</a>","ista":"Nikabadi A, Korhonen J. 2022. Beyond distributed subgraph detection: Induced subgraphs, multicolored problems and graph parameters. 25th International Conference on Principles of Distributed Systems. OPODIS, LIPIcs, vol. 217, 15.","ieee":"A. Nikabadi and J. Korhonen, “Beyond distributed subgraph detection: Induced subgraphs, multicolored problems and graph parameters,” in <i>25th International Conference on Principles of Distributed Systems</i>, Strasbourg, France, 2022, vol. 
217."},"has_accepted_license":"1","editor":[{"last_name":"Bramas","full_name":"Bramas, Quentin","first_name":"Quentin"},{"last_name":"Gramoli","first_name":"Vincent","full_name":"Gramoli, Vincent"},{"last_name":"Milani","full_name":"Milani, Alessia","first_name":"Alessia"}],"oa_version":"Published Version","project":[{"grant_number":"805223","_id":"268A44D6-B435-11E9-9278-68D0E5697425","name":"Elastic Coordination for Scalable Machine Learning","call_identifier":"H2020"}],"oa":1,"article_number":"15","department":[{"_id":"DaAl"}],"file":[{"file_id":"11345","checksum":"626551c14de5d4091573200ed0535752","creator":"dernst","relation":"main_file","file_size":790396,"content_type":"application/pdf","success":1,"access_level":"open_access","date_created":"2022-05-02T07:53:00Z","file_name":"2022_LIPICs_Nikabadi.pdf","date_updated":"2022-05-02T07:53:00Z"}],"tmp":{"legal_code_url":"https://creativecommons.org/licenses/by/4.0/legalcode","short":"CC BY (4.0)","name":"Creative Commons Attribution 4.0 International Public License (CC-BY 4.0)","image":"/images/cc_by.png"},"author":[{"last_name":"Nikabadi","full_name":"Nikabadi, Amir","first_name":"Amir"},{"first_name":"Janne","full_name":"Korhonen, Janne","id":"C5402D42-15BC-11E9-A202-CA2BE6697425","last_name":"Korhonen"}],"user_id":"2DF688A6-F248-11E8-B48F-1D18A9856A87","month":"02","day":"01","article_processing_charge":"No"},{"publication_identifier":{"issn":["1868-8969"],"isbn":["9783959772198"]},"file_date_updated":"2022-05-02T08:06:33Z","type":"conference","alternative_title":["LIPIcs"],"acknowledgement":"Dan Alistarh: This project has received funding from the European Research Council (ERC)\r\nunder the European Union’s Horizon 2020 research and innovation programme (grant agreement No.805223 ScaleML).\r\nJoel Rybicki: This project has received from the European Union’s Horizon 2020 research and\r\ninnovation programme under the Marie Skłodowska-Curie grant agreement No. 840605.\r\nAcknowledgements We grateful to Giorgi Nadiradze for pointing out a generalisation of the phase clock construction to non-regular graphs. We also thank anonymous reviewers for their useful comments on earlier versions of this manuscript.","date_updated":"2025-04-14T07:49:13Z","volume":217,"corr_author":"1","publication_status":"published","doi":"10.4230/LIPIcs.OPODIS.2021.14","abstract":[{"lang":"eng","text":"Let G be a graph on n nodes. In the stochastic population protocol model, a collection of n indistinguishable, resource-limited nodes collectively solve tasks via pairwise interactions. In each interaction, two randomly chosen neighbors first read each other’s states, and then update their local states. A rich line of research has established tight upper and lower bounds on the complexity of fundamental tasks, such as majority and leader election, in this model, when G is a clique. Specifically, in the clique, these tasks can be solved fast, i.e., in n polylog n pairwise interactions, with high probability, using at most polylog n states per node.\r\nIn this work, we consider the more general setting where G is an arbitrary regular graph, and present a technique for simulating protocols designed for fully-connected networks in any connected regular graph. Our main result is a simulation that is efficient on many interesting graph families: roughly, the simulation overhead is polylogarithmic in the number of nodes, and quadratic in the conductance of the graph. 
As a sample application, we show that, in any regular graph with conductance φ, both leader election and exact majority can be solved in φ^{-2} ⋅ n polylog n pairwise interactions, with high probability, using at most φ^{-2} ⋅ polylog n states per node. This shows that there are fast and space-efficient population protocols for leader election and exact majority on graphs with good expansion properties. We believe our results will prove generally useful, as they allow efficient technology transfer between the well-mixed (clique) case, and the under-explored spatial setting."}],"status":"public","scopus_import":"1","title":"Fast graphical population protocols","_id":"11184","conference":{"end_date":"2021-12-15","start_date":"2021-12-13","name":"OPODIS","location":"Strasbourg, France"},"publisher":"Schloss Dagstuhl - Leibniz-Zentrum für Informatik","editor":[{"first_name":"Quentin","full_name":"Bramas, Quentin","last_name":"Bramas"},{"last_name":"Gramoli","full_name":"Gramoli, Vincent","first_name":"Vincent"},{"first_name":"Alessia","full_name":"Milani, Alessia","last_name":"Milani"}],"arxiv":1,"external_id":{"arxiv":["2102.08808"]},"ddc":["510"],"date_published":"2022-02-01T00:00:00Z","date_created":"2022-04-17T22:01:47Z","has_accepted_license":"1","citation":{"chicago":"Alistarh, Dan-Adrian, Rati Gelashvili, and Joel Rybicki. “Fast Graphical Population Protocols.” In <i>25th International Conference on Principles of Distributed Systems</i>, edited by Quentin Bramas, Vincent Gramoli, and Alessia Milani, Vol. 217. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2022. <a href=\"https://doi.org/10.4230/LIPIcs.OPODIS.2021.14\">https://doi.org/10.4230/LIPIcs.OPODIS.2021.14</a>.","short":"D.-A. Alistarh, R. Gelashvili, J. Rybicki, in:, Q. Bramas, V. Gramoli, A. Milani (Eds.), 25th International Conference on Principles of Distributed Systems, Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2022.","mla":"Alistarh, Dan-Adrian, et al. “Fast Graphical Population Protocols.” <i>25th International Conference on Principles of Distributed Systems</i>, edited by Quentin Bramas et al., vol. 217, 14, Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2022, doi:<a href=\"https://doi.org/10.4230/LIPIcs.OPODIS.2021.14\">10.4230/LIPIcs.OPODIS.2021.14</a>.","apa":"Alistarh, D.-A., Gelashvili, R., &#38; Rybicki, J. (2022). Fast graphical population protocols. In Q. Bramas, V. Gramoli, &#38; A. Milani (Eds.), <i>25th International Conference on Principles of Distributed Systems</i> (Vol. 217). Strasbourg, France: Schloss Dagstuhl - Leibniz-Zentrum für Informatik. <a href=\"https://doi.org/10.4230/LIPIcs.OPODIS.2021.14\">https://doi.org/10.4230/LIPIcs.OPODIS.2021.14</a>","ama":"Alistarh D-A, Gelashvili R, Rybicki J. Fast graphical population protocols. In: Bramas Q, Gramoli V, Milani A, eds. <i>25th International Conference on Principles of Distributed Systems</i>. Vol 217. Schloss Dagstuhl - Leibniz-Zentrum für Informatik; 2022. doi:<a href=\"https://doi.org/10.4230/LIPIcs.OPODIS.2021.14\">10.4230/LIPIcs.OPODIS.2021.14</a>","ista":"Alistarh D-A, Gelashvili R, Rybicki J. 2022. Fast graphical population protocols. 25th International Conference on Principles of Distributed Systems. OPODIS, LIPIcs, vol. 217, 14.","ieee":"D.-A. Alistarh, R. Gelashvili, and J. Rybicki, “Fast graphical population protocols,” in <i>25th International Conference on Principles of Distributed Systems</i>, Strasbourg, France, 2022, vol. 
217."},"intvolume":"       217","publication":"25th International Conference on Principles of Distributed Systems","year":"2022","ec_funded":1,"language":[{"iso":"eng"}],"quality_controlled":"1","article_processing_charge":"No","day":"01","month":"02","tmp":{"legal_code_url":"https://creativecommons.org/licenses/by/4.0/legalcode","short":"CC BY (4.0)","name":"Creative Commons Attribution 4.0 International Public License (CC-BY 4.0)","image":"/images/cc_by.png"},"author":[{"first_name":"Dan-Adrian","full_name":"Alistarh, Dan-Adrian","id":"4A899BFC-F248-11E8-B48F-1D18A9856A87","last_name":"Alistarh","orcid":"0000-0003-3650-940X"},{"last_name":"Gelashvili","full_name":"Gelashvili, Rati","first_name":"Rati"},{"first_name":"Joel","full_name":"Rybicki, Joel","orcid":"0000-0002-6432-6646","last_name":"Rybicki","id":"334EFD2E-F248-11E8-B48F-1D18A9856A87"}],"user_id":"2DF688A6-F248-11E8-B48F-1D18A9856A87","file":[{"relation":"main_file","checksum":"2c7c982174c6f98c4ca6e92539d15086","creator":"dernst","file_id":"11346","date_created":"2022-05-02T08:06:33Z","file_name":"2022_LIPICs_Alistarh.pdf","date_updated":"2022-05-02T08:06:33Z","content_type":"application/pdf","access_level":"open_access","success":1,"file_size":959406}],"department":[{"_id":"DaAl"}],"oa":1,"project":[{"name":"Elastic Coordination for Scalable Machine Learning","call_identifier":"H2020","_id":"268A44D6-B435-11E9-9278-68D0E5697425","grant_number":"805223"},{"call_identifier":"H2020","name":"Coordination in constrained and natural distributed systems","grant_number":"840605","_id":"26A5D39A-B435-11E9-9278-68D0E5697425"}],"article_number":"14","oa_version":"Published Version"},{"user_id":"2DF688A6-F248-11E8-B48F-1D18A9856A87","tmp":{"legal_code_url":"https://creativecommons.org/licenses/by/4.0/legalcode","short":"CC BY (4.0)","name":"Creative Commons Attribution 4.0 International Public License (CC-BY 4.0)","image":"/images/cc_by.png"},"author":[{"last_name":"Frantar","id":"09a8f98d-ec99-11ea-ae11-c063a7b7fe5f","full_name":"Frantar, Elias","first_name":"Elias"},{"full_name":"Alistarh, Dan-Adrian","first_name":"Dan-Adrian","last_name":"Alistarh","orcid":"0000-0003-3650-940X","id":"4A899BFC-F248-11E8-B48F-1D18A9856A87"}],"month":"07","day":"20","article_processing_charge":"Yes","isi":1,"oa_version":"Published Version","oa":1,"project":[{"_id":"268A44D6-B435-11E9-9278-68D0E5697425","grant_number":"805223","name":"Elastic Coordination for Scalable Machine Learning","call_identifier":"H2020"}],"department":[{"_id":"DaAl"}],"file":[{"relation":"main_file","file_id":"17440","checksum":"5179a1e4dfc0fbfab6674907299e414a","creator":"dernst","content_type":"application/pdf","access_level":"open_access","success":1,"date_created":"2024-08-19T06:54:41Z","file_name":"2022_PMLR_Frantar.pdf","date_updated":"2024-08-19T06:54:41Z","file_size":615916}],"intvolume":"       162","date_created":"2024-05-28T13:45:20Z","citation":{"ama":"Frantar E, Alistarh D-A. SPDY: Accurate pruning with speedup guarantees. In: <i>39th International Conference on Machine Learning</i>. Vol 162. ML Research Press; 2022:6726-6743.","ista":"Frantar E, Alistarh D-A. 2022. SPDY: Accurate pruning with speedup guarantees. 39th International Conference on Machine Learning. ICML: International Conference on Machine Learning, PMLR, vol. 162, 6726–6743.","ieee":"E. Frantar and D.-A. Alistarh, “SPDY: Accurate pruning with speedup guarantees,” in <i>39th International Conference on Machine Learning</i>, Baltimore, MD, United States, 2022, vol. 162, pp. 
6726–6743.","chicago":"Frantar, Elias, and Dan-Adrian Alistarh. “SPDY: Accurate Pruning with Speedup Guarantees.” In <i>39th International Conference on Machine Learning</i>, 162:6726–43. ML Research Press, 2022.","short":"E. Frantar, D.-A. Alistarh, in:, 39th International Conference on Machine Learning, ML Research Press, 2022, pp. 6726–6743.","mla":"Frantar, Elias, and Dan-Adrian Alistarh. “SPDY: Accurate Pruning with Speedup Guarantees.” <i>39th International Conference on Machine Learning</i>, vol. 162, ML Research Press, 2022, pp. 6726–43.","apa":"Frantar, E., &#38; Alistarh, D.-A. (2022). SPDY: Accurate pruning with speedup guarantees. In <i>39th International Conference on Machine Learning</i> (Vol. 162, pp. 6726–6743). Baltimore, MD, United States: ML Research Press."},"has_accepted_license":"1","date_published":"2022-07-20T00:00:00Z","ddc":["000"],"external_id":{"isi":["000922378801029"]},"quality_controlled":"1","language":[{"iso":"eng"}],"ec_funded":1,"year":"2022","publication":"39th International Conference on Machine Learning","status":"public","conference":{"name":"ICML: International Conference on Machine Learning","location":"Baltimore, MD, United States","start_date":"2022-07-17","end_date":"2022-07-23"},"publisher":"ML Research Press","_id":"17059","page":"6726-6743","title":"SPDY: Accurate pruning with speedup guarantees","scopus_import":"1","publication_status":"published","corr_author":"1","volume":162,"date_updated":"2025-04-14T07:49:14Z","acknowledgement":"We gratefully acknowledge funding from the European Research Council (ERC) under the European Union’s Horizon 2020 programme (grant agreement No 805223 ScaleML),\r\nas well as computational support from AWS EC2. We thank Eldar Kurtic for code and hyper-parameters for BERT pruning, and the Neural Magic Team, notably Michael Goin and\r\nMark Kurtz, for support with their software.","alternative_title":["PMLR"],"type":"conference","file_date_updated":"2024-08-19T06:54:41Z","abstract":[{"lang":"eng","text":"The recent focus on the efficiency of deep neural networks (DNNs) has led to significant work on model compression approaches, of which weight pruning is one of the most popular. At the same time, there is rapidly-growing computational support for efficiently executing the unstructured-sparse models obtained via pruning. Yet, most existing pruning methods minimize just the number of remaining weights, i.e. the size of the model, rather than optimizing for inference time. We address this gap by introducing SPDY, a new compression method which automatically determines layer-wise sparsity targets achieving a desired inference speedup on a given system, while minimizing accuracy loss. SPDY is the composition of two new techniques. The first is an efficient and general dynamic programming algorithm for solving constrained layer-wise compression problems, given a set of layer-wise error scores. The second technique is a local search procedure for automatically determining such scores in an accurate and robust manner. Experiments across popular vision and language models show that SPDY guarantees speedups while recovering higher accuracy relative to existing strategies, both for one-shot and gradual pruning scenarios, and is compatible with most existing pruning approaches. We also extend our approach to the recently-proposed task of pruning with very little data, where we achieve the best known accuracy recovery when pruning to the GPU-supported 2:4 sparsity pattern."}]}]
