[{"date_published":"2024-01-08T00:00:00Z","page":"542-553","citation":{"ama":"Kurtic E, Hoefler T, Alistarh D-A. How to prune your language model: Recovering accuracy on the “Sparsity May Cry” benchmark. In: Proceedings of Machine Learning Research. Vol 234. ML Research Press; 2024:542-553.","ista":"Kurtic E, Hoefler T, Alistarh D-A. 2024. How to prune your language model: Recovering accuracy on the ‘Sparsity May Cry’ benchmark. Proceedings of Machine Learning Research. CPAL: Conference on Parsimony and Learning, PMLR, vol. 234, 542–553.","ieee":"E. Kurtic, T. Hoefler, and D.-A. Alistarh, “How to prune your language model: Recovering accuracy on the ‘Sparsity May Cry’ benchmark,” in Proceedings of Machine Learning Research, Hong Kong, China, 2024, vol. 234, pp. 542–553.","apa":"Kurtic, E., Hoefler, T., & Alistarh, D.-A. (2024). How to prune your language model: Recovering accuracy on the “Sparsity May Cry” benchmark. In Proceedings of Machine Learning Research (Vol. 234, pp. 542–553). Hong Kong, China: ML Research Press.","mla":"Kurtic, Eldar, et al. “How to Prune Your Language Model: Recovering Accuracy on the ‘Sparsity May Cry’ Benchmark.” Proceedings of Machine Learning Research, vol. 234, ML Research Press, 2024, pp. 542–53.","short":"E. Kurtic, T. Hoefler, D.-A. Alistarh, in:, Proceedings of Machine Learning Research, ML Research Press, 2024, pp. 542–553.","chicago":"Kurtic, Eldar, Torsten Hoefler, and Dan-Adrian Alistarh. “How to Prune Your Language Model: Recovering Accuracy on the ‘Sparsity May Cry’ Benchmark.” In Proceedings of Machine Learning Research, 234:542–53. 
ML Research Press, 2024."},"publication":"Proceedings of Machine Learning Research","article_processing_charge":"No","day":"08","scopus_import":"1","oa_version":"Preprint","intvolume":" 234","title":"How to prune your language model: Recovering accuracy on the \"Sparsity May Cry\" benchmark","status":"public","user_id":"2DF688A6-F248-11E8-B48F-1D18A9856A87","_id":"15011","abstract":[{"text":"Pruning large language models (LLMs) from the BERT family has emerged as a standard compression benchmark, and several pruning methods have been proposed for this task. The recent “Sparsity May Cry” (SMC) benchmark put into question the validity of all existing methods, exhibiting a more complex setup where many known pruning methods appear to fail. We revisit the question of accurate BERT-pruning during fine-tuning on downstream datasets, and propose a set of general guidelines for successful pruning, even on the challenging SMC benchmark. First, we perform a cost-vs-benefits analysis of pruning model components, such as the embeddings and the classification head; second, we provide a simple-yet-general way of scaling training, sparsification and learning rate schedules relative to the desired target sparsity; finally, we investigate the importance of proper parametrization for Knowledge Distillation in the context of LLMs. 
Our simple insights lead to state-of-the-art results, both on classic BERT-pruning benchmarks, as well as on the SMC benchmark, showing that even classic gradual magnitude pruning (GMP) can yield competitive results, with the right approach.","lang":"eng"}],"alternative_title":["PMLR"],"type":"conference","language":[{"iso":"eng"}],"conference":{"name":"CPAL: Conference on Parsimony and Learning","start_date":"2024-01-03","location":"Hong Kong, China","end_date":"2024-01-06"},"quality_controlled":"1","oa":1,"main_file_link":[{"open_access":"1","url":"https://proceedings.mlr.press/v234/kurtic24a"}],"external_id":{"arxiv":["2312.13547"]},"publication_identifier":{"eissn":["2640-3498"]},"month":"01","volume":234,"date_updated":"2024-02-26T10:30:52Z","date_created":"2024-02-18T23:01:03Z","author":[{"id":"47beb3a5-07b5-11eb-9b87-b108ec578218","last_name":"Kurtic","first_name":"Eldar","full_name":"Kurtic, Eldar"},{"first_name":"Torsten","last_name":"Hoefler","full_name":"Hoefler, Torsten"},{"first_name":"Dan-Adrian","last_name":"Alistarh","id":"4A899BFC-F248-11E8-B48F-1D18A9856A87","orcid":"0000-0003-3650-940X","full_name":"Alistarh, Dan-Adrian"}],"department":[{"_id":"DaAl"}],"publisher":"ML Research Press","publication_status":"published","year":"2024"},{"language":[{"iso":"eng"}],"date_published":"2023-02-25T00:00:00Z","doi":"10.1145/3572848.3577481","conference":{"end_date":"2023-03-01","location":"Montreal, QC, Canada","start_date":"2023-02-25","name":"PPoPP: Symposium on Principles and Practice of Parallel Programming"},"page":"107-118","quality_controlled":"1","citation":{"chicago":"Koval, Nikita, Dan-Adrian Alistarh, and Roman Elizarov. “Fast and Scalable Channels in Kotlin Coroutines.” In Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 107–18. Association for Computing Machinery, 2023. https://doi.org/10.1145/3572848.3577481.","mla":"Koval, Nikita, et al. 
“Fast and Scalable Channels in Kotlin Coroutines.” Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, Association for Computing Machinery, 2023, pp. 107–18, doi:10.1145/3572848.3577481.","short":"N. Koval, D.-A. Alistarh, R. Elizarov, in:, Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, Association for Computing Machinery, 2023, pp. 107–118.","ista":"Koval N, Alistarh D-A, Elizarov R. 2023. Fast and scalable channels in Kotlin Coroutines. Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming. PPoPP: Symposium on Principles and Practice of Parallel Programming, 107–118.","ieee":"N. Koval, D.-A. Alistarh, and R. Elizarov, “Fast and scalable channels in Kotlin Coroutines,” in Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, Montreal, QC, Canada, 2023, pp. 107–118.","apa":"Koval, N., Alistarh, D.-A., & Elizarov, R. (2023). Fast and scalable channels in Kotlin Coroutines. In Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (pp. 107–118). Montreal, QC, Canada: Association for Computing Machinery. https://doi.org/10.1145/3572848.3577481","ama":"Koval N, Alistarh D-A, Elizarov R. Fast and scalable channels in Kotlin Coroutines. In: Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming. Association for Computing Machinery; 2023:107-118. 
doi:10.1145/3572848.3577481"},"external_id":{"arxiv":["2211.04986"]},"oa":1,"main_file_link":[{"open_access":"1","url":"https://doi.org/10.48550/arXiv.2211.04986"}],"publication":"Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming","publication_identifier":{"isbn":["9798400700156"]},"article_processing_charge":"No","month":"02","day":"25","scopus_import":"1","oa_version":"Preprint","date_updated":"2023-03-20T07:29:28Z","date_created":"2023-03-19T23:00:58Z","author":[{"full_name":"Koval, Nikita","last_name":"Koval","first_name":"Nikita","id":"2F4DB10C-F248-11E8-B48F-1D18A9856A87"},{"last_name":"Alistarh","first_name":"Dan-Adrian","orcid":"0000-0003-3650-940X","id":"4A899BFC-F248-11E8-B48F-1D18A9856A87","full_name":"Alistarh, Dan-Adrian"},{"full_name":"Elizarov, Roman","first_name":"Roman","last_name":"Elizarov"}],"publisher":"Association for Computing Machinery","department":[{"_id":"DaAl"}],"publication_status":"published","status":"public","title":"Fast and scalable channels in Kotlin Coroutines","user_id":"2DF688A6-F248-11E8-B48F-1D18A9856A87","_id":"12735","year":"2023","abstract":[{"lang":"eng","text":"Asynchronous programming has gained significant popularity over the last decade: support for this programming pattern is available in many popular languages via libraries and native language implementations, typically in the form of coroutines or the async/await construct. Instead of programming via shared memory, this concept assumes implicit synchronization through message passing. The key data structure enabling such communication is the rendezvous channel. Roughly, a rendezvous channel is a blocking queue of size zero, so both send(e) and receive() operations wait for each other, performing a rendezvous when they meet. To optimize the message passing pattern, channels are usually equipped with a fixed-size buffer, so sends do not suspend and put elements into the buffer until its capacity is exceeded. 
This primitive is known as a buffered channel.\r\n\r\nThis paper presents a fast and scalable algorithm for both rendezvous and buffered channels. Similarly to modern queues, our solution is based on an infinite array with two positional counters for send(e) and receive() operations, leveraging the unconditional Fetch-And-Add instruction to update them. Yet, the algorithm requires non-trivial modifications of this classic pattern, in order to support the full channel semantics, such as buffering and cancellation of waiting requests. We compare the performance of our solution to that of the Kotlin implementation, as well as against other academic proposals, showing up to 9.8× speedup. To showcase its expressiveness and performance, we also integrated the proposed algorithm into the standard Kotlin Coroutines library, replacing the previous channel implementations."}],"type":"conference"},{"type":"conference","abstract":[{"lang":"eng","text":"Deep neural networks (DNNs) often have to be compressed, via pruning and/or quantization, before they can be deployed in practical settings. In this work we propose a new compression-aware minimizer dubbed CrAM that modifies the optimization step in a principled way, in order to produce models whose local loss behavior is stable under compression operations such as pruning. Thus, dense models trained via CrAM should be compressible post-training, in a single step, without significant accuracy loss. Experimental results on standard benchmarks, such as residual networks for ImageNet classification and BERT models for language modelling, show that CrAM produces dense models that can be more accurate than the standard SGD/Adam-based baselines, but which are stable under weight pruning: specifically, we can prune models in one-shot to 70-80% sparsity with almost no accuracy loss, and to 90% with reasonable (∼1%) accuracy loss, which is competitive with gradual compression methods. 
Additionally, CrAM can produce sparse models which perform well for transfer learning, and it also works for semi-structured 2:4 pruning patterns supported by GPU hardware. The code for reproducing the results is available at this https URL ."}],"ec_funded":1,"_id":"13053","year":"2023","user_id":"2DF688A6-F248-11E8-B48F-1D18A9856A87","acknowledgement":"AP, EK, DA received funding from the European Research Council (ERC) under the European\r\nUnion’s Horizon 2020 research and innovation programme (grant agreement No 805223 ScaleML). AV acknowledges the support of the French Agence Nationale de la Recherche (ANR), under grant ANR-21-CE48-0016 (project COMCOPT). We further acknowledge the support from the Scientific Service Units (SSU) of ISTA through resources provided by Scientific Computing (SciComp)-","title":"CrAM: A Compression-Aware Minimizer","status":"public","publication_status":"accepted","department":[{"_id":"GradSch"},{"_id":"DaAl"},{"_id":"ChLa"}],"author":[{"last_name":"Peste","first_name":"Elena-Alexandra","id":"32D78294-F248-11E8-B48F-1D18A9856A87","full_name":"Peste, Elena-Alexandra"},{"full_name":"Vladu, Adrian","last_name":"Vladu","first_name":"Adrian"},{"full_name":"Kurtic, Eldar","id":"47beb3a5-07b5-11eb-9b87-b108ec578218","last_name":"Kurtic","first_name":"Eldar"},{"full_name":"Lampert, Christoph","last_name":"Lampert","first_name":"Christoph","orcid":"0000-0001-8622-7887","id":"40C20FD2-F248-11E8-B48F-1D18A9856A87"},{"full_name":"Alistarh, Dan-Adrian","first_name":"Dan-Adrian","last_name":"Alistarh","id":"4A899BFC-F248-11E8-B48F-1D18A9856A87","orcid":"0000-0003-3650-940X"}],"related_material":{"record":[{"id":"13074","relation":"dissertation_contains","status":"public"}]},"date_created":"2023-05-23T11:36:18Z","date_updated":"2023-06-01T12:54:45Z","oa_version":"Preprint","month":"05","article_processing_charge":"No","publication":"11th International Conference on Learning Representations 
","oa":1,"main_file_link":[{"url":"https://openreview.net/pdf?id=_eTZBs-yedr","open_access":"1"}],"external_id":{"arxiv":["2207.14200"]},"citation":{"mla":"Peste, Elena-Alexandra, et al. “CrAM: A Compression-Aware Minimizer.” 11th International Conference on Learning Representations .","short":"E.-A. Peste, A. Vladu, E. Kurtic, C. Lampert, D.-A. Alistarh, in:, 11th International Conference on Learning Representations , n.d.","chicago":"Peste, Elena-Alexandra, Adrian Vladu, Eldar Kurtic, Christoph Lampert, and Dan-Adrian Alistarh. “CrAM: A Compression-Aware Minimizer.” In 11th International Conference on Learning Representations , n.d.","ama":"Peste E-A, Vladu A, Kurtic E, Lampert C, Alistarh D-A. CrAM: A Compression-Aware Minimizer. In: 11th International Conference on Learning Representations .","ista":"Peste E-A, Vladu A, Kurtic E, Lampert C, Alistarh D-A. CrAM: A Compression-Aware Minimizer. 11th International Conference on Learning Representations . ICLR: International Conference on Learning Representations.","apa":"Peste, E.-A., Vladu, A., Kurtic, E., Lampert, C., & Alistarh, D.-A. (n.d.). CrAM: A Compression-Aware Minimizer. In 11th International Conference on Learning Representations . Kigali, Rwanda .","ieee":"E.-A. Peste, A. Vladu, E. Kurtic, C. Lampert, and D.-A. 
Alistarh, “CrAM: A Compression-Aware Minimizer,” in 11th International Conference on Learning Representations , Kigali, Rwanda ."},"quality_controlled":"1","project":[{"_id":"268A44D6-B435-11E9-9278-68D0E5697425","grant_number":"805223","call_identifier":"H2020","name":"Elastic Coordination for Scalable Machine Learning"}],"conference":{"location":"Kigali, Rwanda ","start_date":"2023-05-01","end_date":"2023-05-05","name":"ICLR: International Conference on Learning Representations"},"date_published":"2023-05-01T00:00:00Z","acknowledged_ssus":[{"_id":"ScienComp"}],"language":[{"iso":"eng"}]},{"year":"2023","department":[{"_id":"DaAl"}],"publisher":"Association for Computing Machinery ","publication_status":"published","author":[{"full_name":"Koval, Nikita","last_name":"Koval","first_name":"Nikita","id":"2F4DB10C-F248-11E8-B48F-1D18A9856A87"},{"first_name":"Dmitry","last_name":"Khalanskiy","full_name":"Khalanskiy, Dmitry"},{"first_name":"Dan-Adrian","last_name":"Alistarh","id":"4A899BFC-F248-11E8-B48F-1D18A9856A87","orcid":"0000-0003-3650-940X","full_name":"Alistarh, Dan-Adrian"}],"volume":7,"date_created":"2023-07-02T22:00:43Z","date_updated":"2023-07-17T08:43:19Z","article_number":"116","file_date_updated":"2023-07-03T13:09:39Z","tmp":{"name":"Creative Commons Attribution 4.0 International Public License (CC-BY 4.0)","legal_code_url":"https://creativecommons.org/licenses/by/4.0/legalcode","short":"CC BY (4.0)","image":"/images/cc_by.png"},"oa":1,"quality_controlled":"1","doi":"10.1145/3591230","language":[{"iso":"eng"}],"publication_identifier":{"eissn":["2475-1421"]},"month":"06","_id":"13179","user_id":"2DF688A6-F248-11E8-B48F-1D18A9856A87","intvolume":" 7","ddc":["000"],"title":"CQS: A formally-verified framework for fair and abortable 
synchronization","status":"public","file":[{"relation":"main_file","file_id":"13187","checksum":"5dba6e73f0ed79adbdae14d165bc2f68","success":1,"date_created":"2023-07-03T13:09:39Z","date_updated":"2023-07-03T13:09:39Z","access_level":"open_access","file_name":"2023_ACMProgram.Lang._Koval.pdf","file_size":1266773,"content_type":"application/pdf","creator":"alisjak"}],"oa_version":"Published Version","type":"journal_article","abstract":[{"text":"Writing concurrent code that is both correct and efficient is notoriously difficult. Thus, programmers often prefer to use synchronization abstractions, which render code simpler and easier to reason about. Despite a wealth of work on this topic, there is still a gap between the rich semantics provided by synchronization abstractions in modern programming languages—specifically, fair FIFO ordering of synchronization requests and support for abortable operations—and frameworks for implementing it correctly and efficiently. Supporting such semantics is critical given the rising popularity of constructs for asynchronous programming, such as coroutines, which abort frequently and are cheaper to suspend and resume compared to native threads.\r\n\r\nThis paper introduces a new framework called CancellableQueueSynchronizer (CQS), which enables simple yet efficient implementations of a wide range of fair and abortable synchronization primitives: mutexes, semaphores, barriers, count-down latches, and blocking pools. Our main contribution is algorithmic, as implementing both fairness and abortability efficiently at this level of generality is non-trivial. Importantly, all our algorithms, including the CQS framework and the primitives built on top of it, come with formal proofs in the Iris framework for Coq for many of their properties. These proofs are modular, so it is easy to show correctness for new primitives implemented on top of CQS. 
From a practical perspective, implementation of CQS for native threads on the JVM improves throughput by up to two orders of magnitude over Java’s AbstractQueuedSynchronizer, the only practical abstraction offering similar semantics. Further, we successfully integrated CQS as a core component of the popular Kotlin Coroutines library, validating the framework’s practical impact and expressiveness in a real-world environment. In sum, CancellableQueueSynchronizer is the first framework to combine expressiveness with formal guarantees and solid practical performance. Our approach should be extensible to other languages and families of synchronization primitives.","lang":"eng"}],"citation":{"ama":"Koval N, Khalanskiy D, Alistarh D-A. CQS: A formally-verified framework for fair and abortable synchronization. Proceedings of the ACM on Programming Languages. 2023;7. doi:10.1145/3591230","ista":"Koval N, Khalanskiy D, Alistarh D-A. 2023. CQS: A formally-verified framework for fair and abortable synchronization. Proceedings of the ACM on Programming Languages. 7, 116.","apa":"Koval, N., Khalanskiy, D., & Alistarh, D.-A. (2023). CQS: A formally-verified framework for fair and abortable synchronization. Proceedings of the ACM on Programming Languages. Association for Computing Machinery . https://doi.org/10.1145/3591230","ieee":"N. Koval, D. Khalanskiy, and D.-A. Alistarh, “CQS: A formally-verified framework for fair and abortable synchronization,” Proceedings of the ACM on Programming Languages, vol. 7. Association for Computing Machinery , 2023.","mla":"Koval, Nikita, et al. “CQS: A Formally-Verified Framework for Fair and Abortable Synchronization.” Proceedings of the ACM on Programming Languages, vol. 7, 116, Association for Computing Machinery , 2023, doi:10.1145/3591230.","short":"N. Koval, D. Khalanskiy, D.-A. Alistarh, Proceedings of the ACM on Programming Languages 7 (2023).","chicago":"Koval, Nikita, Dmitry Khalanskiy, and Dan-Adrian Alistarh. 
“CQS: A Formally-Verified Framework for Fair and Abortable Synchronization.” Proceedings of the ACM on Programming Languages. Association for Computing Machinery , 2023. https://doi.org/10.1145/3591230."},"publication":"Proceedings of the ACM on Programming Languages","article_type":"original","date_published":"2023-06-06T00:00:00Z","scopus_import":"1","has_accepted_license":"1","article_processing_charge":"No","day":"06"},{"doi":"10.1145/3558481.3591082","conference":{"end_date":"2023-06-19","start_date":"2023-06-17","location":"Orlando, FL, United States","name":"SPAA: Symposium on Parallelism in Algorithms and Architectures"},"language":[{"iso":"eng"}],"tmp":{"name":"Creative Commons Attribution 4.0 International Public License (CC-BY 4.0)","legal_code_url":"https://creativecommons.org/licenses/by/4.0/legalcode","short":"CC BY (4.0)","image":"/images/cc_by.png"},"external_id":{"arxiv":["2304.09331"]},"oa":1,"quality_controlled":"1","publication_identifier":{"isbn":["9781450395458"]},"month":"06","author":[{"id":"2e711909-896a-11ed-bdf8-eb0f5a2984c6","first_name":"Alexander","last_name":"Fedorov","full_name":"Fedorov, Alexander"},{"full_name":"Hashemi, Diba","first_name":"Diba","last_name":"Hashemi","id":"ed9595ea-2f8f-11ee-ba95-d2b546540783"},{"last_name":"Nadiradze","first_name":"Giorgi","id":"3279A00C-F248-11E8-B48F-1D18A9856A87","full_name":"Nadiradze, Giorgi"},{"first_name":"Dan-Adrian","last_name":"Alistarh","id":"4A899BFC-F248-11E8-B48F-1D18A9856A87","orcid":"0000-0003-3650-940X","full_name":"Alistarh, Dan-Adrian"}],"date_created":"2023-07-23T22:01:12Z","date_updated":"2023-07-31T10:54:32Z","year":"2023","publisher":"Association for Computing Machinery","department":[{"_id":"DaAl"},{"_id":"GradSch"}],"publication_status":"published","file_date_updated":"2023-07-31T10:53:08Z","date_published":"2023-06-17T00:00:00Z","citation":{"ama":"Fedorov A, Hashemi D, Nadiradze G, Alistarh D-A. Provably-efficient and internally-deterministic parallel Union-Find. 
In: Proceedings of the 35th ACM Symposium on Parallelism in Algorithms and Architectures. Association for Computing Machinery; 2023:261-271. doi:10.1145/3558481.3591082","ista":"Fedorov A, Hashemi D, Nadiradze G, Alistarh D-A. 2023. Provably-efficient and internally-deterministic parallel Union-Find. Proceedings of the 35th ACM Symposium on Parallelism in Algorithms and Architectures. SPAA: Symposium on Parallelism in Algorithms and Architectures, 261–271.","ieee":"A. Fedorov, D. Hashemi, G. Nadiradze, and D.-A. Alistarh, “Provably-efficient and internally-deterministic parallel Union-Find,” in Proceedings of the 35th ACM Symposium on Parallelism in Algorithms and Architectures, Orlando, FL, United States, 2023, pp. 261–271.","apa":"Fedorov, A., Hashemi, D., Nadiradze, G., & Alistarh, D.-A. (2023). Provably-efficient and internally-deterministic parallel Union-Find. In Proceedings of the 35th ACM Symposium on Parallelism in Algorithms and Architectures (pp. 261–271). Orlando, FL, United States: Association for Computing Machinery. https://doi.org/10.1145/3558481.3591082","mla":"Fedorov, Alexander, et al. “Provably-Efficient and Internally-Deterministic Parallel Union-Find.” Proceedings of the 35th ACM Symposium on Parallelism in Algorithms and Architectures, Association for Computing Machinery, 2023, pp. 261–71, doi:10.1145/3558481.3591082.","short":"A. Fedorov, D. Hashemi, G. Nadiradze, D.-A. Alistarh, in:, Proceedings of the 35th ACM Symposium on Parallelism in Algorithms and Architectures, Association for Computing Machinery, 2023, pp. 261–271.","chicago":"Fedorov, Alexander, Diba Hashemi, Giorgi Nadiradze, and Dan-Adrian Alistarh. “Provably-Efficient and Internally-Deterministic Parallel Union-Find.” In Proceedings of the 35th ACM Symposium on Parallelism in Algorithms and Architectures, 261–71. Association for Computing Machinery, 2023. 
https://doi.org/10.1145/3558481.3591082."},"publication":"Proceedings of the 35th ACM Symposium on Parallelism in Algorithms and Architectures","page":"261-271","has_accepted_license":"1","article_processing_charge":"Yes (in subscription journal)","day":"17","scopus_import":"1","file":[{"relation":"main_file","file_id":"13334","date_created":"2023-07-31T10:53:08Z","date_updated":"2023-07-31T10:53:08Z","checksum":"72e312aabf0c5248c99b5cd3a88e4c88","success":1,"file_name":"2023_SPAA_Fedorov.pdf","access_level":"open_access","content_type":"application/pdf","file_size":2087937,"creator":"dernst"}],"oa_version":"Published Version","user_id":"2DF688A6-F248-11E8-B48F-1D18A9856A87","_id":"13262","status":"public","title":"Provably-efficient and internally-deterministic parallel Union-Find","ddc":["000"],"abstract":[{"text":"Determining the degree of inherent parallelism in classical sequential algorithms and leveraging it for fast parallel execution is a key topic in parallel computing, and detailed analyses are known for a wide range of classical algorithms. In this paper, we perform the first such analysis for the fundamental Union-Find problem, in which we are given a graph as a sequence of edges, and must maintain its connectivity structure under edge additions. We prove that classic sequential algorithms for this problem are well-parallelizable under reasonable assumptions, addressing a conjecture by [Blelloch, 2017]. More precisely, we show via a new potential argument that, under uniform random edge ordering, parallel union-find operations are unlikely to interfere: T concurrent threads processing the graph in parallel will encounter memory contention O(T² · log |V| · log |E|) times in expectation, where |E| and |V| are the number of edges and nodes in the graph, respectively. 
We leverage this result to design a new parallel Union-Find algorithm that is both internally deterministic, i.e., its results are guaranteed to match those of a sequential execution, but also work-efficient and scalable, as long as the number of threads T is O(|E|^(1/3 − ε)), for an arbitrarily small constant ε > 0, which holds for most large real-world graphs. We present lower bounds which show that our analysis is close to optimal, and experimental results suggesting that the performance cost of internal determinism is limited.","lang":"eng"}],"type":"conference"},{"_id":"12566","user_id":"4359f0d1-fa6c-11eb-b949-802e58b17ae8","title":"Wait-free approximate agreement on graphs","status":"public","ddc":["000"],"intvolume":" 948","file":[{"file_name":"2023_TheoreticalCompScience_Alistarh.pdf","access_level":"open_access","file_size":602333,"content_type":"application/pdf","creator":"dernst","relation":"main_file","file_id":"12570","date_created":"2023-02-20T07:30:20Z","date_updated":"2023-02-20T07:30:20Z","checksum":"b27c5290f2f1500c403494364ee39c9f","success":1}],"oa_version":"Published Version","type":"journal_article","abstract":[{"text":"Approximate agreement is one of the few variants of consensus that can be solved in a wait-free manner in asynchronous systems where processes communicate by reading and writing to shared memory. In this work, we consider a natural generalisation of approximate agreement on arbitrary undirected connected graphs. Each process is given a node of the graph as input and, if non-faulty, must output a node such that\r\n– all the outputs are within distance 1 of one another, and\r\n– each output value lies on a shortest path between two input values.\r\nFrom prior work, it is known that there is no wait-free algorithm among processes for this problem on any cycle of length , by reduction from 2-set agreement (Castañeda et al., 2018).\r\n\r\nIn this work, we investigate the solvability of this task on general graphs. 
We give a new, direct proof of the impossibility of approximate agreement on cycles of length , via a generalisation of Sperner's Lemma to convex polygons. We also extend the reduction from 2-set agreement to a larger class of graphs, showing that approximate agreement on these graphs is unsolvable. On the positive side, we present a wait-free algorithm for a different class of graphs, which properly contains the class of chordal graphs.","lang":"eng"}],"issue":"2","publication":"Theoretical Computer Science","citation":{"chicago":"Alistarh, Dan-Adrian, Faith Ellen, and Joel Rybicki. “Wait-Free Approximate Agreement on Graphs.” Theoretical Computer Science. Elsevier, 2023. https://doi.org/10.1016/j.tcs.2023.113733.","short":"D.-A. Alistarh, F. Ellen, J. Rybicki, Theoretical Computer Science 948 (2023).","mla":"Alistarh, Dan-Adrian, et al. “Wait-Free Approximate Agreement on Graphs.” Theoretical Computer Science, vol. 948, no. 2, 113733, Elsevier, 2023, doi:10.1016/j.tcs.2023.113733.","ieee":"D.-A. Alistarh, F. Ellen, and J. Rybicki, “Wait-free approximate agreement on graphs,” Theoretical Computer Science, vol. 948, no. 2. Elsevier, 2023.","apa":"Alistarh, D.-A., Ellen, F., & Rybicki, J. (2023). Wait-free approximate agreement on graphs. Theoretical Computer Science. Elsevier. https://doi.org/10.1016/j.tcs.2023.113733","ista":"Alistarh D-A, Ellen F, Rybicki J. 2023. Wait-free approximate agreement on graphs. Theoretical Computer Science. 948(2), 113733.","ama":"Alistarh D-A, Ellen F, Rybicki J. Wait-free approximate agreement on graphs. Theoretical Computer Science. 2023;948(2). 
doi:10.1016/j.tcs.2023.113733"},"article_type":"original","date_published":"2023-02-28T00:00:00Z","scopus_import":"1","day":"28","article_processing_charge":"Yes (via OA deal)","has_accepted_license":"1","year":"2023","acknowledgement":"This project has received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (grant agreement No. 805223 ScaleML) and under the Marie Skłodowska-Curie grant agreement No. 840605 and from the Natural Sciences and Engineering Research Council of Canada grant RGPIN-2020-04178. Part of this work was done while Faith Ellen was visiting IST Austria.","publication_status":"published","publisher":"Elsevier","department":[{"_id":"DaAl"}],"author":[{"full_name":"Alistarh, Dan-Adrian","first_name":"Dan-Adrian","last_name":"Alistarh","id":"4A899BFC-F248-11E8-B48F-1D18A9856A87","orcid":"0000-0003-3650-940X"},{"first_name":"Faith","last_name":"Ellen","full_name":"Ellen, Faith"},{"full_name":"Rybicki, Joel","last_name":"Rybicki","first_name":"Joel","orcid":"0000-0002-6432-6646","id":"334EFD2E-F248-11E8-B48F-1D18A9856A87"}],"date_updated":"2023-08-01T13:17:20Z","date_created":"2023-02-19T23:00:55Z","volume":948,"article_number":"113733","file_date_updated":"2023-02-20T07:30:20Z","ec_funded":1,"oa":1,"tmp":{"name":"Creative Commons Attribution 4.0 International Public License (CC-BY 4.0)","legal_code_url":"https://creativecommons.org/licenses/by/4.0/legalcode","short":"CC BY (4.0)","image":"/images/cc_by.png"},"external_id":{"isi":["000934262700001"]},"isi":1,"quality_controlled":"1","project":[{"name":"Elastic Coordination for Scalable Machine Learning","call_identifier":"H2020","_id":"268A44D6-B435-11E9-9278-68D0E5697425","grant_number":"805223"},{"grant_number":"840605","_id":"26A5D39A-B435-11E9-9278-68D0E5697425","name":"Coordination in constrained and natural distributed 
systems","call_identifier":"H2020"}],"doi":"10.1016/j.tcs.2023.113733","language":[{"iso":"eng"}],"month":"02","publication_identifier":{"issn":["0304-3975"]}},{"author":[{"full_name":"Aksenov, Vitalii","id":"2980135A-F248-11E8-B48F-1D18A9856A87","last_name":"Aksenov","first_name":"Vitalii"},{"id":"4A899BFC-F248-11E8-B48F-1D18A9856A87","orcid":"0000-0003-3650-940X","first_name":"Dan-Adrian","last_name":"Alistarh","full_name":"Alistarh, Dan-Adrian"},{"first_name":"Alexandra","last_name":"Drozdova","full_name":"Drozdova, Alexandra"},{"first_name":"Amirkeivan","last_name":"Mohtashami","full_name":"Mohtashami, Amirkeivan"}],"volume":36,"date_created":"2023-01-22T23:00:55Z","date_updated":"2023-08-14T12:54:32Z","year":"2023","department":[{"_id":"DaAl"}],"publisher":"Springer Nature","publication_status":"published","publication_identifier":{"issn":["0178-2770"],"eissn":["1432-0452"]},"month":"09","doi":"10.1007/s00446-022-00441-x","language":[{"iso":"eng"}],"oa":1,"external_id":{"arxiv":["2008.01009"],"isi":["000913424000001"]},"main_file_link":[{"open_access":"1","url":"https://doi.org/10.48550/arXiv.2008.01009"}],"quality_controlled":"1","isi":1,"abstract":[{"lang":"eng","text":"The design and implementation of efficient concurrent data structures has seen significant attention. However, most of this work has focused on concurrent data structures providing good worst-case guarantees, although, in real workloads, objects are often accessed at different rates. Efficient distribution-adaptive data structures, such as splay-trees, are known in the sequential case; however, they often are hard to translate efficiently to the concurrent case. We investigate distribution-adaptive concurrent data structures, and propose a new design called the splay-list. 
At a high level, the splay-list is similar to a standard skip-list, with the key distinction that the height of each element adapts dynamically to its access rate: popular elements “move up,” whereas rarely-accessed elements decrease in height. We show that the splay-list provides order-optimal amortized complexity bounds for a subset of operations, while being amenable to efficient concurrent implementation. Experiments show that the splay-list can leverage distribution-adaptivity for performance, and can outperform the only previously-known distribution-adaptive concurrent design in certain workloads."}],"type":"journal_article","oa_version":"Preprint","_id":"12330","user_id":"2DF688A6-F248-11E8-B48F-1D18A9856A87","intvolume":" 36","title":"The splay-list: A distribution-adaptive concurrent skip-list","status":"public","article_processing_charge":"No","day":"01","scopus_import":"1","date_published":"2023-09-01T00:00:00Z","citation":{"chicago":"Aksenov, Vitalii, Dan-Adrian Alistarh, Alexandra Drozdova, and Amirkeivan Mohtashami. “The Splay-List: A Distribution-Adaptive Concurrent Skip-List.” Distributed Computing. Springer Nature, 2023. https://doi.org/10.1007/s00446-022-00441-x.","short":"V. Aksenov, D.-A. Alistarh, A. Drozdova, A. Mohtashami, Distributed Computing 36 (2023) 395–418.","mla":"Aksenov, Vitalii, et al. “The Splay-List: A Distribution-Adaptive Concurrent Skip-List.” Distributed Computing, vol. 36, Springer Nature, 2023, pp. 395–418, doi:10.1007/s00446-022-00441-x.","ieee":"V. Aksenov, D.-A. Alistarh, A. Drozdova, and A. Mohtashami, “The splay-list: A distribution-adaptive concurrent skip-list,” Distributed Computing, vol. 36. Springer Nature, pp. 395–418, 2023.","apa":"Aksenov, V., Alistarh, D.-A., Drozdova, A., & Mohtashami, A. (2023). The splay-list: A distribution-adaptive concurrent skip-list. Distributed Computing. Springer Nature. https://doi.org/10.1007/s00446-022-00441-x","ista":"Aksenov V, Alistarh D-A, Drozdova A, Mohtashami A. 2023. 
The splay-list: A distribution-adaptive concurrent skip-list. Distributed Computing. 36, 395–418.","ama":"Aksenov V, Alistarh D-A, Drozdova A, Mohtashami A. The splay-list: A distribution-adaptive concurrent skip-list. Distributed Computing. 2023;36:395-418. doi:10.1007/s00446-022-00441-x"},"publication":"Distributed Computing","page":"395-418","article_type":"original"},{"article_processing_charge":"No","day":"30","scopus_import":"1","date_published":"2023-07-30T00:00:00Z","citation":{"ista":"Markov I, Vladu A, Guo Q, Alistarh D-A. 2023. Quantized distributed training of large models with convergence guarantees. Proceedings of the 40th International Conference on Machine Learning. ICML: International Conference on Machine Learning, PMLR, vol. 202, 24020–24044.","apa":"Markov, I., Vladu, A., Guo, Q., & Alistarh, D.-A. (2023). Quantized distributed training of large models with convergence guarantees. In Proceedings of the 40th International Conference on Machine Learning (Vol. 202, pp. 24020–24044). Honolulu, Hawaii, HI, United States: ML Research Press.","ieee":"I. Markov, A. Vladu, Q. Guo, and D.-A. Alistarh, “Quantized distributed training of large models with convergence guarantees,” in Proceedings of the 40th International Conference on Machine Learning, Honolulu, Hawaii, HI, United States, 2023, vol. 202, pp. 24020–24044.","ama":"Markov I, Vladu A, Guo Q, Alistarh D-A. Quantized distributed training of large models with convergence guarantees. In: Proceedings of the 40th International Conference on Machine Learning. Vol 202. ML Research Press; 2023:24020-24044.","chicago":"Markov, Ilia, Adrian Vladu, Qi Guo, and Dan-Adrian Alistarh. “Quantized Distributed Training of Large Models with Convergence Guarantees.” In Proceedings of the 40th International Conference on Machine Learning, 202:24020–44. ML Research Press, 2023.","mla":"Markov, Ilia, et al. 
“Quantized Distributed Training of Large Models with Convergence Guarantees.” Proceedings of the 40th International Conference on Machine Learning, vol. 202, ML Research Press, 2023, pp. 24020–44.","short":"I. Markov, A. Vladu, Q. Guo, D.-A. Alistarh, in:, Proceedings of the 40th International Conference on Machine Learning, ML Research Press, 2023, pp. 24020–24044."},"publication":"Proceedings of the 40th International Conference on Machine Learning","page":"24020-24044","abstract":[{"lang":"eng","text":"Communication-reduction techniques are a popular way to improve scalability in data-parallel training of deep neural networks (DNNs). The recent emergence of large language models such as GPT has created the need for new approaches to exploit data-parallelism. Among these, fully-sharded data parallel (FSDP) training is highly popular, yet it still encounters scalability bottlenecks. One reason is that applying compression techniques to FSDP is challenging: as the vast majority of the communication involves the model’s weights, direct compression alters convergence and leads to accuracy loss. We present QSDP, a variant of FSDP which supports both gradient and weight quantization with theoretical guarantees, is simple to implement and has essentially no overheads. To derive QSDP we prove that a natural modification of SGD achieves convergence even when we only maintain quantized weights, and thus the domain over which we train consists of quantized points and is, therefore, highly non-convex. We validate this approach by training GPT-family models with up to 1.3 billion parameters on a multi-node cluster. 
Experiments show that QSDP preserves model accuracy, while completely removing the communication bottlenecks of FSDP, providing end-to-end speedups of up to 2.2x."}],"type":"conference","alternative_title":["PMLR"],"oa_version":"Preprint","_id":"14461","user_id":"2DF688A6-F248-11E8-B48F-1D18A9856A87","intvolume":" 202","status":"public","title":"Quantized distributed training of large models with convergence guarantees","publication_identifier":{"eissn":["2640-3498"]},"month":"07","conference":{"start_date":"2023-07-23","location":"Honolulu, Hawaii, HI, United States","end_date":"2023-07-29","name":"ICML: International Conference on Machine Learning"},"language":[{"iso":"eng"}],"acknowledged_ssus":[{"_id":"ScienComp"}],"oa":1,"external_id":{"arxiv":["2302.02390"]},"main_file_link":[{"open_access":"1","url":"https://doi.org/10.48550/arXiv.2302.02390"}],"project":[{"call_identifier":"H2020","name":"Elastic Coordination for Scalable Machine Learning","_id":"268A44D6-B435-11E9-9278-68D0E5697425","grant_number":"805223"}],"quality_controlled":"1","ec_funded":1,"author":[{"first_name":"Ilia","last_name":"Markov","id":"D0CF4148-C985-11E9-8066-0BDEE5697425","full_name":"Markov, Ilia"},{"full_name":"Vladu, Adrian","last_name":"Vladu","first_name":"Adrian"},{"full_name":"Guo, Qi","first_name":"Qi","last_name":"Guo"},{"full_name":"Alistarh, Dan-Adrian","last_name":"Alistarh","first_name":"Dan-Adrian","orcid":"0000-0003-3650-940X","id":"4A899BFC-F248-11E8-B48F-1D18A9856A87"}],"volume":202,"date_updated":"2023-10-31T09:40:45Z","date_created":"2023-10-29T23:01:17Z","year":"2023","acknowledgement":"The authors gratefully acknowledge funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (grant agreement No 805223 ScaleML), as well as experimental support from the IST Austria IT department, in particular Stefano Elefante, Andrei Hornoiu, and Alois Schloegl. 
AV acknowledges the support of the French Agence Nationale de la Recherche (ANR), under grant ANR-21-CE48-0016 (project COMCOPT), the support of Fondation Hadamard with a PRMO grant, and the support of CNRS with a CoopIntEER IEA grant (project ALFRED).","department":[{"_id":"DaAl"}],"publisher":"ML Research Press","publication_status":"published"},{"oa_version":"Preprint","intvolume":" 202","status":"public","title":"SparseProp: Efficient sparse backpropagation for faster training of neural networks at the edge","_id":"14460","user_id":"2DF688A6-F248-11E8-B48F-1D18A9856A87","abstract":[{"text":"We provide an efficient implementation of the backpropagation algorithm, specialized to the case where the weights of the neural network being trained are sparse. Our algorithm is general, as it applies to arbitrary (unstructured) sparsity and common layer types (e.g., convolutional or linear). We provide a fast vectorized implementation on commodity CPUs, and show that it can yield speedups in end-to-end runtime experiments, both in transfer learning using already-sparsified networks, and in training sparse networks from scratch. Thus, our results provide the first support for sparse training on commodity hardware.","lang":"eng"}],"alternative_title":["PMLR"],"type":"conference","date_published":"2023-07-30T00:00:00Z","page":"26215-26227","citation":{"short":"M. Nikdan, T. Pegolotti, E.B. Iofinova, E. Kurtic, D.-A. Alistarh, in:, Proceedings of the 40th International Conference on Machine Learning, ML Research Press, 2023, pp. 26215–26227.","mla":"Nikdan, Mahdi, et al. “SparseProp: Efficient Sparse Backpropagation for Faster Training of Neural Networks at the Edge.” Proceedings of the 40th International Conference on Machine Learning, vol. 202, ML Research Press, 2023, pp. 26215–27.","chicago":"Nikdan, Mahdi, Tommaso Pegolotti, Eugenia B Iofinova, Eldar Kurtic, and Dan-Adrian Alistarh. 
“SparseProp: Efficient Sparse Backpropagation for Faster Training of Neural Networks at the Edge.” In Proceedings of the 40th International Conference on Machine Learning, 202:26215–27. ML Research Press, 2023.","ama":"Nikdan M, Pegolotti T, Iofinova EB, Kurtic E, Alistarh D-A. SparseProp: Efficient sparse backpropagation for faster training of neural networks at the edge. In: Proceedings of the 40th International Conference on Machine Learning. Vol 202. ML Research Press; 2023:26215-26227.","apa":"Nikdan, M., Pegolotti, T., Iofinova, E. B., Kurtic, E., & Alistarh, D.-A. (2023). SparseProp: Efficient sparse backpropagation for faster training of neural networks at the edge. In Proceedings of the 40th International Conference on Machine Learning (Vol. 202, pp. 26215–26227). Honolulu, Hawaii, HI, United States: ML Research Press.","ieee":"M. Nikdan, T. Pegolotti, E. B. Iofinova, E. Kurtic, and D.-A. Alistarh, “SparseProp: Efficient sparse backpropagation for faster training of neural networks at the edge,” in Proceedings of the 40th International Conference on Machine Learning, Honolulu, Hawaii, HI, United States, 2023, vol. 202, pp. 26215–26227.","ista":"Nikdan M, Pegolotti T, Iofinova EB, Kurtic E, Alistarh D-A. 2023. SparseProp: Efficient sparse backpropagation for faster training of neural networks at the edge. Proceedings of the 40th International Conference on Machine Learning. ICML: International Conference on Machine Learning, PMLR, vol. 
202, 26215–26227."},"publication":"Proceedings of the 40th International Conference on Machine Learning","article_processing_charge":"No","day":"30","scopus_import":"1","volume":202,"date_updated":"2023-10-31T09:33:51Z","date_created":"2023-10-29T23:01:17Z","author":[{"id":"66374281-f394-11eb-9cf6-869147deecc0","first_name":"Mahdi","last_name":"Nikdan","full_name":"Nikdan, Mahdi"},{"last_name":"Pegolotti","first_name":"Tommaso","full_name":"Pegolotti, Tommaso"},{"id":"f9a17499-f6e0-11ea-865d-fdf9a3f77117","orcid":"0000-0002-7778-3221","first_name":"Eugenia B","last_name":"Iofinova","full_name":"Iofinova, Eugenia B"},{"last_name":"Kurtic","first_name":"Eldar","id":"47beb3a5-07b5-11eb-9b87-b108ec578218","full_name":"Kurtic, Eldar"},{"full_name":"Alistarh, Dan-Adrian","id":"4A899BFC-F248-11E8-B48F-1D18A9856A87","orcid":"0000-0003-3650-940X","first_name":"Dan-Adrian","last_name":"Alistarh"}],"department":[{"_id":"DaAl"}],"publisher":"ML Research Press","publication_status":"published","year":"2023","acknowledgement":"We would like to thank Elias Frantar for his valuable assistance and support at the outset of this project, and the anonymous ICML and SNN reviewers for very constructive feedback. EI was supported in part by the FWF DK VGSCO, grant agreement number W1260-N35. DA acknowledges generous ERC support, via Starting Grant 805223 ScaleML. 
","ec_funded":1,"language":[{"iso":"eng"}],"conference":{"end_date":"2023-07-29","location":"Honolulu, Hawaii, HI, United States","start_date":"2023-07-23","name":"ICML: International Conference on Machine Learning"},"project":[{"name":"Elastic Coordination for Scalable Machine Learning","call_identifier":"H2020","_id":"268A44D6-B435-11E9-9278-68D0E5697425","grant_number":"805223"}],"quality_controlled":"1","external_id":{"arxiv":["2302.04852"]},"oa":1,"main_file_link":[{"url":"https://doi.org/10.48550/arXiv.2302.04852","open_access":"1"}],"publication_identifier":{"eissn":["2640-3498"]},"month":"07"},{"alternative_title":["PMLR"],"type":"conference","abstract":[{"text":"We show for the first time that large-scale generative pretrained transformer (GPT) family models can be pruned to at least 50% sparsity in one-shot, without any retraining, at minimal loss of accuracy. This is achieved via a new pruning method called SparseGPT, specifically designed to work efficiently and accurately on massive GPT-family models. We can execute SparseGPT on the largest available open-source models, OPT-175B and BLOOM-176B, in under 4.5 hours, and can reach 60% unstructured sparsity with negligible increase in perplexity: remarkably, more than 100 billion weights from these models can be ignored at inference time. SparseGPT generalizes to semi-structured (2:4 and 4:8) patterns, and is compatible with weight quantization approaches. The code is available at: https://github.com/IST-DASLab/sparsegpt.","lang":"eng"}],"status":"public","title":"SparseGPT: Massive language models can be accurately pruned in one-shot","intvolume":" 202","_id":"14458","user_id":"2DF688A6-F248-11E8-B48F-1D18A9856A87","oa_version":"Preprint","scopus_import":"1","day":"30","article_processing_charge":"No","page":"10323-10337","publication":"Proceedings of the 40th International Conference on Machine Learning","citation":{"ista":"Frantar E, Alistarh D-A. 2023. 
SparseGPT: Massive language models can be accurately pruned in one-shot. Proceedings of the 40th International Conference on Machine Learning. ICML: International Conference on Machine Learning, PMLR, vol. 202, 10323–10337.","ieee":"E. Frantar and D.-A. Alistarh, “SparseGPT: Massive language models can be accurately pruned in one-shot,” in Proceedings of the 40th International Conference on Machine Learning, Honolulu, Hawaii, HI, United States, 2023, vol. 202, pp. 10323–10337.","apa":"Frantar, E., & Alistarh, D.-A. (2023). SparseGPT: Massive language models can be accurately pruned in one-shot. In Proceedings of the 40th International Conference on Machine Learning (Vol. 202, pp. 10323–10337). Honolulu, Hawaii, HI, United States: ML Research Press.","ama":"Frantar E, Alistarh D-A. SparseGPT: Massive language models can be accurately pruned in one-shot. In: Proceedings of the 40th International Conference on Machine Learning. Vol 202. ML Research Press; 2023:10323-10337.","chicago":"Frantar, Elias, and Dan-Adrian Alistarh. “SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot.” In Proceedings of the 40th International Conference on Machine Learning, 202:10323–37. ML Research Press, 2023.","mla":"Frantar, Elias, and Dan-Adrian Alistarh. “SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot.” Proceedings of the 40th International Conference on Machine Learning, vol. 202, ML Research Press, 2023, pp. 10323–37.","short":"E. Frantar, D.-A. Alistarh, in:, Proceedings of the 40th International Conference on Machine Learning, ML Research Press, 2023, pp. 10323–10337."},"date_published":"2023-07-30T00:00:00Z","ec_funded":1,"publication_status":"published","publisher":"ML Research Press","department":[{"_id":"DaAl"}],"year":"2023","acknowledgement":"The authors gratefully acknowledge funding from the European Research Council (ERC) under the European Union’s Horizon 2020 programme (grant agreement No. 
805223 ScaleML), as well as experimental support from Eldar Kurtic, and from the IST Austria IT department, in particular Stefano Elefante, Andrei Hornoiu, and Alois Schloegl.","date_updated":"2023-10-31T09:59:42Z","date_created":"2023-10-29T23:01:16Z","volume":202,"author":[{"full_name":"Frantar, Elias","first_name":"Elias","last_name":"Frantar","id":"09a8f98d-ec99-11ea-ae11-c063a7b7fe5f"},{"last_name":"Alistarh","first_name":"Dan-Adrian","orcid":"0000-0003-3650-940X","id":"4A899BFC-F248-11E8-B48F-1D18A9856A87","full_name":"Alistarh, Dan-Adrian"}],"month":"07","publication_identifier":{"eissn":["2640-3498"]},"quality_controlled":"1","project":[{"_id":"268A44D6-B435-11E9-9278-68D0E5697425","grant_number":"805223","call_identifier":"H2020","name":"Elastic Coordination for Scalable Machine Learning"}],"main_file_link":[{"open_access":"1","url":"https://doi.org/10.48550/arXiv.2301.00774"}],"external_id":{"arxiv":["2301.00774"]},"oa":1,"acknowledged_ssus":[{"_id":"ScienComp"}],"language":[{"iso":"eng"}],"conference":{"end_date":"2023-07-29","location":"Honolulu, Hawaii, HI, United States","start_date":"2023-07-23","name":"ICML: International Conference on Machine Learning"}},{"abstract":[{"text":"We introduce extension-based proofs, a class of impossibility proofs that includes valency arguments. They are modelled as an interaction between a prover and a protocol. Using proofs based on combinatorial topology, it has been shown that it is impossible to deterministically solve k-set agreement among n > k processes, or approximate agreement on a cycle of length 4 among n ≥ 3 processes, in a wait-free manner in asynchronous models where processes communicate using objects that can be constructed from shared registers. However, it was unknown whether proofs based on simpler techniques were possible. 
We show that these impossibility results cannot be obtained by extension-based proofs in the iterated snapshot model and, hence, that extension-based proofs are limited in power.","lang":"eng"}],"issue":"4","type":"journal_article","oa_version":"Preprint","_id":"14364","user_id":"2DF688A6-F248-11E8-B48F-1D18A9856A87","title":"Why extension-based proofs fail","status":"public","intvolume":" 52","day":"25","article_processing_charge":"No","scopus_import":"1","date_published":"2023-07-25T00:00:00Z","publication":"SIAM Journal on Computing","citation":{"ieee":"D.-A. Alistarh, J. Aspnes, F. Ellen, R. Gelashvili, and L. Zhu, “Why extension-based proofs fail,” SIAM Journal on Computing, vol. 52, no. 4. Society for Industrial and Applied Mathematics, pp. 913–944, 2023.","apa":"Alistarh, D.-A., Aspnes, J., Ellen, F., Gelashvili, R., & Zhu, L. (2023). Why extension-based proofs fail. SIAM Journal on Computing. Society for Industrial and Applied Mathematics. https://doi.org/10.1137/20M1375851","ista":"Alistarh D-A, Aspnes J, Ellen F, Gelashvili R, Zhu L. 2023. Why extension-based proofs fail. SIAM Journal on Computing. 52(4), 913–944.","ama":"Alistarh D-A, Aspnes J, Ellen F, Gelashvili R, Zhu L. Why extension-based proofs fail. SIAM Journal on Computing. 2023;52(4):913-944. doi:10.1137/20M1375851","chicago":"Alistarh, Dan-Adrian, James Aspnes, Faith Ellen, Rati Gelashvili, and Leqi Zhu. “Why Extension-Based Proofs Fail.” SIAM Journal on Computing. Society for Industrial and Applied Mathematics, 2023. https://doi.org/10.1137/20M1375851.","short":"D.-A. Alistarh, J. Aspnes, F. Ellen, R. Gelashvili, L. Zhu, SIAM Journal on Computing 52 (2023) 913–944.","mla":"Alistarh, Dan-Adrian, et al. “Why Extension-Based Proofs Fail.” SIAM Journal on Computing, vol. 52, no. 4, Society for Industrial and Applied Mathematics, 2023, pp. 
913–44, doi:10.1137/20M1375851."},"article_type":"original","page":"913-944","ec_funded":1,"author":[{"id":"4A899BFC-F248-11E8-B48F-1D18A9856A87","orcid":"0000-0003-3650-940X","first_name":"Dan-Adrian","last_name":"Alistarh","full_name":"Alistarh, Dan-Adrian"},{"first_name":"James","last_name":"Aspnes","full_name":"Aspnes, James"},{"last_name":"Ellen","first_name":"Faith","full_name":"Ellen, Faith"},{"full_name":"Gelashvili, Rati","first_name":"Rati","last_name":"Gelashvili"},{"full_name":"Zhu, Leqi","first_name":"Leqi","last_name":"Zhu","id":"a2117c59-cee4-11ed-b9d0-874ecf0f8ac5"}],"related_material":{"record":[{"relation":"earlier_version","status":"public","id":"6676"}]},"date_updated":"2023-12-13T12:28:29Z","date_created":"2023-09-24T22:01:11Z","volume":52,"acknowledgement":"We would like to thank Valerie King, Toniann Pitassi, and Michael Saks for helpful discussions and Shi Hao Liu for his useful feedback.\r\nThis research was supported by the Natural Science and Engineering Research Council of Canada under grants RGPIN-2015-05080 and RGPIN-2020-04178, a postgraduate scholarship, and a postdoctoral fellowship; a University of Toronto postdoctoral fellowship; the National Science Foundation under grants CCF-1217921, CCF-1301926, CCF-1637385, CCF-1650596, and IIS-1447786; the U.S. Department of Energy under grant ER26116/DE-SC0008923; the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme grant agreement 805223 ScaleML; and the Oracle and Intel corporations. 
Some of the work on this paper was done while Faith Ellen was visiting IST Austria.","year":"2023","publication_status":"published","department":[{"_id":"DaAl"}],"publisher":"Society for Industrial and Applied Mathematics","month":"07","publication_identifier":{"issn":["0097-5397"],"eissn":["1095-7111"]},"doi":"10.1137/20M1375851","language":[{"iso":"eng"}],"oa":1,"main_file_link":[{"url":"https://arxiv.org/abs/1811.01421","open_access":"1"}],"external_id":{"isi":["001082972300004"],"arxiv":["1811.01421"]},"isi":1,"quality_controlled":"1","project":[{"call_identifier":"H2020","name":"Elastic Coordination for Scalable Machine Learning","grant_number":"805223","_id":"268A44D6-B435-11E9-9278-68D0E5697425"}]},{"ec_funded":1,"year":"2023","acknowledgement":"The authors would like to sincerely thank Sara Hooker for her feedback during the development of this work. EI was supported in part by the FWF DK VGSCO, grant agreement number W1260-N35. AP and DA acknowledge generous ERC support, via Starting Grant 805223 ScaleML.","publication_status":"published","publisher":"IEEE","department":[{"_id":"DaAl"},{"_id":"ChLa"}],"author":[{"first_name":"Eugenia B","last_name":"Iofinova","id":"f9a17499-f6e0-11ea-865d-fdf9a3f77117","orcid":"0000-0002-7778-3221","full_name":"Iofinova, Eugenia B"},{"id":"32D78294-F248-11E8-B48F-1D18A9856A87","first_name":"Elena-Alexandra","last_name":"Peste","full_name":"Peste, Elena-Alexandra"},{"full_name":"Alistarh, 
Dan-Adrian","first_name":"Dan-Adrian","last_name":"Alistarh","id":"4A899BFC-F248-11E8-B48F-1D18A9856A87","orcid":"0000-0003-3650-940X"}],"related_material":{"link":[{"relation":"software","url":"https://github.com/IST-DASLab/pruned-vision-model-bias"}]},"date_created":"2024-01-10T08:42:40Z","date_updated":"2024-01-10T08:59:26Z","month":"08","publication_identifier":{"eissn":["2575-7075"],"eisbn":["9798350301298"]},"external_id":{"isi":["001062531308068"],"arxiv":["2304.12622"]},"main_file_link":[{"open_access":"1","url":"https://doi.org/10.48550/arXiv.2304.12622"}],"oa":1,"quality_controlled":"1","isi":1,"project":[{"grant_number":" W1260-N35","_id":"9B9290DE-BA93-11EA-9121-9846C619BF3A","name":"Vienna Graduate School on Computational Optimization"},{"name":"Elastic Coordination for Scalable Machine Learning","call_identifier":"H2020","grant_number":"805223","_id":"268A44D6-B435-11E9-9278-68D0E5697425"}],"conference":{"start_date":"2023-06-17","location":"Vancouver, BC, Canada","end_date":"2023-06-24","name":"CVPR: Conference on Computer Vision and Pattern Recognition"},"doi":"10.1109/cvpr52729.2023.02334","language":[{"iso":"eng"}],"type":"conference","abstract":[{"text":"Pruning—that is, setting a significant subset of the parameters of a neural network to zero—is one of the most popular methods of model compression. Yet, several recent works have raised the issue that pruning may induce or exacerbate bias in the output of the compressed model. Despite existing evidence for this phenomenon, the relationship between neural network pruning and induced bias is not well-understood. In this work, we systematically investigate and characterize this phenomenon in Convolutional Neural Networks for computer vision. First, we show that it is in fact possible to obtain highly-sparse models, e.g. with less than 10% remaining weights, which do not decrease in accuracy nor substantially increase in bias when compared to dense models. 
At the same time, we also find that, at higher sparsities, pruned models exhibit higher uncertainty in their outputs, as well as increased correlations, which we directly link to increased bias. We propose easy-to-use criteria which, based only on the uncompressed model, establish whether bias will increase with pruning, and identify the samples most susceptible to biased predictions post-compression. Our code can be found at https://github.com/IST-DASLab/pruned-vision-model-bias.","lang":"eng"}],"_id":"14771","user_id":"2DF688A6-F248-11E8-B48F-1D18A9856A87","status":"public","title":"Bias in pruned vision models: In-depth analysis and countermeasures","oa_version":"Preprint","day":"22","article_processing_charge":"No","publication":"2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition","citation":{"mla":"Iofinova, Eugenia B., et al. “Bias in Pruned Vision Models: In-Depth Analysis and Countermeasures.” 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition, IEEE, 2023, pp. 24364–73, doi:10.1109/cvpr52729.2023.02334.","short":"E.B. Iofinova, E.-A. Peste, D.-A. Alistarh, in:, 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition, IEEE, 2023, pp. 24364–24373.","chicago":"Iofinova, Eugenia B, Elena-Alexandra Peste, and Dan-Adrian Alistarh. “Bias in Pruned Vision Models: In-Depth Analysis and Countermeasures.” In 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 24364–73. IEEE, 2023. https://doi.org/10.1109/cvpr52729.2023.02334.","ama":"Iofinova EB, Peste E-A, Alistarh D-A. Bias in pruned vision models: In-depth analysis and countermeasures. In: 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition. IEEE; 2023:24364-24373. doi:10.1109/cvpr52729.2023.02334","ista":"Iofinova EB, Peste E-A, Alistarh D-A. 2023. Bias in pruned vision models: In-depth analysis and countermeasures. 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition. 
CVPR: Conference on Computer Vision and Pattern Recognition, 24364–24373.","apa":"Iofinova, E. B., Peste, E.-A., & Alistarh, D.-A. (2023). Bias in pruned vision models: In-depth analysis and countermeasures. In 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 24364–24373). Vancouver, BC, Canada: IEEE. https://doi.org/10.1109/cvpr52729.2023.02334","ieee":"E. B. Iofinova, E.-A. Peste, and D.-A. Alistarh, “Bias in pruned vision models: In-depth analysis and countermeasures,” in 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 2023, pp. 24364–24373."},"page":"24364-24373","date_published":"2023-08-22T00:00:00Z"},{"author":[{"first_name":"Nikita","last_name":"Koval","id":"2F4DB10C-F248-11E8-B48F-1D18A9856A87","full_name":"Koval, Nikita"},{"id":"2e711909-896a-11ed-bdf8-eb0f5a2984c6","last_name":"Fedorov","first_name":"Alexander","full_name":"Fedorov, Alexander"},{"last_name":"Sokolova","first_name":"Maria","full_name":"Sokolova, Maria"},{"full_name":"Tsitelov, Dmitry","first_name":"Dmitry","last_name":"Tsitelov"},{"last_name":"Alistarh","first_name":"Dan-Adrian","orcid":"0000-0003-3650-940X","id":"4A899BFC-F248-11E8-B48F-1D18A9856A87","full_name":"Alistarh, Dan-Adrian"}],"related_material":{"record":[{"status":"public","relation":"research_data","id":"14995"}]},"date_created":"2023-09-03T22:01:16Z","date_updated":"2024-02-27T07:46:52Z","volume":13964,"year":"2023","publication_status":"published","publisher":"Springer Nature","department":[{"_id":"DaAl"},{"_id":"GradSch"}],"file_date_updated":"2023-09-06T08:16:25Z","conference":{"location":"Paris, France","start_date":"2023-07-17","end_date":"2023-07-22","name":"CAV: Computer Aided Verification"},"doi":"10.1007/978-3-031-37706-8_8","language":[{"iso":"eng"}],"oa":1,"tmp":{"name":"Creative Commons Attribution 4.0 International Public License (CC-BY 4.0)","legal_code_url":"https://creativecommons.org/licenses/by/4.0/legalcode","short":"CC BY 
(4.0)","image":"/images/cc_by.png"},"quality_controlled":"1","month":"07","publication_identifier":{"isbn":["9783031377051"],"eissn":["1611-3349"],"issn":["0302-9743"]},"oa_version":"Published Version","file":[{"file_id":"14275","relation":"main_file","date_created":"2023-09-06T08:16:25Z","date_updated":"2023-09-06T08:16:25Z","success":1,"checksum":"c346016393123a0a2338ad4d976f61bc","file_name":"2023_LNCS_Koval.pdf","access_level":"open_access","creator":"dernst","content_type":"application/pdf","file_size":421408}],"_id":"14260","user_id":"2DF688A6-F248-11E8-B48F-1D18A9856A87","ddc":["000"],"status":"public","title":"Lincheck: A practical framework for testing concurrent data structures on JVM","intvolume":" 13964","abstract":[{"text":"This paper presents Lincheck, a new practical and user-friendly framework for testing concurrent algorithms on the Java Virtual Machine (JVM). Lincheck provides a simple and declarative way to write concurrent tests: instead of describing how to perform the test, users specify what to test by declaring all the operations to examine; the framework automatically handles the rest. As a result, tests written with Lincheck are concise and easy to understand. The framework automatically generates a set of concurrent scenarios, examines them using stress-testing or bounded model checking, and verifies that the results of each invocation are correct. Notably, if an error is detected via model checking, Lincheck provides an easy-to-follow trace to reproduce it, significantly simplifying the bug investigation.\r\n\r\nTo the best of our knowledge, Lincheck is the first production-ready tool on the JVM that offers such a simple way of writing concurrent tests, without requiring special skills or expertise. 
We successfully integrated Lincheck into the development process of several large projects, such as Kotlin Coroutines, and identified new bugs in popular concurrency libraries, such as a race in Java’s standard ConcurrentLinkedDeque and a liveness bug in Java’s AbstractQueuedSynchronizer framework, which underlies most of the standard synchronization primitives. We believe that Lincheck can significantly improve the quality and productivity of concurrent algorithms research and development and become the state-of-the-art tool for checking their correctness.","lang":"eng"}],"type":"conference","alternative_title":["LNCS"],"date_published":"2023-07-17T00:00:00Z","publication":"35th International Conference on Computer Aided Verification ","citation":{"ieee":"N. Koval, A. Fedorov, M. Sokolova, D. Tsitelov, and D.-A. Alistarh, “Lincheck: A practical framework for testing concurrent data structures on JVM,” in 35th International Conference on Computer Aided Verification , Paris, France, 2023, vol. 13964, pp. 156–169.","apa":"Koval, N., Fedorov, A., Sokolova, M., Tsitelov, D., & Alistarh, D.-A. (2023). Lincheck: A practical framework for testing concurrent data structures on JVM. In 35th International Conference on Computer Aided Verification (Vol. 13964, pp. 156–169). Paris, France: Springer Nature. https://doi.org/10.1007/978-3-031-37706-8_8","ista":"Koval N, Fedorov A, Sokolova M, Tsitelov D, Alistarh D-A. 2023. Lincheck: A practical framework for testing concurrent data structures on JVM. 35th International Conference on Computer Aided Verification . CAV: Computer Aided Verification, LNCS, vol. 13964, 156–169.","ama":"Koval N, Fedorov A, Sokolova M, Tsitelov D, Alistarh D-A. Lincheck: A practical framework for testing concurrent data structures on JVM. In: 35th International Conference on Computer Aided Verification . Vol 13964. Springer Nature; 2023:156-169. 
doi:10.1007/978-3-031-37706-8_8","chicago":"Koval, Nikita, Alexander Fedorov, Maria Sokolova, Dmitry Tsitelov, and Dan-Adrian Alistarh. “Lincheck: A Practical Framework for Testing Concurrent Data Structures on JVM.” In 35th International Conference on Computer Aided Verification , 13964:156–69. Springer Nature, 2023. https://doi.org/10.1007/978-3-031-37706-8_8.","short":"N. Koval, A. Fedorov, M. Sokolova, D. Tsitelov, D.-A. Alistarh, in:, 35th International Conference on Computer Aided Verification , Springer Nature, 2023, pp. 156–169.","mla":"Koval, Nikita, et al. “Lincheck: A Practical Framework for Testing Concurrent Data Structures on JVM.” 35th International Conference on Computer Aided Verification , vol. 13964, Springer Nature, 2023, pp. 156–69, doi:10.1007/978-3-031-37706-8_8."},"page":"156-169","day":"17","article_processing_charge":"Yes (in subscription journal)","has_accepted_license":"1","scopus_import":"1"},{"abstract":[{"lang":"eng","text":"Lincheck is a new practical and user-friendly framework for testing concurrent data structures on the Java Virtual Machine (JVM). It provides a simple and declarative way to write concurrent tests. Instead of describing how to perform the test, users specify what to test by declaring all the operations to examine; the framework automatically handles the rest. As a result, tests written with Lincheck are concise and easy to understand. \r\nThe artifact presents a collection of Lincheck tests that discover new bugs in popular libraries and implementations from the concurrency literature -- they are listed in Table 1, Section 3. To evaluate the performance of Lincheck analysis, the collection of tests also includes those which check correct data structures and, thus, always succeed. Similarly to Table 2, Section 3, the experiments demonstrate the reasonable time to perform a test. 
Finally, Lincheck provides user-friendly output with an easy-to-follow trace to reproduce a detected error, significantly simplifying further investigation."}],"type":"research_data_reference","author":[{"id":"2F4DB10C-F248-11E8-B48F-1D18A9856A87","first_name":"Nikita","last_name":"Koval","full_name":"Koval, Nikita"},{"full_name":"Fedorov, Alexander","id":"2e711909-896a-11ed-bdf8-eb0f5a2984c6","first_name":"Alexander","last_name":"Fedorov"},{"full_name":"Sokolova, Maria","first_name":"Maria","last_name":"Sokolova"},{"full_name":"Tsitelov, Dmitry","last_name":"Tsitelov","first_name":"Dmitry"},{"first_name":"Dan-Adrian","last_name":"Alistarh","id":"4A899BFC-F248-11E8-B48F-1D18A9856A87","orcid":"0000-0003-3650-940X","full_name":"Alistarh, Dan-Adrian"}],"related_material":{"record":[{"status":"public","relation":"used_in_publication","id":"14260"}]},"date_created":"2024-02-14T15:14:13Z","date_updated":"2024-02-27T07:46:52Z","oa_version":"Published Version","user_id":"2DF688A6-F248-11E8-B48F-1D18A9856A87","_id":"14995","year":"2023","title":"Lincheck: A practical framework for testing concurrent data structures on JVM","ddc":["000"],"status":"public","publisher":"Zenodo","department":[{"_id":"DaAl"}],"month":"04","day":"28","article_processing_charge":"No","date_published":"2023-04-28T00:00:00Z","doi":"10.5281/ZENODO.7877757","oa":1,"citation":{"ama":"Koval N, Fedorov A, Sokolova M, Tsitelov D, Alistarh D-A. Lincheck: A practical framework for testing concurrent data structures on JVM. 2023. doi:10.5281/ZENODO.7877757","ieee":"N. Koval, A. Fedorov, M. Sokolova, D. Tsitelov, and D.-A. Alistarh, “Lincheck: A practical framework for testing concurrent data structures on JVM.” Zenodo, 2023.","apa":"Koval, N., Fedorov, A., Sokolova, M., Tsitelov, D., & Alistarh, D.-A. (2023). Lincheck: A practical framework for testing concurrent data structures on JVM. Zenodo. https://doi.org/10.5281/ZENODO.7877757","ista":"Koval N, Fedorov A, Sokolova M, Tsitelov D, Alistarh D-A. 2023. 
Lincheck: A practical framework for testing concurrent data structures on JVM, Zenodo, 10.5281/ZENODO.7877757.","short":"N. Koval, A. Fedorov, M. Sokolova, D. Tsitelov, D.-A. Alistarh, (2023).","mla":"Koval, Nikita, et al. Lincheck: A Practical Framework for Testing Concurrent Data Structures on JVM. Zenodo, 2023, doi:10.5281/ZENODO.7877757.","chicago":"Koval, Nikita, Alexander Fedorov, Maria Sokolova, Dmitry Tsitelov, and Dan-Adrian Alistarh. “Lincheck: A Practical Framework for Testing Concurrent Data Structures on JVM.” Zenodo, 2023. https://doi.org/10.5281/ZENODO.7877757."},"main_file_link":[{"open_access":"1","url":"https://doi.org/10.5281/zenodo.7877757"}]},{"oa_version":"Published Version","file":[{"relation":"main_file","file_id":"11346","date_updated":"2022-05-02T08:06:33Z","date_created":"2022-05-02T08:06:33Z","checksum":"2c7c982174c6f98c4ca6e92539d15086","success":1,"file_name":"2022_LIPICs_Alistarh.pdf","access_level":"open_access","file_size":959406,"content_type":"application/pdf","creator":"dernst"}],"user_id":"2DF688A6-F248-11E8-B48F-1D18A9856A87","_id":"11184","title":"Fast graphical population protocols","ddc":["510"],"status":"public","intvolume":" 217","abstract":[{"text":"Let G be a graph on n nodes. In the stochastic population protocol model, a collection of n indistinguishable, resource-limited nodes collectively solve tasks via pairwise interactions. In each interaction, two randomly chosen neighbors first read each other’s states, and then update their local states. A rich line of research has established tight upper and lower bounds on the complexity of fundamental tasks, such as majority and leader election, in this model, when G is a clique. 
Specifically, in the clique, these tasks can be solved fast, i.e., in n polylog n pairwise interactions, with high probability, using at most polylog n states per node.\r\nIn this work, we consider the more general setting where G is an arbitrary regular graph, and present a technique for simulating protocols designed for fully-connected networks in any connected regular graph. Our main result is a simulation that is efficient on many interesting graph families: roughly, the simulation overhead is polylogarithmic in the number of nodes, and quadratic in the conductance of the graph. As a sample application, we show that, in any regular graph with conductance φ, both leader election and exact majority can be solved in φ^{-2} ⋅ n polylog n pairwise interactions, with high probability, using at most φ^{-2} ⋅ polylog n states per node. This shows that there are fast and space-efficient population protocols for leader election and exact majority on graphs with good expansion properties. We believe our results will prove generally useful, as they allow efficient technology transfer between the well-mixed (clique) case, and the under-explored spatial setting.","lang":"eng"}],"type":"conference","alternative_title":["LIPIcs"],"date_published":"2022-02-01T00:00:00Z","publication":"25th International Conference on Principles of Distributed Systems","citation":{"apa":"Alistarh, D.-A., Gelashvili, R., & Rybicki, J. (2022). Fast graphical population protocols. In Q. Bramas, V. Gramoli, & A. Milani (Eds.), 25th International Conference on Principles of Distributed Systems (Vol. 217). Strasbourg, France: Schloss Dagstuhl - Leibniz-Zentrum für Informatik. https://doi.org/10.4230/LIPIcs.OPODIS.2021.14","ieee":"D.-A. Alistarh, R. Gelashvili, and J. Rybicki, “Fast graphical population protocols,” in 25th International Conference on Principles of Distributed Systems, Strasbourg, France, 2022, vol. 217.","ista":"Alistarh D-A, Gelashvili R, Rybicki J. 2022. 
Fast graphical population protocols. 25th International Conference on Principles of Distributed Systems. OPODIS, LIPIcs, vol. 217, 14.","ama":"Alistarh D-A, Gelashvili R, Rybicki J. Fast graphical population protocols. In: Bramas Q, Gramoli V, Milani A, eds. 25th International Conference on Principles of Distributed Systems. Vol 217. Schloss Dagstuhl - Leibniz-Zentrum für Informatik; 2022. doi:10.4230/LIPIcs.OPODIS.2021.14","chicago":"Alistarh, Dan-Adrian, Rati Gelashvili, and Joel Rybicki. “Fast Graphical Population Protocols.” In 25th International Conference on Principles of Distributed Systems, edited by Quentin Bramas, Vincent Gramoli, and Alessia Milani, Vol. 217. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2022. https://doi.org/10.4230/LIPIcs.OPODIS.2021.14.","short":"D.-A. Alistarh, R. Gelashvili, J. Rybicki, in:, Q. Bramas, V. Gramoli, A. Milani (Eds.), 25th International Conference on Principles of Distributed Systems, Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2022.","mla":"Alistarh, Dan-Adrian, et al. “Fast Graphical Population Protocols.” 25th International Conference on Principles of Distributed Systems, edited by Quentin Bramas et al., vol. 
217, 14, Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2022, doi:10.4230/LIPIcs.OPODIS.2021.14."},"day":"01","has_accepted_license":"1","article_processing_charge":"No","scopus_import":"1","author":[{"id":"4A899BFC-F248-11E8-B48F-1D18A9856A87","orcid":"0000-0003-3650-940X","first_name":"Dan-Adrian","last_name":"Alistarh","full_name":"Alistarh, Dan-Adrian"},{"first_name":"Rati","last_name":"Gelashvili","full_name":"Gelashvili, Rati"},{"full_name":"Rybicki, Joel","orcid":"0000-0002-6432-6646","id":"334EFD2E-F248-11E8-B48F-1D18A9856A87","last_name":"Rybicki","first_name":"Joel"}],"date_created":"2022-04-17T22:01:47Z","date_updated":"2022-05-02T08:09:39Z","volume":217,"acknowledgement":"Dan Alistarh: This project has received funding from the European Research Council (ERC)\r\nunder the European Union’s Horizon 2020 research and innovation programme (grant agreement No. 805223 ScaleML).\r\nJoel Rybicki: This project has received funding from the European Union’s Horizon 2020 research and\r\ninnovation programme under the Marie Skłodowska-Curie grant agreement No. 840605.\r\nWe are grateful to Giorgi Nadiradze for pointing out a generalisation of the phase clock construction to non-regular graphs.
We also thank anonymous reviewers for their useful comments on earlier versions of this manuscript.","year":"2022","publication_status":"published","publisher":"Schloss Dagstuhl - Leibniz-Zentrum für Informatik","department":[{"_id":"DaAl"}],"editor":[{"first_name":"Quentin","last_name":"Bramas","full_name":"Bramas, Quentin"},{"last_name":"Gramoli","first_name":"Vincent","full_name":"Gramoli, Vincent"},{"last_name":"Milani","first_name":"Alessia","full_name":"Milani, Alessia"}],"file_date_updated":"2022-05-02T08:06:33Z","ec_funded":1,"article_number":"14","conference":{"end_date":"2021-12-15","location":"Strasbourg, France","start_date":"2021-12-13","name":"OPODIS"},"doi":"10.4230/LIPIcs.OPODIS.2021.14","language":[{"iso":"eng"}],"tmp":{"name":"Creative Commons Attribution 4.0 International Public License (CC-BY 4.0)","legal_code_url":"https://creativecommons.org/licenses/by/4.0/legalcode","short":"CC BY (4.0)","image":"/images/cc_by.png"},"external_id":{"arxiv":["2102.08808"]},"oa":1,"quality_controlled":"1","project":[{"name":"Elastic Coordination for Scalable Machine Learning","call_identifier":"H2020","_id":"268A44D6-B435-11E9-9278-68D0E5697425","grant_number":"805223"},{"name":"Coordination in constrained and natural distributed systems","call_identifier":"H2020","grant_number":"840605","_id":"26A5D39A-B435-11E9-9278-68D0E5697425"}],"month":"02","publication_identifier":{"issn":["1868-8969"],"isbn":["9783959772198"]}},{"oa_version":"Published Version","file":[{"file_id":"12795","relation":"main_file","date_updated":"2023-04-03T06:17:58Z","date_created":"2023-04-03T06:17:58Z","success":1,"checksum":"1a397746235f245da5468819247ff663","file_name":"2022_ACMMiddleware_Markov.pdf","access_level":"open_access","creator":"dernst","content_type":"application/pdf","file_size":1514169}],"title":"CGX: Adaptive system support for communication-efficient deep 
learning","ddc":["000"],"status":"public","_id":"12780","user_id":"2DF688A6-F248-11E8-B48F-1D18A9856A87","abstract":[{"text":"The ability to scale out training workloads has been one of the key performance enablers of deep learning. The main scaling approach is data-parallel GPU-based training, which has been boosted by hardware and software support for highly efficient point-to-point communication, and in particular via hardware bandwidth over-provisioning. Overprovisioning comes at a cost: there is an order of magnitude price difference between \"cloud-grade\" servers with such support, relative to their popular \"consumer-grade\" counterparts, although single server-grade and consumer-grade GPUs can have similar computational envelopes.\r\n\r\nIn this paper, we show that the costly hardware overprovisioning approach can be supplanted via algorithmic and system design, and propose a framework called CGX, which provides efficient software support for compressed communication in ML applications, for both multi-GPU single-node training, as well as larger-scale multi-node training. CGX is based on two technical advances: At the system level, it relies on a re-developed communication stack for ML frameworks, which provides flexible, highly-efficient support for compressed communication. At the application level, it provides seamless, parameter-free integration with popular frameworks, so that end-users do not have to modify training recipes, nor significant training code. This is complemented by a layer-wise adaptive compression technique which dynamically balances compression gains with accuracy preservation. 
CGX integrates with popular ML frameworks, providing up to 3X speedups for multi-GPU nodes based on commodity hardware, and order-of-magnitude improvements in the multi-node setting, with negligible impact on accuracy.","lang":"eng"}],"type":"conference","date_published":"2022-11-01T00:00:00Z","page":"241-254","citation":{"chicago":"Markov, Ilia, Hamidreza Ramezanikebrya, and Dan-Adrian Alistarh. “CGX: Adaptive System Support for Communication-Efficient Deep Learning.” In Proceedings of the 23rd ACM/IFIP International Middleware Conference, 241–54. Association for Computing Machinery, 2022. https://doi.org/10.1145/3528535.3565248.","short":"I. Markov, H. Ramezanikebrya, D.-A. Alistarh, in:, Proceedings of the 23rd ACM/IFIP International Middleware Conference, Association for Computing Machinery, 2022, pp. 241–254.","mla":"Markov, Ilia, et al. “CGX: Adaptive System Support for Communication-Efficient Deep Learning.” Proceedings of the 23rd ACM/IFIP International Middleware Conference, Association for Computing Machinery, 2022, pp. 241–54, doi:10.1145/3528535.3565248.","ieee":"I. Markov, H. Ramezanikebrya, and D.-A. Alistarh, “CGX: Adaptive system support for communication-efficient deep learning,” in Proceedings of the 23rd ACM/IFIP International Middleware Conference, Quebec, QC, Canada, 2022, pp. 241–254.","apa":"Markov, I., Ramezanikebrya, H., & Alistarh, D.-A. (2022). CGX: Adaptive system support for communication-efficient deep learning. In Proceedings of the 23rd ACM/IFIP International Middleware Conference (pp. 241–254). Quebec, QC, Canada: Association for Computing Machinery. https://doi.org/10.1145/3528535.3565248","ista":"Markov I, Ramezanikebrya H, Alistarh D-A. 2022. CGX: Adaptive system support for communication-efficient deep learning. Proceedings of the 23rd ACM/IFIP International Middleware Conference. Middleware: International Middleware Conference, 241–254.","ama":"Markov I, Ramezanikebrya H, Alistarh D-A. 
CGX: Adaptive system support for communication-efficient deep learning. In: Proceedings of the 23rd ACM/IFIP International Middleware Conference. Association for Computing Machinery; 2022:241-254. doi:10.1145/3528535.3565248"},"publication":"Proceedings of the 23rd ACM/IFIP International Middleware Conference","article_processing_charge":"Yes (via OA deal)","has_accepted_license":"1","day":"01","date_updated":"2023-04-03T06:21:04Z","date_created":"2023-03-31T06:17:00Z","author":[{"id":"D0CF4148-C985-11E9-8066-0BDEE5697425","last_name":"Markov","first_name":"Ilia","full_name":"Markov, Ilia"},{"first_name":"Hamidreza","last_name":"Ramezanikebrya","full_name":"Ramezanikebrya, Hamidreza"},{"full_name":"Alistarh, Dan-Adrian","last_name":"Alistarh","first_name":"Dan-Adrian","orcid":"0000-0003-3650-940X","id":"4A899BFC-F248-11E8-B48F-1D18A9856A87"}],"publisher":"Association for Computing Machinery","department":[{"_id":"DaAl"}],"publication_status":"published","year":"2022","acknowledgement":"The authors sincerely thank Nikoli Dryden, Tal Ben-Nun, Torsten Hoefler and Bapi Chatterjee for useful discussions throughout the development of this project.","file_date_updated":"2023-04-03T06:17:58Z","language":[{"iso":"eng"}],"doi":"10.1145/3528535.3565248","conference":{"end_date":"2022-11-11","location":"Quebec, QC, Canada","start_date":"2022-11-07","name":"Middleware: International Middleware Conference"},"quality_controlled":"1","oa":1,"tmp":{"name":"Creative Commons Attribution 4.0 International Public License (CC-BY 4.0)","legal_code_url":"https://creativecommons.org/licenses/by/4.0/legalcode","short":"CC BY (4.0)","image":"/images/cc_by.png"},"external_id":{"arxiv":["2111.08617"]},"publication_identifier":{"isbn":["9781450393409"]},"month":"11"},{"date_published":"2022-07-21T00:00:00Z","citation":{"ama":"Alistarh D-A, Rybicki J, Voitovych S. Near-optimal leader election in population protocols on graphs. 
In: Proceedings of the Annual ACM Symposium on Principles of Distributed Computing. Association for Computing Machinery; 2022:246-256. doi:10.1145/3519270.3538435","ista":"Alistarh D-A, Rybicki J, Voitovych S. 2022. Near-optimal leader election in population protocols on graphs. Proceedings of the Annual ACM Symposium on Principles of Distributed Computing. PODC: Symposium on Principles of Distributed Computing, 246–256.","ieee":"D.-A. Alistarh, J. Rybicki, and S. Voitovych, “Near-optimal leader election in population protocols on graphs,” in Proceedings of the Annual ACM Symposium on Principles of Distributed Computing, Salerno, Italy, 2022, pp. 246–256.","apa":"Alistarh, D.-A., Rybicki, J., & Voitovych, S. (2022). Near-optimal leader election in population protocols on graphs. In Proceedings of the Annual ACM Symposium on Principles of Distributed Computing (pp. 246–256). Salerno, Italy: Association for Computing Machinery. https://doi.org/10.1145/3519270.3538435","mla":"Alistarh, Dan-Adrian, et al. “Near-Optimal Leader Election in Population Protocols on Graphs.” Proceedings of the Annual ACM Symposium on Principles of Distributed Computing, Association for Computing Machinery, 2022, pp. 246–56, doi:10.1145/3519270.3538435.","short":"D.-A. Alistarh, J. Rybicki, S. Voitovych, in:, Proceedings of the Annual ACM Symposium on Principles of Distributed Computing, Association for Computing Machinery, 2022, pp. 246–256.","chicago":"Alistarh, Dan-Adrian, Joel Rybicki, and Sasha Voitovych. “Near-Optimal Leader Election in Population Protocols on Graphs.” In Proceedings of the Annual ACM Symposium on Principles of Distributed Computing, 246–56. Association for Computing Machinery, 2022. 
https://doi.org/10.1145/3519270.3538435."},"publication":"Proceedings of the Annual ACM Symposium on Principles of Distributed Computing","page":"246-256","article_processing_charge":"Yes (via OA deal)","has_accepted_license":"1","day":"21","scopus_import":"1","file":[{"checksum":"4c6b29172b8e355b4fbc364a2e0827b2","success":1,"date_updated":"2022-08-16T08:05:15Z","date_created":"2022-08-16T08:05:15Z","relation":"main_file","file_id":"11854","content_type":"application/pdf","file_size":1593474,"creator":"cchlebak","access_level":"open_access","file_name":"2022_PODC_Alistarh.pdf"}],"oa_version":"Published Version","_id":"11844","user_id":"2DF688A6-F248-11E8-B48F-1D18A9856A87","ddc":["000"],"status":"public","title":"Near-optimal leader election in population protocols on graphs","abstract":[{"text":"In the stochastic population protocol model, we are given a connected graph with n nodes, and in every time step, a scheduler samples an edge of the graph uniformly at random and the nodes connected by this edge interact. A fundamental task in this model is stable leader election, in which all nodes start in an identical state and the aim is to reach a configuration in which (1) exactly one node is elected as leader and (2) this node remains as the unique leader no matter what sequence of interactions follows. On cliques, the complexity of this problem has recently been settled: time-optimal protocols stabilize in Θ(n log n) expected steps using Θ(log log n) states, whereas protocols that use O(1) states require Θ(n²) expected steps.\r\n\r\nIn this work, we investigate the complexity of stable leader election on general graphs. We provide the first non-trivial time lower bounds for leader election on general graphs, showing that, when moving beyond cliques, the complexity landscape of leader election becomes very diverse: the time required to elect a leader can range from O(1) to Θ(n³) expected steps.
On the upper bound side, we first observe that there exists a protocol that is time-optimal on many graph families, but uses polynomially-many states. In contrast, we give a near-time-optimal protocol that uses only O(log² n) states and is at most a factor log n slower. Finally, we show that the constant-state protocol of Beauquier et al. [OPODIS 2013] is at most a factor n log n slower than the fast polynomial-state protocol. Moreover, among constant-state protocols, this protocol has near-optimal average case complexity on dense random graphs.","lang":"eng"}],"type":"conference","doi":"10.1145/3519270.3538435","conference":{"start_date":"2022-07-25","location":"Salerno, Italy","end_date":"2022-07-29","name":"PODC: Symposium on Principles of Distributed Computing"},"language":[{"iso":"eng"}],"oa":1,"tmp":{"name":"Creative Commons Attribution 4.0 International Public License (CC-BY 4.0)","legal_code_url":"https://creativecommons.org/licenses/by/4.0/legalcode","short":"CC BY (4.0)","image":"/images/cc_by.png"},"external_id":{"arxiv":["2205.12597"]},"project":[{"_id":"268A44D6-B435-11E9-9278-68D0E5697425","grant_number":"805223","name":"Elastic Coordination for Scalable Machine Learning","call_identifier":"H2020"}],"quality_controlled":"1","publication_identifier":{"isbn":["9781450392624"]},"month":"07","author":[{"last_name":"Alistarh","first_name":"Dan-Adrian","orcid":"0000-0003-3650-940X","id":"4A899BFC-F248-11E8-B48F-1D18A9856A87","full_name":"Alistarh, Dan-Adrian"},{"full_name":"Rybicki, Joel","id":"334EFD2E-F248-11E8-B48F-1D18A9856A87","orcid":"0000-0002-6432-6646","first_name":"Joel","last_name":"Rybicki"},{"full_name":"Voitovych, Sasha","first_name":"Sasha","last_name":"Voitovych"}],"date_updated":"2023-06-14T12:06:01Z","date_created":"2022-08-14T22:01:46Z","acknowledgement":"We thank the anonymous reviewers for their helpful comments.
We gratefully acknowledge funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (grant agreement No 805223 ScaleML).","year":"2022","publisher":"Association for Computing Machinery","department":[{"_id":"DaAl"}],"publication_status":"published","ec_funded":1,"file_date_updated":"2022-08-16T08:05:15Z"},{"file_date_updated":"2022-08-05T09:19:29Z","date_created":"2022-04-17T22:01:46Z","date_updated":"2023-08-03T06:49:20Z","author":[{"full_name":"Brown, Trevor A","first_name":"Trevor A","last_name":"Brown","id":"3569F0A0-F248-11E8-B48F-1D18A9856A87"},{"last_name":"Sigouin","first_name":"William","full_name":"Sigouin, William"},{"last_name":"Alistarh","first_name":"Dan-Adrian","orcid":"0000-0003-3650-940X","id":"4A899BFC-F248-11E8-B48F-1D18A9856A87","full_name":"Alistarh, Dan-Adrian"}],"publication_status":"published","department":[{"_id":"DaAl"}],"publisher":"Association for Computing Machinery","year":"2022","acknowledgement":"This work was supported by: the Natural Sciences and Engineering Research Council of Canada (NSERC) Collaborative Research and Development grant: CRDPJ 539431-19, the\r\nCanada Foundation for Innovation John R. Evans Leaders Fund with equal support from the Ontario Research Fund CFI Leaders Opportunity Fund: 38512, Waterloo Huawei Joint Innovation Lab project “Scalable Infrastructure for Next Generation Data Management Systems”, NSERC Discovery Launch Supplement: DGECR-2019-00048, NSERC Discovery\r\nProgram under the grants: RGPIN-2019-04227 and RGPIN04512-2018, and the University of Waterloo. 
We would also like to thank the reviewers for their insightful comments.","month":"04","publication_identifier":{"isbn":["9781450392044"]},"language":[{"iso":"eng"}],"conference":{"start_date":"2022-04-02","location":"Seoul, Republic of Korea","end_date":"2022-04-06","name":"PPoPP: Symposium on Principles and Practice of Parallel Programming"},"doi":"10.1145/3503221.3508410","isi":1,"quality_controlled":"1","tmp":{"name":"Creative Commons Attribution 4.0 International Public License (CC-BY 4.0)","legal_code_url":"https://creativecommons.org/licenses/by/4.0/legalcode","short":"CC BY (4.0)","image":"/images/cc_by.png"},"external_id":{"isi":["000883318200027"]},"oa":1,"abstract":[{"text":"To maximize the performance of concurrent data structures, researchers have often turned to highly complex fine-grained techniques, resulting in efficient and elegant algorithms, which can, however, often be difficult to understand and prove correct. While simpler techniques exist, such as transactional memory, they can have limited performance or portability relative to their fine-grained counterparts. Approaches at both ends of this complexity-performance spectrum have been extensively explored, but relatively less is known about the middle ground: approaches that are willing to sacrifice some performance for simplicity, while remaining competitive with state-of-the-art handcrafted designs. In this paper, we explore this middle ground, and present PathCAS, a primitive that combines ideas from multi-word CAS (KCAS) and transactional memory approaches, while carefully avoiding overhead. We show how PathCAS can be used to implement efficient search data structures relatively simply, using an internal binary search tree as an example, then extending this to an AVL tree. Our best implementations outperform many handcrafted search trees: in search-heavy workloads, it rivals the BCCO tree [5], the fastest known concurrent binary tree in terms of search performance [3].
Our results suggest that PathCAS can yield concurrent data structures that are relatively easy to build and prove correct, while offering surprisingly high performance.","lang":"eng"}],"type":"conference","file":[{"file_name":"2022_PPoPP_Brown.pdf","access_level":"open_access","creator":"dernst","content_type":"application/pdf","file_size":1128343,"file_id":"11731","relation":"main_file","date_updated":"2022-08-05T09:19:29Z","date_created":"2022-08-05T09:19:29Z","success":1,"checksum":"8ceea411fa133795cd4903529498eb6b"}],"oa_version":"Published Version","ddc":["000"],"status":"public","title":"PathCAS: An efficient middle ground for concurrent search data structures","_id":"11181","user_id":"4359f0d1-fa6c-11eb-b949-802e58b17ae8","day":"02","has_accepted_license":"1","article_processing_charge":"No","scopus_import":"1","date_published":"2022-04-02T00:00:00Z","page":"385-399","publication":"Proceedings of the 27th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming","citation":{"ista":"Brown TA, Sigouin W, Alistarh D-A. 2022. PathCAS: An efficient middle ground for concurrent search data structures. Proceedings of the 27th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming. PPoPP: Symposium on Principles and Practice of Parallel Programming, 385–399.","apa":"Brown, T. A., Sigouin, W., & Alistarh, D.-A. (2022). PathCAS: An efficient middle ground for concurrent search data structures. In Proceedings of the 27th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (pp. 385–399). Seoul, Republic of Korea: Association for Computing Machinery. https://doi.org/10.1145/3503221.3508410","ieee":"T. A. Brown, W. Sigouin, and D.-A. Alistarh, “PathCAS: An efficient middle ground for concurrent search data structures,” in Proceedings of the 27th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, Seoul, Republic of Korea, 2022, pp. 385–399.","ama":"Brown TA, Sigouin W, Alistarh D-A.
PathCAS: An efficient middle ground for concurrent search data structures. In: Proceedings of the 27th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming. Association for Computing Machinery; 2022:385-399. doi:10.1145/3503221.3508410","chicago":"Brown, Trevor A, William Sigouin, and Dan-Adrian Alistarh. “PathCAS: An Efficient Middle Ground for Concurrent Search Data Structures.” In Proceedings of the 27th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 385–99. Association for Computing Machinery, 2022. https://doi.org/10.1145/3503221.3508410.","mla":"Brown, Trevor A., et al. “PathCAS: An Efficient Middle Ground for Concurrent Search Data Structures.” Proceedings of the 27th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, Association for Computing Machinery, 2022, pp. 385–99, doi:10.1145/3503221.3508410.","short":"T.A. Brown, W. Sigouin, D.-A. Alistarh, in:, Proceedings of the 27th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, Association for Computing Machinery, 2022, pp. 385–399."}},{"ec_funded":1,"publication_status":"published","department":[{"_id":"DaAl"}],"publisher":"Association for Computing Machinery","year":"2022","acknowledgement":"We would like to thank the anonymous reviewers for their useful comments. 
This project has received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (grant agreement No 805223 ScaleML).","date_created":"2022-04-17T22:01:46Z","date_updated":"2023-08-03T06:48:35Z","author":[{"full_name":"Postnikova, Anastasiia","first_name":"Anastasiia","last_name":"Postnikova"},{"full_name":"Koval, Nikita","first_name":"Nikita","last_name":"Koval","id":"2F4DB10C-F248-11E8-B48F-1D18A9856A87"},{"id":"3279A00C-F248-11E8-B48F-1D18A9856A87","last_name":"Nadiradze","first_name":"Giorgi","full_name":"Nadiradze, Giorgi"},{"full_name":"Alistarh, Dan-Adrian","last_name":"Alistarh","first_name":"Dan-Adrian","orcid":"0000-0003-3650-940X","id":"4A899BFC-F248-11E8-B48F-1D18A9856A87"}],"related_material":{"record":[{"id":"13076","status":"public","relation":"research_data"}]},"month":"04","publication_identifier":{"isbn":["9781450392044"]},"quality_controlled":"1","isi":1,"project":[{"grant_number":"805223","_id":"268A44D6-B435-11E9-9278-68D0E5697425","name":"Elastic Coordination for Scalable Machine Learning","call_identifier":"H2020"}],"oa":1,"external_id":{"isi":["000883318200025"],"arxiv":["2109.00657"]},"main_file_link":[{"open_access":"1","url":"https://doi.org/10.48550/arXiv.2109.00657"}],"language":[{"iso":"eng"}],"conference":{"end_date":"2022-04-06","start_date":"2022-04-02","location":"Seoul, Republic of Korea","name":"PPoPP: Symposium on Principles and Practice of Parallel Programming"},"doi":"10.1145/3503221.3508432","type":"conference","abstract":[{"text":"Designing and implementing efficient parallel priority schedulers is an active research area. An intriguing proposed design is the Multi-Queue: given n threads and m ≥ n distinct priority queues, task insertions are performed uniformly at random, while, to delete, a thread picks two queues uniformly at random, and removes the observed task of higher priority.
This approach scales well, and has probabilistic rank guarantees: roughly, the rank of each task removed, relative to remaining tasks in all other queues, is O(m) in expectation. Yet, the performance of this pattern is below that of well-engineered schedulers, which eschew theoretical guarantees for practical efficiency.\r\n\r\nWe investigate whether it is possible to design and implement a Multi-Queue-based task scheduler that is both highly-efficient and has analytical guarantees. We propose a new variant called the Stealing Multi-Queue (SMQ), a cache-efficient variant of the Multi-Queue, which leverages both queue affinity---each thread has a local queue, from which tasks are usually removed; but, with some probability, threads also attempt to steal higher-priority tasks from the other queues---and task batching, that is, the processing of several tasks in a single insert / remove step. These ideas are well-known for task scheduling without priorities; our theoretical contribution is showing that, despite relaxations, this design can still provide rank guarantees, which in turn implies bounds on total work performed. We provide a general SMQ implementation which can surpass state-of-the-art schedulers such as OBIM and PMOD in terms of performance on popular graph-processing benchmarks. Notably, the performance improvement comes mainly from the superior rank guarantees provided by our scheduler, confirming that analytically-reasoned approaches can still provide performance improvements for priority task scheduling.","lang":"eng"}],"title":"Multi-queues can be state-of-the-art priority schedulers","status":"public","_id":"11180","user_id":"4359f0d1-fa6c-11eb-b949-802e58b17ae8","oa_version":"Preprint","scopus_import":"1","day":"02","article_processing_charge":"No","page":"353-367","publication":"Proceedings of the 27th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming","citation":{"mla":"Postnikova, Anastasiia, et al.
“Multi-Queues Can Be State-of-the-Art Priority Schedulers.” Proceedings of the 27th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, Association for Computing Machinery, 2022, pp. 353–67, doi:10.1145/3503221.3508432.","short":"A. Postnikova, N. Koval, G. Nadiradze, D.-A. Alistarh, in:, Proceedings of the 27th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, Association for Computing Machinery, 2022, pp. 353–367.","chicago":"Postnikova, Anastasiia, Nikita Koval, Giorgi Nadiradze, and Dan-Adrian Alistarh. “Multi-Queues Can Be State-of-the-Art Priority Schedulers.” In Proceedings of the 27th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 353–67. Association for Computing Machinery, 2022. https://doi.org/10.1145/3503221.3508432.","ama":"Postnikova A, Koval N, Nadiradze G, Alistarh D-A. Multi-queues can be state-of-the-art priority schedulers. In: Proceedings of the 27th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming. Association for Computing Machinery; 2022:353-367. doi:10.1145/3503221.3508432","ista":"Postnikova A, Koval N, Nadiradze G, Alistarh D-A. 2022. Multi-queues can be state-of-the-art priority schedulers. Proceedings of the 27th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming. PPoPP: Symposium on Principles and Practice of Parallel Programming, 353–367.","apa":"Postnikova, A., Koval, N., Nadiradze, G., & Alistarh, D.-A. (2022). Multi-queues can be state-of-the-art priority schedulers. In Proceedings of the 27th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (pp. 353–367). Seoul, Republic of Korea: Association for Computing Machinery. https://doi.org/10.1145/3503221.3508432","ieee":"A. Postnikova, N. Koval, G. Nadiradze, and D.-A. 
Alistarh, “Multi-queues can be state-of-the-art priority schedulers,” in Proceedings of the 27th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, Seoul, Republic of Korea, 2022, pp. 353–367."},"date_published":"2022-04-02T00:00:00Z"},{"type":"research_data_reference","abstract":[{"lang":"eng","text":"The source code for replicating experiments presented in the paper.\r\n\r\nThe implementation of the designed priority schedulers can be found in Galois-2.2.1/include/Galois/WorkList/:\r\nStealingMultiQueue.h is the StealingMultiQueue.\r\nMQOptimized/ contains MQ Optimized variants.\r\n\r\nWe provide images that contain all the dependencies and datasets. Images can be pulled from npostnikova/mq-based-schedulers repository, or downloaded from Zenodo. See readme for more detail."}],"department":[{"_id":"DaAl"}],"publisher":"Zenodo","status":"public","ddc":["510"],"title":"Multi-queues can be state-of-the-art priority schedulers","_id":"13076","user_id":"2DF688A6-F248-11E8-B48F-1D18A9856A87","year":"2022","oa_version":"Published Version","date_created":"2023-05-23T17:05:40Z","date_updated":"2023-08-03T06:48:34Z","related_material":{"link":[{"url":"https://github.com/npostnikova/mq-based-schedulers/tree/v1.1","relation":"software"}],"record":[{"id":"11180","relation":"used_in_publication","status":"public"}]},"author":[{"full_name":"Postnikova, Anastasiia","first_name":"Anastasiia","last_name":"Postnikova"},{"full_name":"Koval, Nikita","last_name":"Koval","first_name":"Nikita","id":"2F4DB10C-F248-11E8-B48F-1D18A9856A87"},{"full_name":"Nadiradze, Giorgi","id":"3279A00C-F248-11E8-B48F-1D18A9856A87","last_name":"Nadiradze","first_name":"Giorgi"},{"last_name":"Alistarh","first_name":"Dan-Adrian","orcid":"0000-0003-3650-940X","id":"4A899BFC-F248-11E8-B48F-1D18A9856A87","full_name":"Alistarh, 
Dan-Adrian"}],"article_processing_charge":"No","day":"03","month":"01","oa":1,"main_file_link":[{"url":"https://doi.org/10.5281/zenodo.5813846","open_access":"1"}],"citation":{"apa":"Postnikova, A., Koval, N., Nadiradze, G., & Alistarh, D.-A. (2022). Multi-queues can be state-of-the-art priority schedulers. Zenodo. https://doi.org/10.5281/ZENODO.5733408","ieee":"A. Postnikova, N. Koval, G. Nadiradze, and D.-A. Alistarh, “Multi-queues can be state-of-the-art priority schedulers.” Zenodo, 2022.","ista":"Postnikova A, Koval N, Nadiradze G, Alistarh D-A. 2022. Multi-queues can be state-of-the-art priority schedulers, Zenodo, 10.5281/ZENODO.5733408.","ama":"Postnikova A, Koval N, Nadiradze G, Alistarh D-A. Multi-queues can be state-of-the-art priority schedulers. 2022. doi:10.5281/ZENODO.5733408","chicago":"Postnikova, Anastasiia, Nikita Koval, Giorgi Nadiradze, and Dan-Adrian Alistarh. “Multi-Queues Can Be State-of-the-Art Priority Schedulers.” Zenodo, 2022. https://doi.org/10.5281/ZENODO.5733408.","short":"A. Postnikova, N. Koval, G. Nadiradze, D.-A. Alistarh, (2022).","mla":"Postnikova, Anastasiia, et al. Multi-Queues Can Be State-of-the-Art Priority Schedulers. Zenodo, 2022, doi:10.5281/ZENODO.5733408."},"date_published":"2022-01-03T00:00:00Z","doi":"10.5281/ZENODO.5733408"},{"date_published":"2022-09-27T00:00:00Z","citation":{"ama":"Iofinova EB, Peste E-A, Kurtz M, Alistarh D-A. How well do sparse ImageNet models transfer? In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Institute of Electrical and Electronics Engineers; 2022:12256-12266. doi:10.1109/cvpr52688.2022.01195","apa":"Iofinova, E. B., Peste, E.-A., Kurtz, M., & Alistarh, D.-A. (2022). How well do sparse ImageNet models transfer? In 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 12256–12266). New Orleans, LA, United States: Institute of Electrical and Electronics Engineers. https://doi.org/10.1109/cvpr52688.2022.01195","ieee":"E. B. Iofinova, E.-A. 
Peste, M. Kurtz, and D.-A. Alistarh, “How well do sparse ImageNet models transfer?,” in 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, United States, 2022, pp. 12256–12266.","ista":"Iofinova EB, Peste E-A, Kurtz M, Alistarh D-A. 2022. How well do sparse ImageNet models transfer? 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. CVPR: Computer Vision and Pattern Recognition, 12256–12266.","short":"E.B. Iofinova, E.-A. Peste, M. Kurtz, D.-A. Alistarh, in:, 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Institute of Electrical and Electronics Engineers, 2022, pp. 12256–12266.","mla":"Iofinova, Eugenia B., et al. “How Well Do Sparse ImageNet Models Transfer?” 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Institute of Electrical and Electronics Engineers, 2022, pp. 12256–66, doi:10.1109/cvpr52688.2022.01195.","chicago":"Iofinova, Eugenia B, Elena-Alexandra Peste, Mark Kurtz, and Dan-Adrian Alistarh. “How Well Do Sparse ImageNet Models Transfer?” In 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 12256–66. Institute of Electrical and Electronics Engineers, 2022. https://doi.org/10.1109/cvpr52688.2022.01195."},"publication":"2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition","page":"12256-12266","article_processing_charge":"No","day":"27","scopus_import":"1","oa_version":"Preprint","user_id":"4359f0d1-fa6c-11eb-b949-802e58b17ae8","_id":"12299","status":"public","title":"How well do sparse ImageNet models transfer?","abstract":[{"lang":"eng","text":"Transfer learning is a classic paradigm by which models pretrained on large “upstream” datasets are adapted to yield good results on “downstream” specialized datasets. Generally, more accurate models on the “upstream” dataset tend to provide better transfer accuracy “downstream”. 
In this work, we perform an in-depth investigation of this phenomenon in the context of convolutional neural networks (CNNs) trained on the ImageNet dataset, which have been pruned, that is, compressed by sparsifying their connections. We consider transfer using unstructured pruned models obtained by applying several state-of-the-art pruning methods, including magnitude-based, second-order, regrowth, lottery-ticket, and regularization approaches, in the context of twelve standard transfer tasks. In a nutshell, our study shows that sparse models can match or even outperform the transfer performance of dense models, even at high sparsities, and, while doing so, can lead to significant inference and even training speedups. At the same time, we observe and analyze significant differences in the behaviour of different pruning methods. The code is available at: https://github.com/IST-DASLab/sparse-imagenet-transfer."}],"type":"conference","doi":"10.1109/cvpr52688.2022.01195","conference":{"name":"CVPR: Computer Vision and Pattern Recognition","end_date":"2022-06-24","location":"New Orleans, LA, United States","start_date":"2022-06-18"},"language":[{"iso":"eng"}],"external_id":{"arxiv":["2111.13445"],"isi":["000870759105034"]},"oa":1,"main_file_link":[{"open_access":"1","url":"https://doi.org/10.48550/arXiv.2111.13445"}],"project":[{"grant_number":" W1260-N35","_id":"9B9290DE-BA93-11EA-9121-9846C619BF3A","name":"Vienna Graduate School on Computational Optimization"},{"call_identifier":"H2020","name":"Elastic Coordination for Scalable Machine Learning","grant_number":"805223","_id":"268A44D6-B435-11E9-9278-68D0E5697425"}],"isi":1,"quality_controlled":"1","publication_identifier":{"eissn":["2575-7075"]},"month":"09","related_material":{"record":[{"id":"13074","relation":"dissertation_contains","status":"public"}]},"author":[{"full_name":"Iofinova, Eugenia B","last_name":"Iofinova","first_name":"Eugenia 
B","orcid":"0000-0002-7778-3221","id":"f9a17499-f6e0-11ea-865d-fdf9a3f77117"},{"full_name":"Peste, Elena-Alexandra","id":"32D78294-F248-11E8-B48F-1D18A9856A87","first_name":"Elena-Alexandra","last_name":"Peste"},{"full_name":"Kurtz, Mark","last_name":"Kurtz","first_name":"Mark"},{"full_name":"Alistarh, Dan-Adrian","first_name":"Dan-Adrian","last_name":"Alistarh","id":"4A899BFC-F248-11E8-B48F-1D18A9856A87","orcid":"0000-0003-3650-940X"}],"date_updated":"2023-08-04T10:33:28Z","date_created":"2023-01-16T10:06:00Z","year":"2022","acknowledgement":"The authors would like to sincerely thank Christoph Lampert and Nir Shavit for fruitful discussions during the development of this work, and Eldar Kurtic for experimental support. EI was supported in part by the FWF DK VGSCO, grant agreement number W1260-N35, while AP and DA acknowledge generous support by the ERC, via Starting Grant 805223 ScaleML.","publisher":"Institute of Electrical and Electronics Engineers","department":[{"_id":"DaAl"},{"_id":"ChLa"}],"publication_status":"published","ec_funded":1},{"day":"01","article_processing_charge":"No","has_accepted_license":"1","scopus_import":"1","date_published":"2021-09-01T00:00:00Z","article_type":"original","page":"1-124","publication":"Journal of Machine Learning Research","citation":{"chicago":"Hoefler, Torsten, Dan-Adrian Alistarh, Tal Ben-Nun, Nikoli Dryden, and Elena-Alexandra Peste. “Sparsity in Deep Learning: Pruning and Growth for Efficient Inference and Training in Neural Networks.” Journal of Machine Learning Research. Journal of Machine Learning Research, 2021.","short":"T. Hoefler, D.-A. Alistarh, T. Ben-Nun, N. Dryden, E.-A. Peste, Journal of Machine Learning Research 22 (2021) 1–124.","mla":"Hoefler, Torsten, et al. “Sparsity in Deep Learning: Pruning and Growth for Efficient Inference and Training in Neural Networks.” Journal of Machine Learning Research, vol. 22, no. 241, Journal of Machine Learning Research, 2021, pp. 
1–124.","apa":"Hoefler, T., Alistarh, D.-A., Ben-Nun, T., Dryden, N., & Peste, E.-A. (2021). Sparsity in deep learning: Pruning and growth for efficient inference and training in neural networks. Journal of Machine Learning Research. Journal of Machine Learning Research.","ieee":"T. Hoefler, D.-A. Alistarh, T. Ben-Nun, N. Dryden, and E.-A. Peste, “Sparsity in deep learning: Pruning and growth for efficient inference and training in neural networks,” Journal of Machine Learning Research, vol. 22, no. 241. Journal of Machine Learning Research, pp. 1–124, 2021.","ista":"Hoefler T, Alistarh D-A, Ben-Nun T, Dryden N, Peste E-A. 2021. Sparsity in deep learning: Pruning and growth for efficient inference and training in neural networks. Journal of Machine Learning Research. 22(241), 1–124.","ama":"Hoefler T, Alistarh D-A, Ben-Nun T, Dryden N, Peste E-A. Sparsity in deep learning: Pruning and growth for efficient inference and training in neural networks. Journal of Machine Learning Research. 2021;22(241):1-124."},"abstract":[{"text":"The growing energy and performance costs of deep learning have driven the community to reduce the size of neural networks by selectively pruning components. Similarly to their biological counterparts, sparse networks generalize just as well, sometimes even better than, the original dense networks. Sparsity promises to reduce the memory footprint of regular networks to fit mobile devices, as well as shorten training time for ever growing networks. In this paper, we survey prior work on sparsity in deep learning and provide an extensive tutorial of sparsification for both inference and training. We describe approaches to remove and add elements of neural networks, different training strategies to achieve model sparsity, and mechanisms to exploit sparsity in practice. 
Our work distills ideas from more than 300 research papers and provides guidance to practitioners who wish to utilize sparsity today, as well as to researchers whose goal is to push the frontier forward. We include the necessary background on mathematical methods in sparsification, describe phenomena such as early structure adaptation, the intricate relations between sparsity and the training process, and show techniques for achieving acceleration on real hardware. We also define a metric of pruned parameter efficiency that could serve as a baseline for comparison of different sparse networks. We close by speculating on how sparsity can improve future workloads and outline major open problems in the field.","lang":"eng"}],"issue":"241","type":"journal_article","file":[{"relation":"main_file","file_id":"10192","date_updated":"2021-10-27T15:34:18Z","date_created":"2021-10-27T15:34:18Z","checksum":"3389d9d01fc58f8fb4c1a53e14a8abbf","success":1,"file_name":"2021_JMachLearnRes_Hoefler.pdf","access_level":"open_access","file_size":3527521,"content_type":"application/pdf","creator":"cziletti"}],"oa_version":"Published Version","title":"Sparsity in deep learning: Pruning and growth for efficient inference and training in neural networks","ddc":["000"],"status":"public","intvolume":" 22","_id":"10180","user_id":"2DF688A6-F248-11E8-B48F-1D18A9856A87","month":"09","publication_identifier":{"issn":["1532-4435"],"eissn":["1533-7928"]},"language":[{"iso":"eng"}],"quality_controlled":"1","external_id":{"arxiv":["2102.00554"]},"tmp":{"name":"Creative Commons Attribution 4.0 International Public License (CC-BY 4.0)","legal_code_url":"https://creativecommons.org/licenses/by/4.0/legalcode","short":"CC BY 
(4.0)","image":"/images/cc_by.png"},"oa":1,"main_file_link":[{"url":"https://www.jmlr.org/papers/v22/21-0366.html","open_access":"1"}],"file_date_updated":"2021-10-27T15:34:18Z","date_updated":"2022-05-13T09:36:08Z","date_created":"2021-10-24T22:01:34Z","volume":22,"author":[{"full_name":"Hoefler, Torsten","last_name":"Hoefler","first_name":"Torsten"},{"full_name":"Alistarh, Dan-Adrian","first_name":"Dan-Adrian","last_name":"Alistarh","id":"4A899BFC-F248-11E8-B48F-1D18A9856A87","orcid":"0000-0003-3650-940X"},{"full_name":"Ben-Nun, Tal","last_name":"Ben-Nun","first_name":"Tal"},{"last_name":"Dryden","first_name":"Nikoli","full_name":"Dryden, Nikoli"},{"first_name":"Elena-Alexandra","last_name":"Peste","id":"32D78294-F248-11E8-B48F-1D18A9856A87","full_name":"Peste, Elena-Alexandra"}],"publication_status":"published","publisher":"Journal of Machine Learning Research","department":[{"_id":"DaAl"}],"year":"2021","acknowledgement":"We thank Doug Burger, Steve Scott, Marco Heddes, and the respective teams at Microsoft for inspiring discussions on the topic. We thank Angelika Steger for uplifting debates about the connections to biological brains, Sidak Pal Singh for his support regarding experimental results, and Utku Evci as well as Xin Wang for comments on previous versions of this\r\nwork. Special thanks go to Bernhard Schölkopf, our JMLR editor Samy Bengio, and the three anonymous reviewers who provided excellent comprehensive, pointed, and deep review comments that improved the quality of our manuscript significantly."},{"date_published":"2021-10-04T00:00:00Z","citation":{"ama":"Alistarh D-A, Gelashvili R, Rybicki J. Brief announcement: Fast graphical population protocols. In: 35th International Symposium on Distributed Computing. Vol 209. Schloss Dagstuhl - Leibniz-Zentrum für Informatik; 2021. doi:10.4230/LIPIcs.DISC.2021.43","ista":"Alistarh D-A, Gelashvili R, Rybicki J. 2021. Brief announcement: Fast graphical population protocols. 
35th International Symposium on Distributed Computing. DISC: Distributed Computing , LIPIcs, vol. 209, 43.","ieee":"D.-A. Alistarh, R. Gelashvili, and J. Rybicki, “Brief announcement: Fast graphical population protocols,” in 35th International Symposium on Distributed Computing, Freiburg, Germany, 2021, vol. 209.","apa":"Alistarh, D.-A., Gelashvili, R., & Rybicki, J. (2021). Brief announcement: Fast graphical population protocols. In 35th International Symposium on Distributed Computing (Vol. 209). Freiburg, Germany: Schloss Dagstuhl - Leibniz-Zentrum für Informatik. https://doi.org/10.4230/LIPIcs.DISC.2021.43","mla":"Alistarh, Dan-Adrian, et al. “Brief Announcement: Fast Graphical Population Protocols.” 35th International Symposium on Distributed Computing, vol. 209, 43, Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2021, doi:10.4230/LIPIcs.DISC.2021.43.","short":"D.-A. Alistarh, R. Gelashvili, J. Rybicki, in:, 35th International Symposium on Distributed Computing, Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2021.","chicago":"Alistarh, Dan-Adrian, Rati Gelashvili, and Joel Rybicki. “Brief Announcement: Fast Graphical Population Protocols.” In 35th International Symposium on Distributed Computing, Vol. 209. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2021. 
https://doi.org/10.4230/LIPIcs.DISC.2021.43."},"publication":"35th International Symposium on Distributed Computing","article_processing_charge":"No","has_accepted_license":"1","day":"04","scopus_import":"1","oa_version":"Published Version","file":[{"file_name":"2021_LIPIcsDISC_Alistarh.pdf","access_level":"open_access","creator":"cchlebak","content_type":"application/pdf","file_size":534219,"file_id":"10274","relation":"main_file","date_created":"2021-11-12T08:16:44Z","date_updated":"2021-11-12T08:16:44Z","success":1,"checksum":"fd2a690f6856d21247e9aa952b0e2885"}],"user_id":"8b945eb4-e2f2-11eb-945a-df72226e66a9","_id":"10218","intvolume":" 209","title":"Brief announcement: Fast graphical population protocols","status":"public","ddc":["000"],"abstract":[{"lang":"eng","text":"Let G be a graph on n nodes. In the stochastic population protocol model, a collection of n indistinguishable, resource-limited nodes collectively solve tasks via pairwise interactions. In each interaction, two randomly chosen neighbors first read each other’s states, and then update their local states. A rich line of research has established tight upper and lower bounds on the complexity of fundamental tasks, such as majority and leader election, in this model, when G is a clique. Specifically, in the clique, these tasks can be solved fast, i.e., in n polylog n pairwise interactions, with high probability, using at most polylog n states per node. In this work, we consider the more general setting where G is an arbitrary graph, and present a technique for simulating protocols designed for fully-connected networks in any connected regular graph. Our main result is a simulation that is efficient on many interesting graph families: roughly, the simulation overhead is polylogarithmic in the number of nodes, and quadratic in the conductance of the graph. 
As an example, this implies that, in any regular graph with conductance φ, both leader election and exact majority can be solved in φ^{-2} ⋅ n polylog n pairwise interactions, with high probability, using at most φ^{-2} ⋅ polylog n states per node. This shows that there are fast and space-efficient population protocols for leader election and exact majority on graphs with good expansion properties."}],"type":"conference","alternative_title":["LIPIcs"],"doi":"10.4230/LIPIcs.DISC.2021.43","conference":{"name":"DISC: Distributed Computing ","end_date":"2021-10-08","start_date":"2021-10-04","location":"Freiburg, Germany"},"language":[{"iso":"eng"}],"external_id":{"arxiv":["2102.08808"]},"tmp":{"name":"Creative Commons Attribution 4.0 International Public License (CC-BY 4.0)","legal_code_url":"https://creativecommons.org/licenses/by/4.0/legalcode","short":"CC BY (4.0)","image":"/images/cc_by.png"},"oa":1,"project":[{"name":"Coordination in constrained and natural distributed systems","call_identifier":"H2020","_id":"26A5D39A-B435-11E9-9278-68D0E5697425","grant_number":"840605"}],"quality_controlled":"1","publication_identifier":{"issn":["1868-8969"],"isbn":["9-783-9597-7210-5"]},"month":"10","author":[{"full_name":"Alistarh, Dan-Adrian","first_name":"Dan-Adrian","last_name":"Alistarh","id":"4A899BFC-F248-11E8-B48F-1D18A9856A87","orcid":"0000-0003-3650-940X"},{"full_name":"Gelashvili, Rati","first_name":"Rati","last_name":"Gelashvili"},{"first_name":"Joel","last_name":"Rybicki","id":"334EFD2E-F248-11E8-B48F-1D18A9856A87","orcid":"0000-0002-6432-6646","full_name":"Rybicki, Joel"}],"volume":209,"date_created":"2021-11-07T23:01:24Z","date_updated":"2023-02-21T09:24:08Z","year":"2021","acknowledgement":"This project has received funding from the European Union’s Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No 840605.","publisher":"Schloss Dagstuhl - Leibniz-Zentrum für 
Informatik","department":[{"_id":"DaAl"}],"publication_status":"published","ec_funded":1,"file_date_updated":"2021-11-12T08:16:44Z","article_number":"43"},{"day":"04","has_accepted_license":"1","article_processing_charge":"No","scopus_import":"1","date_published":"2021-10-04T00:00:00Z","publication":"35th International Symposium on Distributed Computing","citation":{"ama":"Alistarh D-A, Gelashvili R, Nadiradze G. Lower bounds for shared-memory leader election under bounded write contention. In: 35th International Symposium on Distributed Computing. Vol 209. Schloss Dagstuhl - Leibniz Zentrum für Informatik; 2021. doi:10.4230/LIPIcs.DISC.2021.4","ista":"Alistarh D-A, Gelashvili R, Nadiradze G. 2021. Lower bounds for shared-memory leader election under bounded write contention. 35th International Symposium on Distributed Computing. DISC: Distributed Computing, LIPIcs, vol. 209, 4.","apa":"Alistarh, D.-A., Gelashvili, R., & Nadiradze, G. (2021). Lower bounds for shared-memory leader election under bounded write contention. In 35th International Symposium on Distributed Computing (Vol. 209). Freiburg, Germany: Schloss Dagstuhl - Leibniz Zentrum für Informatik. https://doi.org/10.4230/LIPIcs.DISC.2021.4","ieee":"D.-A. Alistarh, R. Gelashvili, and G. Nadiradze, “Lower bounds for shared-memory leader election under bounded write contention,” in 35th International Symposium on Distributed Computing, Freiburg, Germany, 2021, vol. 209.","mla":"Alistarh, Dan-Adrian, et al. “Lower Bounds for Shared-Memory Leader Election under Bounded Write Contention.” 35th International Symposium on Distributed Computing, vol. 209, 4, Schloss Dagstuhl - Leibniz Zentrum für Informatik, 2021, doi:10.4230/LIPIcs.DISC.2021.4.","short":"D.-A. Alistarh, R. Gelashvili, G. Nadiradze, in:, 35th International Symposium on Distributed Computing, Schloss Dagstuhl - Leibniz Zentrum für Informatik, 2021.","chicago":"Alistarh, Dan-Adrian, Rati Gelashvili, and Giorgi Nadiradze. 
“Lower Bounds for Shared-Memory Leader Election under Bounded Write Contention.” In 35th International Symposium on Distributed Computing, Vol. 209. Schloss Dagstuhl - Leibniz Zentrum für Informatik, 2021. https://doi.org/10.4230/LIPIcs.DISC.2021.4."},"abstract":[{"lang":"eng","text":"This paper gives tight logarithmic lower bounds on the solo step complexity of leader election in an asynchronous shared-memory model with single-writer multi-reader (SWMR) registers, for both deterministic and randomized obstruction-free algorithms. The approach extends to lower bounds for deterministic and randomized obstruction-free algorithms using multi-writer registers under bounded write concurrency, showing a trade-off between the solo step complexity of a leader election algorithm, and the worst-case number of stalls incurred by a processor in an execution."}],"alternative_title":["LIPIcs"],"type":"conference","file":[{"file_name":"2021_LIPIcsDISC_Alistarh.pdf","access_level":"open_access","content_type":"application/pdf","file_size":706791,"creator":"cchlebak","relation":"main_file","file_id":"10277","date_updated":"2021-11-12T09:33:26Z","date_created":"2021-11-12T09:33:26Z","checksum":"b4cdc6668c899a601c5e6a96b8ca54d9","success":1}],"oa_version":"Published Version","status":"public","title":"Lower bounds for shared-memory leader election under bounded write contention","ddc":["000"],"intvolume":" 209","_id":"10217","user_id":"2DF688A6-F248-11E8-B48F-1D18A9856A87","month":"10","publication_identifier":{"issn":["1868-8969"],"isbn":["9-783-9597-7210-5"]},"language":[{"iso":"eng"}],"conference":{"location":"Freiburg, Germany","start_date":"2021-10-04","end_date":"2021-10-08","name":"DISC: Distributed Computing"},"doi":"10.4230/LIPIcs.DISC.2021.4","quality_controlled":"1","project":[{"name":"Elastic Coordination for Scalable Machine Learning","call_identifier":"H2020","_id":"268A44D6-B435-11E9-9278-68D0E5697425","grant_number":"805223"}],"oa":1,"tmp":{"name":"Creative Commons 
Attribution 4.0 International Public License (CC-BY 4.0)","legal_code_url":"https://creativecommons.org/licenses/by/4.0/legalcode","short":"CC BY (4.0)","image":"/images/cc_by.png"},"file_date_updated":"2021-11-12T09:33:26Z","ec_funded":1,"article_number":"4","date_created":"2021-11-07T23:01:23Z","date_updated":"2022-08-19T07:23:28Z","volume":209,"author":[{"orcid":"0000-0003-3650-940X","id":"4A899BFC-F248-11E8-B48F-1D18A9856A87","last_name":"Alistarh","first_name":"Dan-Adrian","full_name":"Alistarh, Dan-Adrian"},{"full_name":"Gelashvili, Rati","first_name":"Rati","last_name":"Gelashvili"},{"full_name":"Nadiradze, Giorgi","last_name":"Nadiradze","first_name":"Giorgi","id":"3279A00C-F248-11E8-B48F-1D18A9856A87"}],"publication_status":"published","department":[{"_id":"DaAl"}],"publisher":"Schloss Dagstuhl - Leibniz Zentrum für Informatik","year":"2021","acknowledgement":"Dan Alistarh: Supported in part by the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (grant agreement No 805223 ScaleML). Giorgi Nadiradze: Supported in part by the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (grant agreement No 805223 ScaleML). The authors would like to thank the DISC anonymous reviewers for their useful\r\nfeedback and comments."},{"abstract":[{"lang":"eng","text":"Dynamic Connectivity is a fundamental algorithmic graph problem, motivated by a wide range of applications to social and communication networks and used as a building block in various other algorithms, such as the bi-connectivity and the dynamic minimal spanning tree problems. In brief, we wish to maintain the connected components of the graph under dynamic edge insertions and deletions. In the sequential case, the problem has been well-studied from both theoretical and practical perspectives. However, much less is known about efficient concurrent solutions to this problem. 
This is the gap we address in this paper. We start from one of the classic data structures used to solve this problem, the Euler Tour Tree. Our first contribution is a non-blocking single-writer implementation of it. We leverage this data structure to obtain the first truly concurrent generalization of dynamic connectivity, which preserves the time complexity of its sequential counterpart, but is also scalable in practice. To achieve this, we rely on three main techniques. The first is to ensure that connectivity queries, which usually dominate real-world workloads, are non-blocking. The second non-trivial technique expands the above idea by making all queries that do not change the connectivity structure non-blocking. The third ingredient is applying fine-grained locking for updating the connected components, which allows operations on disjoint components to occur in parallel. We evaluate the resulting algorithm on various workloads, executing on both real and synthetic graphs. The results show the efficiency of each of the proposed optimizations; the most efficient variant improves the performance of a coarse-grained based implementation on realistic scenarios up to 6x on average and up to 30x when connectivity queries dominate."}],"type":"conference","oa_version":"Preprint","date_updated":"2022-03-18T08:45:46Z","date_created":"2022-03-18T08:21:47Z","author":[{"full_name":"Fedorov, Alexander","last_name":"Fedorov","first_name":"Alexander"},{"last_name":"Koval","first_name":"Nikita","full_name":"Koval, Nikita"},{"full_name":"Alistarh, Dan-Adrian","id":"4A899BFC-F248-11E8-B48F-1D18A9856A87","orcid":"0000-0003-3650-940X","first_name":"Dan-Adrian","last_name":"Alistarh"}],"publisher":"Association for Computing Machinery","department":[{"_id":"DaAl"}],"title":"A scalable concurrent algorithm for dynamic 
connectivity","publication_status":"published","status":"public","_id":"10853","year":"2021","user_id":"2DF688A6-F248-11E8-B48F-1D18A9856A87","article_processing_charge":"No","publication_identifier":{"isbn":["9781450380706"]},"day":"01","month":"07","scopus_import":"1","language":[{"iso":"eng"}],"doi":"10.1145/3409964.3461810","date_published":"2021-07-01T00:00:00Z","conference":{"start_date":"2021-07-06","location":"Virtual, Online","end_date":"2021-07-08","name":"SPAA: Symposium on Parallelism in Algorithms and Architectures"},"page":"208-220","quality_controlled":"1","citation":{"ama":"Fedorov A, Koval N, Alistarh D-A. A scalable concurrent algorithm for dynamic connectivity. In: Proceedings of the 33rd ACM Symposium on Parallelism in Algorithms and Architectures. Association for Computing Machinery; 2021:208-220. doi:10.1145/3409964.3461810","ista":"Fedorov A, Koval N, Alistarh D-A. 2021. A scalable concurrent algorithm for dynamic connectivity. Proceedings of the 33rd ACM Symposium on Parallelism in Algorithms and Architectures. SPAA: Symposium on Parallelism in Algorithms and Architectures, 208–220.","ieee":"A. Fedorov, N. Koval, and D.-A. Alistarh, “A scalable concurrent algorithm for dynamic connectivity,” in Proceedings of the 33rd ACM Symposium on Parallelism in Algorithms and Architectures, Virtual, Online, 2021, pp. 208–220.","apa":"Fedorov, A., Koval, N., & Alistarh, D.-A. (2021). A scalable concurrent algorithm for dynamic connectivity. In Proceedings of the 33rd ACM Symposium on Parallelism in Algorithms and Architectures (pp. 208–220). Virtual, Online: Association for Computing Machinery. https://doi.org/10.1145/3409964.3461810","mla":"Fedorov, Alexander, et al. “A Scalable Concurrent Algorithm for Dynamic Connectivity.” Proceedings of the 33rd ACM Symposium on Parallelism in Algorithms and Architectures, Association for Computing Machinery, 2021, pp. 208–20, doi:10.1145/3409964.3461810.","short":"A. Fedorov, N. Koval, D.-A. 
Alistarh, in:, Proceedings of the 33rd ACM Symposium on Parallelism in Algorithms and Architectures, Association for Computing Machinery, 2021, pp. 208–220.","chicago":"Fedorov, Alexander, Nikita Koval, and Dan-Adrian Alistarh. “A Scalable Concurrent Algorithm for Dynamic Connectivity.” In Proceedings of the 33rd ACM Symposium on Parallelism in Algorithms and Architectures, 208–20. Association for Computing Machinery, 2021. https://doi.org/10.1145/3409964.3461810."},"external_id":{"arxiv":["2105.08098"]},"main_file_link":[{"open_access":"1","url":"https://arxiv.org/abs/2105.08098"}],"oa":1,"publication":"Proceedings of the 33rd ACM Symposium on Parallelism in Algorithms and Architectures"},{"day":"18","article_processing_charge":"No","scopus_import":"1","date_published":"2021-05-18T00:00:00Z","publication":"35th AAAI Conference on Artificial Intelligence, AAAI 2021","citation":{"short":"V. Kungurtsev, M. Egan, B. Chatterjee, D.-A. Alistarh, in:, 35th AAAI Conference on Artificial Intelligence, AAAI 2021, AAAI Press, 2021, pp. 8209–8216.","mla":"Kungurtsev, Vyacheslav, et al. “Asynchronous Optimization Methods for Efficient Training of Deep Neural Networks with Guarantees.” 35th AAAI Conference on Artificial Intelligence, AAAI 2021, vol. 35, no. 9B, AAAI Press, 2021, pp. 8209–16.","chicago":"Kungurtsev, Vyacheslav, Malcolm Egan, Bapi Chatterjee, and Dan-Adrian Alistarh. “Asynchronous Optimization Methods for Efficient Training of Deep Neural Networks with Guarantees.” In 35th AAAI Conference on Artificial Intelligence, AAAI 2021, 35:8209–16. AAAI Press, 2021.","ama":"Kungurtsev V, Egan M, Chatterjee B, Alistarh D-A. Asynchronous optimization methods for efficient training of deep neural networks with guarantees. In: 35th AAAI Conference on Artificial Intelligence, AAAI 2021. Vol 35. AAAI Press; 2021:8209-8216.","ieee":"V. Kungurtsev, M. Egan, B. Chatterjee, and D.-A. 
Alistarh, “Asynchronous optimization methods for efficient training of deep neural networks with guarantees,” in 35th AAAI Conference on Artificial Intelligence, AAAI 2021, Virtual, Online, 2021, vol. 35, no. 9B, pp. 8209–8216.","apa":"Kungurtsev, V., Egan, M., Chatterjee, B., & Alistarh, D.-A. (2021). Asynchronous optimization methods for efficient training of deep neural networks with guarantees. In 35th AAAI Conference on Artificial Intelligence, AAAI 2021 (Vol. 35, pp. 8209–8216). Virtual, Online: AAAI Press.","ista":"Kungurtsev V, Egan M, Chatterjee B, Alistarh D-A. 2021. Asynchronous optimization methods for efficient training of deep neural networks with guarantees. 35th AAAI Conference on Artificial Intelligence, AAAI 2021. AAAI: Conference on Artificial Intelligence vol. 35, 8209–8216."},"page":"8209-8216","abstract":[{"lang":"eng","text":"Asynchronous distributed algorithms are a popular way to reduce synchronization costs in large-scale optimization, and in particular for neural network training. However, for nonsmooth and nonconvex objectives, few convergence guarantees exist beyond cases where closed-form proximal operator solutions are available. As training most popular deep neural networks corresponds to optimizing nonsmooth and nonconvex objectives, there is a pressing need for such convergence guarantees. In this paper, we analyze for the first time the convergence of stochastic asynchronous optimization for this general class of objectives. In particular, we focus on stochastic subgradient methods allowing for block variable partitioning, where the shared model is asynchronously updated by concurrent processes. To this end, we use a probabilistic model which captures key features of real asynchronous scheduling between concurrent processes. Under this model, we establish convergence with probability one to an invariant set for stochastic subgradient methods with momentum. 
From a practical perspective, one issue with the family of algorithms that we consider is that they are not efficiently supported by machine learning frameworks, which mostly focus on distributed data-parallel strategies. To address this, we propose a new implementation strategy for shared-memory based training of deep neural networks for a partitioned but shared model in single- and multi-GPU settings. Based on this implementation, we achieve on average 1.2x speed-up in comparison to state-of-the-art training methods for popular image classification tasks, without compromising accuracy."}],"issue":"9B","type":"conference","oa_version":"Preprint","_id":"11436","user_id":"2DF688A6-F248-11E8-B48F-1D18A9856A87","title":"Asynchronous optimization methods for efficient training of deep neural networks with guarantees","status":"public","intvolume":" 35","month":"05","publication_identifier":{"eissn":["2374-3468"],"isbn":["9781713835974"],"issn":["2159-5399"]},"conference":{"end_date":"2021-02-09","location":"Virtual, Online","start_date":"2021-02-02","name":"AAAI: Conference on Artificial Intelligence"},"language":[{"iso":"eng"}],"oa":1,"main_file_link":[{"url":"https://doi.org/10.48550/arXiv.1905.11845","open_access":"1"}],"external_id":{"arxiv":["1905.11845"]},"quality_controlled":"1","project":[{"call_identifier":"H2020","name":"ISTplus - Postdoctoral Fellowships","_id":"260C2330-B435-11E9-9278-68D0E5697425","grant_number":"754411"},{"call_identifier":"H2020","name":"Elastic Coordination for Scalable Machine Learning","grant_number":"805223","_id":"268A44D6-B435-11E9-9278-68D0E5697425"}],"ec_funded":1,"author":[{"first_name":"Vyacheslav","last_name":"Kungurtsev","full_name":"Kungurtsev, Vyacheslav"},{"full_name":"Egan, Malcolm","last_name":"Egan","first_name":"Malcolm"},{"last_name":"Chatterjee","first_name":"Bapi","id":"3C41A08A-F248-11E8-B48F-1D18A9856A87","full_name":"Chatterjee, Bapi"},{"full_name":"Alistarh, 
Dan-Adrian","last_name":"Alistarh","first_name":"Dan-Adrian","orcid":"0000-0003-3650-940X","id":"4A899BFC-F248-11E8-B48F-1D18A9856A87"}],"date_created":"2022-06-05T22:01:52Z","date_updated":"2022-06-07T06:53:36Z","volume":35,"acknowledgement":"Vyacheslav Kungurtsev was supported by the OP VVV project CZ.02.1.01/0.0/0.0/16 019/0000765 “Research Center for Informatics. Bapi Chatterjee was supported by the European Union’s Horizon 2020 research and innovation programme under the Marie Sklodowska-Curie grant agreement No. 754411 (ISTPlus). Dan Alistarh has received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (grant agreement No 805223 ScaleML).","year":"2021","publication_status":"published","department":[{"_id":"DaAl"}],"publisher":"AAAI Press"},{"oa_version":"Published Version","title":"Distributed principal component analysis with limited communication","status":"public","intvolume":" 4","user_id":"2DF688A6-F248-11E8-B48F-1D18A9856A87","_id":"11452","abstract":[{"lang":"eng","text":"We study efficient distributed algorithms for the fundamental problem of principal component analysis and leading eigenvector computation on the sphere, when the data are randomly distributed among a set of computational nodes. We propose a new quantized variant of Riemannian gradient descent to solve this problem, and prove that the algorithm converges with high probability under a set of necessary spherical-convexity properties. We give bounds on the number of bits transmitted by the algorithm under common initialization schemes, and investigate the dependency on the problem dimension in each case."}],"type":"conference","date_published":"2021-12-01T00:00:00Z","page":"2823-2834","publication":"Advances in Neural Information Processing Systems - 35th Conference on Neural Information Processing Systems","citation":{"chicago":"Alimisis, Foivos, Peter Davies, Bart Vandereycken, and Dan-Adrian Alistarh. 
“Distributed Principal Component Analysis with Limited Communication.” In Advances in Neural Information Processing Systems - 35th Conference on Neural Information Processing Systems, 4:2823–34. Neural Information Processing Systems Foundation, 2021.","short":"F. Alimisis, P. Davies, B. Vandereycken, D.-A. Alistarh, in:, Advances in Neural Information Processing Systems - 35th Conference on Neural Information Processing Systems, Neural Information Processing Systems Foundation, 2021, pp. 2823–2834.","mla":"Alimisis, Foivos, et al. “Distributed Principal Component Analysis with Limited Communication.” Advances in Neural Information Processing Systems - 35th Conference on Neural Information Processing Systems, vol. 4, Neural Information Processing Systems Foundation, 2021, pp. 2823–34.","apa":"Alimisis, F., Davies, P., Vandereycken, B., & Alistarh, D.-A. (2021). Distributed principal component analysis with limited communication. In Advances in Neural Information Processing Systems - 35th Conference on Neural Information Processing Systems (Vol. 4, pp. 2823–2834). Virtual, Online: Neural Information Processing Systems Foundation.","ieee":"F. Alimisis, P. Davies, B. Vandereycken, and D.-A. Alistarh, “Distributed principal component analysis with limited communication,” in Advances in Neural Information Processing Systems - 35th Conference on Neural Information Processing Systems, Virtual, Online, 2021, vol. 4, pp. 2823–2834.","ista":"Alimisis F, Davies P, Vandereycken B, Alistarh D-A. 2021. Distributed principal component analysis with limited communication. Advances in Neural Information Processing Systems - 35th Conference on Neural Information Processing Systems. NeurIPS: Neural Information Processing Systems vol. 4, 2823–2834.","ama":"Alimisis F, Davies P, Vandereycken B, Alistarh D-A. Distributed principal component analysis with limited communication. 
In: Advances in Neural Information Processing Systems - 35th Conference on Neural Information Processing Systems. Vol 4. Neural Information Processing Systems Foundation; 2021:2823-2834."},"day":"01","article_processing_charge":"No","scopus_import":"1","date_created":"2022-06-19T22:01:58Z","date_updated":"2022-06-20T08:31:52Z","volume":4,"author":[{"full_name":"Alimisis, Foivos","first_name":"Foivos","last_name":"Alimisis"},{"full_name":"Davies, Peter","first_name":"Peter","last_name":"Davies","id":"11396234-BB50-11E9-B24C-90FCE5697425","orcid":"0000-0002-5646-9524"},{"full_name":"Vandereycken, Bart","last_name":"Vandereycken","first_name":"Bart"},{"full_name":"Alistarh, Dan-Adrian","last_name":"Alistarh","first_name":"Dan-Adrian","orcid":"0000-0003-3650-940X","id":"4A899BFC-F248-11E8-B48F-1D18A9856A87"}],"publication_status":"published","publisher":"Neural Information Processing Systems Foundation","department":[{"_id":"DaAl"}],"year":"2021","acknowledgement":"We would like to thank the anonymous reviewers for helpful comments and suggestions. We also thank Aurelien Lucchi and Antonio Orvieto for fruitful discussions at an early stage of this work. FA is partially supported by the SNSF under research project No. 192363 and conducted part of this work while at IST Austria under the European Union’s Horizon 2020 research and innovation programme (grant agreement No. 805223 ScaleML). PD partly conducted this work while at IST Austria and was supported by the European Union’s Horizon 2020 programme under the Marie Skłodowska-Curie grant agreement No. 
754411.","ec_funded":1,"language":[{"iso":"eng"}],"conference":{"name":"NeurIPS: Neural Information Processing Systems","end_date":"2021-12-14","location":"Virtual, Online","start_date":"2021-12-06"},"quality_controlled":"1","project":[{"call_identifier":"H2020","name":"Elastic Coordination for Scalable Machine Learning","grant_number":"805223","_id":"268A44D6-B435-11E9-9278-68D0E5697425"},{"call_identifier":"H2020","name":"ISTplus - Postdoctoral Fellowships","grant_number":"754411","_id":"260C2330-B435-11E9-9278-68D0E5697425"}],"external_id":{"arxiv":["2110.14391"]},"main_file_link":[{"url":"https://proceedings.neurips.cc/paper/2021/file/1680e9fa7b4dd5d62ece800239bb53bd-Paper.pdf","open_access":"1"}],"oa":1,"month":"12","publication_identifier":{"isbn":["9781713845393"],"issn":["1049-5258"]}},{"project":[{"name":"Elastic Coordination for Scalable Machine Learning","call_identifier":"H2020","_id":"268A44D6-B435-11E9-9278-68D0E5697425","grant_number":"805223"}],"quality_controlled":"1","external_id":{"arxiv":["2010.08222"]},"oa":1,"main_file_link":[{"url":"https://proceedings.neurips.cc/paper/2021/file/7cfd5df443b4eb0d69886a583b33de4c-Paper.pdf","open_access":"1"}],"language":[{"iso":"eng"}],"conference":{"name":"NeurIPS: Neural Information Processing Systems","end_date":"2021-12-14","start_date":"2021-12-06","location":"Virtual, Online"},"publication_identifier":{"issn":["1049-5258"],"isbn":["9781713845393"]},"month":"12","publisher":"Curran Associates","department":[{"_id":"DaAl"}],"publication_status":"published","acknowledgement":"We gratefully acknowledge funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (grant agreement No 805223 ScaleML), as well as computational support from Amazon Web Services (AWS) 
EC2.","year":"2021","volume":34,"date_created":"2022-06-26T22:01:35Z","date_updated":"2022-06-27T07:05:12Z","author":[{"id":"09a8f98d-ec99-11ea-ae11-c063a7b7fe5f","first_name":"Elias","last_name":"Frantar","full_name":"Frantar, Elias"},{"full_name":"Kurtic, Eldar","id":"47beb3a5-07b5-11eb-9b87-b108ec578218","last_name":"Kurtic","first_name":"Eldar"},{"orcid":"0000-0003-3650-940X","id":"4A899BFC-F248-11E8-B48F-1D18A9856A87","last_name":"Alistarh","first_name":"Dan-Adrian","full_name":"Alistarh, Dan-Adrian"}],"ec_funded":1,"page":"14873-14886","citation":{"ama":"Frantar E, Kurtic E, Alistarh D-A. M-FAC: Efficient matrix-free approximations of second-order information. In: 35th Conference on Neural Information Processing Systems. Vol 34. Curran Associates; 2021:14873-14886.","ieee":"E. Frantar, E. Kurtic, and D.-A. Alistarh, “M-FAC: Efficient matrix-free approximations of second-order information,” in 35th Conference on Neural Information Processing Systems, Virtual, Online, 2021, vol. 34, pp. 14873–14886.","apa":"Frantar, E., Kurtic, E., & Alistarh, D.-A. (2021). M-FAC: Efficient matrix-free approximations of second-order information. In 35th Conference on Neural Information Processing Systems (Vol. 34, pp. 14873–14886). Virtual, Online: Curran Associates.","ista":"Frantar E, Kurtic E, Alistarh D-A. 2021. M-FAC: Efficient matrix-free approximations of second-order information. 35th Conference on Neural Information Processing Systems. NeurIPS: Neural Information Processing Systems vol. 34, 14873–14886.","short":"E. Frantar, E. Kurtic, D.-A. Alistarh, in:, 35th Conference on Neural Information Processing Systems, Curran Associates, 2021, pp. 14873–14886.","mla":"Frantar, Elias, et al. “M-FAC: Efficient Matrix-Free Approximations of Second-Order Information.” 35th Conference on Neural Information Processing Systems, vol. 34, Curran Associates, 2021, pp. 14873–86.","chicago":"Frantar, Elias, Eldar Kurtic, and Dan-Adrian Alistarh. 
“M-FAC: Efficient Matrix-Free Approximations of Second-Order Information.” In 35th Conference on Neural Information Processing Systems, 34:14873–86. Curran Associates, 2021."},"publication":"35th Conference on Neural Information Processing Systems","date_published":"2021-12-06T00:00:00Z","scopus_import":"1","article_processing_charge":"No","day":"06","intvolume":" 34","title":"M-FAC: Efficient matrix-free approximations of second-order information","status":"public","user_id":"2DF688A6-F248-11E8-B48F-1D18A9856A87","_id":"11463","oa_version":"Published Version","type":"conference","abstract":[{"text":"Efficiently approximating local curvature information of the loss function is a key tool for optimization and compression of deep neural networks. Yet, most existing methods to approximate second-order information have high computational\r\nor storage costs, which limits their practicality. In this work, we investigate matrix-free, linear-time approaches for estimating Inverse-Hessian Vector Products (IHVPs) for the case when the Hessian can be approximated as a sum of rank-one matrices, as in the classic approximation of the Hessian by the empirical Fisher matrix. We propose two new algorithms: the first is tailored towards network compression and can compute the IHVP for dimension d, if the Hessian is given as a sum of m rank-one matrices, using O(dm^2) precomputation, O(dm) cost for computing the IHVP, and query cost O(m) for any single element of the inverse Hessian. The second algorithm targets an optimization setting, where we wish to compute the product between the inverse Hessian, estimated over a sliding window of optimization steps, and a given gradient direction, as required for preconditioned SGD. We give an algorithm with cost O(dm + m^2) for computing the IHVP and O(dm + m^3) for adding or removing any gradient from the sliding window. 
These\r\ntwo algorithms yield state-of-the-art results for network pruning and optimization with lower computational overhead relative to existing second-order methods. Implementations are available at [9] and [17].","lang":"eng"}]},{"month":"12","publication_identifier":{"isbn":["9781713845393"],"issn":["1049-5258"]},"quality_controlled":"1","project":[{"call_identifier":"H2020","name":"Elastic Coordination for Scalable Machine Learning","_id":"268A44D6-B435-11E9-9278-68D0E5697425","grant_number":"805223"}],"main_file_link":[{"open_access":"1","url":"https://proceedings.neurips.cc/paper/2021/file/3b92d18aa7a6176dd37d372bc2f1eb71-Paper.pdf"}],"oa":1,"external_id":{"arxiv":["2010.08222"]},"language":[{"iso":"eng"}],"conference":{"name":"NeurIPS: Neural Information Processing Systems","end_date":"2021-12-14","start_date":"2021-12-06","location":"Virtual, Online"},"ec_funded":1,"publication_status":"published","publisher":"Curran Associates","department":[{"_id":"DaAl"}],"year":"2021","acknowledgement":"We thank the NeurIPS reviewers for insightful comments that helped us improve the positioning of our results, as well as for pointing out the subsampling approach for complementing the randomised lower bound. We also thank Foivos Alimisis and Peter Davies for useful discussions. 
This project has received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (grant agreement No 805223 ScaleML).","date_updated":"2022-06-27T06:54:31Z","date_created":"2022-06-26T22:01:35Z","volume":34,"author":[{"first_name":"Dan-Adrian","last_name":"Alistarh","id":"4A899BFC-F248-11E8-B48F-1D18A9856A87","orcid":"0000-0003-3650-940X","full_name":"Alistarh, Dan-Adrian"},{"last_name":"Korhonen","first_name":"Janne","id":"C5402D42-15BC-11E9-A202-CA2BE6697425","full_name":"Korhonen, Janne"}],"scopus_import":"1","day":"06","article_processing_charge":"No","page":"7254-7266","publication":"35th Conference on Neural Information Processing Systems","citation":{"chicago":"Alistarh, Dan-Adrian, and Janne Korhonen. “Towards Tight Communication Lower Bounds for Distributed Optimisation.” In 35th Conference on Neural Information Processing Systems, 34:7254–66. Curran Associates, 2021.","short":"D.-A. Alistarh, J. Korhonen, in:, 35th Conference on Neural Information Processing Systems, Curran Associates, 2021, pp. 7254–7266.","mla":"Alistarh, Dan-Adrian, and Janne Korhonen. “Towards Tight Communication Lower Bounds for Distributed Optimisation.” 35th Conference on Neural Information Processing Systems, vol. 34, Curran Associates, 2021, pp. 7254–66.","ieee":"D.-A. Alistarh and J. Korhonen, “Towards tight communication lower bounds for distributed optimisation,” in 35th Conference on Neural Information Processing Systems, Virtual, Online, 2021, vol. 34, pp. 7254–7266.","apa":"Alistarh, D.-A., & Korhonen, J. (2021). Towards tight communication lower bounds for distributed optimisation. In 35th Conference on Neural Information Processing Systems (Vol. 34, pp. 7254–7266). Virtual, Online: Curran Associates.","ista":"Alistarh D-A, Korhonen J. 2021. Towards tight communication lower bounds for distributed optimisation. 35th Conference on Neural Information Processing Systems. 
NeurIPS: Neural Information Processing Systems vol. 34, 7254–7266.","ama":"Alistarh D-A, Korhonen J. Towards tight communication lower bounds for distributed optimisation. In: 35th Conference on Neural Information Processing Systems. Vol 34. Curran Associates; 2021:7254-7266."},"date_published":"2021-12-06T00:00:00Z","type":"conference","abstract":[{"text":"We consider a standard distributed optimisation setting where N machines, each holding a d-dimensional function f_i, aim to jointly minimise the sum of the functions ∑_{i=1}^{N} f_i(x). This problem arises naturally in large-scale distributed optimisation, where a standard solution is to apply variants of (stochastic) gradient descent. We focus on the communication complexity of this problem: our main result provides the first fully unconditional bounds on the total number of bits which need to be sent and received by the N machines to solve this problem under point-to-point communication, within a given error-tolerance. Specifically, we show that Ω(Nd log(d/Nε)) total bits need to be communicated between the machines to find an additive ϵ-approximation to the minimum of ∑_{i=1}^{N} f_i(x). The result holds for both deterministic and randomised algorithms, and, importantly, requires no assumptions on the algorithm structure. The lower bound is tight under certain restrictions on parameter values, and is matched within constant factors for quadratic objectives by a new variant of quantised gradient descent, which we describe and analyse. 
Our results bring over tools from communication complexity to distributed optimisation, which has potential for further applications.","lang":"eng"}],"title":"Towards tight communication lower bounds for distributed optimisation","status":"public","intvolume":" 34","_id":"11464","user_id":"2DF688A6-F248-11E8-B48F-1D18A9856A87","oa_version":"Published Version"},{"day":"01","month":"05","article_processing_charge":"No","language":[{"iso":"eng"}],"conference":{"name":" ICLR: International Conference on Learning Representations","end_date":"2021-05-07","location":"Virtual","start_date":"2021-05-03"},"date_published":"2021-05-01T00:00:00Z","quality_controlled":"1","project":[{"call_identifier":"H2020","name":"ISTplus - Postdoctoral Fellowships","grant_number":"754411","_id":"260C2330-B435-11E9-9278-68D0E5697425"}],"publication":"9th International Conference on Learning Representations","main_file_link":[{"url":"https://openreview.net/pdf?id=t86MwoUCCNe","open_access":"1"}],"oa":1,"external_id":{"arxiv":["2002.09268"]},"citation":{"ama":"Davies P, Gurunanthan V, Moshrefi N, Ashkboos S, Alistarh D-A. New bounds for distributed mean estimation and variance reduction. In: 9th International Conference on Learning Representations. ; 2021.","apa":"Davies, P., Gurunanthan, V., Moshrefi, N., Ashkboos, S., & Alistarh, D.-A. (2021). New bounds for distributed mean estimation and variance reduction. In 9th International Conference on Learning Representations. Virtual.","ieee":"P. Davies, V. Gurunanthan, N. Moshrefi, S. Ashkboos, and D.-A. Alistarh, “New bounds for distributed mean estimation and variance reduction,” in 9th International Conference on Learning Representations, Virtual, 2021.","ista":"Davies P, Gurunanthan V, Moshrefi N, Ashkboos S, Alistarh D-A. 2021. New bounds for distributed mean estimation and variance reduction. 9th International Conference on Learning Representations. ICLR: International Conference on Learning Representations.","short":"P. Davies, V. 
Gurunanthan, N. Moshrefi, S. Ashkboos, D.-A. Alistarh, in:, 9th International Conference on Learning Representations, 2021.","mla":"Davies, Peter, et al. “New Bounds for Distributed Mean Estimation and Variance Reduction.” 9th International Conference on Learning Representations, 2021.","chicago":"Davies, Peter, Vijaykrishna Gurunanthan, Niusha Moshrefi, Saleh Ashkboos, and Dan-Adrian Alistarh. “New Bounds for Distributed Mean Estimation and Variance Reduction.” In 9th International Conference on Learning Representations, 2021."},"abstract":[{"lang":"eng","text":"We consider the problem of distributed mean estimation (DME), in which n machines are each given a local d-dimensional vector x_v ∈ R^d, and must cooperate to estimate the mean of their inputs μ = (1/n)∑_{v=1}^{n} x_v, while minimizing total communication cost. DME is a fundamental construct in distributed machine learning, and there has been considerable work on variants of this problem, especially in the context of distributed variance reduction for stochastic gradients in parallel SGD. Previous work typically assumes an upper bound on the norm of the input vectors, and achieves an error bound in terms of this norm. However, in many real applications, the input vectors are concentrated around the correct output μ, but μ itself has large norm. In such cases, previous output error bounds perform poorly. In this paper, we show that output error bounds need not depend on input norm. We provide a method of quantization which allows distributed mean estimation to be performed with solution quality dependent only on the distance between inputs, not on input norm, and show an analogous result for distributed variance reduction. The technique is based on a new connection with lattice theory. We also provide lower bounds showing that the communication to error trade-off of our algorithms is asymptotically optimal. 
As the lattices achieving optimal bounds under l2-norm can be computationally impractical, we also present an extension which leverages easy-to-use cubic lattices, and is loose only up to a logarithmic factor in d. We show experimentally that our method yields practical improvements for common applications, relative to prior approaches."}],"ec_funded":1,"type":"conference","date_created":"2021-06-10T19:46:08Z","date_updated":"2023-02-23T14:00:40Z","oa_version":"Published Version","author":[{"full_name":"Davies, Peter","id":"11396234-BB50-11E9-B24C-90FCE5697425","orcid":"0000-0002-5646-9524","first_name":"Peter","last_name":"Davies"},{"full_name":"Gurunanthan, Vijaykrishna","first_name":"Vijaykrishna","last_name":"Gurunanthan"},{"full_name":"Moshrefi, Niusha ","id":"4db776ff-ce15-11eb-96e3-bc2b90b01c16","first_name":"Niusha ","last_name":"Moshrefi"},{"full_name":"Ashkboos, Saleh","id":"0D0A9058-257B-11EA-A937-9341C3D8BC8A","last_name":"Ashkboos","first_name":"Saleh"},{"full_name":"Alistarh, Dan-Adrian","first_name":"Dan-Adrian","last_name":"Alistarh","id":"4A899BFC-F248-11E8-B48F-1D18A9856A87","orcid":"0000-0003-3650-940X"}],"status":"public","title":"New bounds for distributed mean estimation and variance reduction","publication_status":"published","department":[{"_id":"DaAl"}],"year":"2021","_id":"9543","user_id":"D865714E-FA4E-11E9-B85B-F5C5E5697425"},{"article_processing_charge":"No","has_accepted_license":"1","day":"20","date_published":"2021-06-20T00:00:00Z","page":"3-12","citation":{"ama":"Alistarh D-A, Davies P. Collecting coupons is faster with friends. In: Structural Information and Communication Complexity. Vol 12810. Springer Nature; 2021:3-12. doi:10.1007/978-3-030-79527-6_1","apa":"Alistarh, D.-A., & Davies, P. (2021). Collecting coupons is faster with friends. In Structural Information and Communication Complexity (Vol. 12810, pp. 3–12). Wrocław, Poland: Springer Nature. https://doi.org/10.1007/978-3-030-79527-6_1","ieee":"D.-A. Alistarh and P. 
Davies, “Collecting coupons is faster with friends,” in Structural Information and Communication Complexity, Wrocław, Poland, 2021, vol. 12810, pp. 3–12.","ista":"Alistarh D-A, Davies P. 2021. Collecting coupons is faster with friends. Structural Information and Communication Complexity. SIROCCO: International Colloquium on Structural Information and Communication Complexity, LNCS, vol. 12810, 3–12.","short":"D.-A. Alistarh, P. Davies, in:, Structural Information and Communication Complexity, Springer Nature, 2021, pp. 3–12.","mla":"Alistarh, Dan-Adrian, and Peter Davies. “Collecting Coupons Is Faster with Friends.” Structural Information and Communication Complexity, vol. 12810, Springer Nature, 2021, pp. 3–12, doi:10.1007/978-3-030-79527-6_1.","chicago":"Alistarh, Dan-Adrian, and Peter Davies. “Collecting Coupons Is Faster with Friends.” In Structural Information and Communication Complexity, 12810:3–12. Springer Nature, 2021. https://doi.org/10.1007/978-3-030-79527-6_1."},"publication":"Structural Information and Communication Complexity","abstract":[{"text":"In this note, we introduce a distributed twist on the classic coupon collector problem: a set of m collectors wish to each obtain a set of n coupons; for this, they can each sample coupons uniformly at random, but can also meet in pairwise interactions, during which they can exchange coupons. By doing so, they hope to reduce the number of coupons that must be sampled by each collector in order to obtain a full set. This extension is natural when considering real-world manifestations of the coupon collector phenomenon, and has been remarked upon and studied empirically (Hayes and Hannigan 2006, Ahmad et al. 2014, Delmarcelle 2019).\r\n\r\nWe provide the first theoretical analysis for such a scenario. We find that “coupon collecting with friends” can indeed significantly reduce the number of coupons each collector must sample, and raises interesting connections to the more traditional variants of the problem. 
While our analysis is in most cases asymptotically tight, there are several open questions raised, regarding finer-grained analysis of both “coupon collecting with friends,” and of a long-studied variant of the original problem in which a collector requires multiple full sets of coupons.","lang":"eng"}],"alternative_title":["LNCS"],"type":"conference","file":[{"relation":"main_file","file_id":"9621","date_updated":"2021-07-01T11:21:40Z","date_created":"2021-07-01T11:21:40Z","checksum":"fe37fb9af3f5016c1084af9d6e7109bd","file_name":"Population_Coupon_Collector.pdf","access_level":"open_access","file_size":319728,"content_type":"application/pdf","creator":"pdavies"}],"oa_version":"Preprint","intvolume":" 12810","title":"Collecting coupons is faster with friends","ddc":["000"],"status":"public","_id":"9620","user_id":"D865714E-FA4E-11E9-B85B-F5C5E5697425","publication_identifier":{"issn":["0302-9743"],"eisbn":["9783030795276"],"isbn":["9783030795269"],"eissn":["1611-3349"]},"month":"06","language":[{"iso":"eng"}],"doi":"10.1007/978-3-030-79527-6_1","conference":{"location":"Wrocław, Poland","start_date":"2021-06-28","end_date":"2021-07-01","name":" SIROCCO: International Colloquium on Structural Information and Communication Complexity"},"project":[{"call_identifier":"H2020","name":"ISTplus - Postdoctoral Fellowships","_id":"260C2330-B435-11E9-9278-68D0E5697425","grant_number":"754411"}],"quality_controlled":"1","oa":1,"ec_funded":1,"file_date_updated":"2021-07-01T11:21:40Z","volume":12810,"date_updated":"2023-02-23T14:02:46Z","date_created":"2021-07-01T11:04:43Z","author":[{"orcid":"0000-0003-3650-940X","id":"4A899BFC-F248-11E8-B48F-1D18A9856A87","last_name":"Alistarh","first_name":"Dan-Adrian","full_name":"Alistarh, Dan-Adrian"},{"orcid":"0000-0002-5646-9524","id":"11396234-BB50-11E9-B24C-90FCE5697425","last_name":"Davies","first_name":"Peter","full_name":"Davies, Peter"}],"publisher":"Springer 
Nature","department":[{"_id":"DaAl"}],"publication_status":"published","year":"2021","acknowledgement":"Peter Davies is supported by the European Union’s Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No. 754411."},{"department":[{"_id":"DaAl"}],"publisher":"Springer Nature","publication_status":"published","year":"2021","volume":12810,"date_updated":"2023-02-23T14:09:49Z","date_created":"2021-08-08T22:01:29Z","author":[{"full_name":"Alistarh, Dan-Adrian","id":"4A899BFC-F248-11E8-B48F-1D18A9856A87","orcid":"0000-0003-3650-940X","first_name":"Dan-Adrian","last_name":"Alistarh"},{"full_name":"Ellen, Faith","first_name":"Faith","last_name":"Ellen"},{"full_name":"Rybicki, Joel","orcid":"0000-0002-6432-6646","id":"334EFD2E-F248-11E8-B48F-1D18A9856A87","last_name":"Rybicki","first_name":"Joel"}],"publication_identifier":{"issn":["03029743"],"isbn":["9783030795269"],"eissn":["16113349"]},"month":"06","quality_controlled":"1","oa":1,"main_file_link":[{"url":"https://arxiv.org/abs/2103.08949","open_access":"1"}],"external_id":{"arxiv":["2103.08949"]},"language":[{"iso":"eng"}],"doi":"10.1007/978-3-030-79527-6_6","conference":{"start_date":"2021-06-28","location":"Wrocław, Poland","end_date":"2021-07-01","name":"SIROCCO: Structural Information and Communication Complexity"},"alternative_title":["LNCS"],"type":"conference","abstract":[{"lang":"eng","text":"Approximate agreement is one of the few variants of consensus that can be solved in a wait-free manner in asynchronous systems where processes communicate by reading and writing to shared memory. In this work, we consider a natural generalisation of approximate agreement on arbitrary undirected connected graphs. 
Each process is given a vertex of the graph as input and, if non-faulty, must output a vertex such that\r\nall the outputs are within distance 1 of one another, and\r\n\r\neach output value lies on a shortest path between two input values.\r\n\r\nFrom prior work, it is known that there is no wait-free algorithm among 𝑛≥3 processes for this problem on any cycle of length 𝑐≥4 , by reduction from 2-set agreement (Castañeda et al. 2018).\r\n\r\nIn this work, we investigate the solvability and complexity of this task on general graphs. We give a new, direct proof of the impossibility of approximate agreement on cycles of length 𝑐≥4 , via a generalisation of Sperner’s Lemma to convex polygons. We also extend the reduction from 2-set agreement to a larger class of graphs, showing that approximate agreement on these graphs is unsolvable. On the positive side, we present a wait-free algorithm for a class of graphs that properly contains the class of chordal graphs."}],"intvolume":" 12810","title":"Wait-free approximate agreement on graphs","status":"public","user_id":"6785fbc1-c503-11eb-8a32-93094b40e1cf","_id":"9823","oa_version":"Preprint","scopus_import":"1","article_processing_charge":"No","day":"20","page":"87-105","citation":{"ama":"Alistarh D-A, Ellen F, Rybicki J. Wait-free approximate agreement on graphs. In: Structural Information and Communication Complexity. Vol 12810. Springer Nature; 2021:87-105. doi:10.1007/978-3-030-79527-6_6","ieee":"D.-A. Alistarh, F. Ellen, and J. Rybicki, “Wait-free approximate agreement on graphs,” in Structural Information and Communication Complexity, Wrocław, Poland, 2021, vol. 12810, pp. 87–105.","apa":"Alistarh, D.-A., Ellen, F., & Rybicki, J. (2021). Wait-free approximate agreement on graphs. In Structural Information and Communication Complexity (Vol. 12810, pp. 87–105). Wrocław, Poland: Springer Nature. https://doi.org/10.1007/978-3-030-79527-6_6","ista":"Alistarh D-A, Ellen F, Rybicki J. 2021. 
Wait-free approximate agreement on graphs. Structural Information and Communication Complexity. SIROCCO: Structural Information and Communication Complexity, LNCS, vol. 12810, 87–105.","short":"D.-A. Alistarh, F. Ellen, J. Rybicki, in:, Structural Information and Communication Complexity, Springer Nature, 2021, pp. 87–105.","mla":"Alistarh, Dan-Adrian, et al. “Wait-Free Approximate Agreement on Graphs.” Structural Information and Communication Complexity, vol. 12810, Springer Nature, 2021, pp. 87–105, doi:10.1007/978-3-030-79527-6_6.","chicago":"Alistarh, Dan-Adrian, Faith Ellen, and Joel Rybicki. “Wait-Free Approximate Agreement on Graphs.” In Structural Information and Communication Complexity, 12810:87–105. Springer Nature, 2021. https://doi.org/10.1007/978-3-030-79527-6_6."},"publication":"Structural Information and Communication Complexity","date_published":"2021-06-20T00:00:00Z"},{"abstract":[{"lang":"eng","text":"The increasing computational requirements of deep neural networks (DNNs) have led to significant interest in obtaining DNN models that are sparse, yet accurate. Recent work has investigated the even harder case of sparse training, where the DNN weights are, for as much as possible, already sparse to reduce computational costs during training. Existing sparse training methods are often empirical and can have lower accuracy relative to the dense baseline. In this paper, we present a general approach called Alternating Compressed/DeCompressed (AC/DC) training of DNNs, demonstrate convergence for a variant of the algorithm, and show that AC/DC outperforms existing sparse training methods in accuracy at similar computational budgets; at high sparsity levels, AC/DC even outperforms existing methods that rely on accurate pre-trained dense models. An important property of AC/DC is that it allows co-training of dense and sparse models, yielding accurate sparse–dense model pairs at the end of the training process. 
This is useful in practice, where compressed variants may be desirable for deployment in resource-constrained settings without re-doing the entire training flow, and also provides us with insights into the accuracy gap between dense and compressed models. The code is available at: https://github.com/IST-DASLab/ACDC."}],"type":"conference","oa_version":"Published Version","user_id":"2DF688A6-F248-11E8-B48F-1D18A9856A87","_id":"11458","status":"public","title":"AC/DC: Alternating Compressed/DeCompressed training of deep neural networks","intvolume":" 34","day":"6","article_processing_charge":"No","scopus_import":"1","date_published":"2021-12-06T00:00:00Z","publication":"35th Conference on Neural Information Processing Systems","citation":{"ista":"Peste E-A, Iofinova EB, Vladu A, Alistarh D-A. 2021. AC/DC: Alternating Compressed/DeCompressed training of deep neural networks. 35th Conference on Neural Information Processing Systems. NeurIPS: Neural Information Processing Systems vol. 34, 8557–8570.","ieee":"E.-A. Peste, E. B. Iofinova, A. Vladu, and D.-A. Alistarh, “AC/DC: Alternating Compressed/DeCompressed training of deep neural networks,” in 35th Conference on Neural Information Processing Systems, Virtual, Online, 2021, vol. 34, pp. 8557–8570.","apa":"Peste, E.-A., Iofinova, E. B., Vladu, A., & Alistarh, D.-A. (2021). AC/DC: Alternating Compressed/DeCompressed training of deep neural networks. In 35th Conference on Neural Information Processing Systems (Vol. 34, pp. 8557–8570). Virtual, Online: Curran Associates.","ama":"Peste E-A, Iofinova EB, Vladu A, Alistarh D-A. AC/DC: Alternating Compressed/DeCompressed training of deep neural networks. In: 35th Conference on Neural Information Processing Systems. Vol 34. Curran Associates; 2021:8557-8570.","chicago":"Peste, Elena-Alexandra, Eugenia B Iofinova, Adrian Vladu, and Dan-Adrian Alistarh. 
“AC/DC: Alternating Compressed/DeCompressed Training of Deep Neural Networks.” In 35th Conference on Neural Information Processing Systems, 34:8557–70. Curran Associates, 2021.","mla":"Peste, Elena-Alexandra, et al. “AC/DC: Alternating Compressed/DeCompressed Training of Deep Neural Networks.” 35th Conference on Neural Information Processing Systems, vol. 34, Curran Associates, 2021, pp. 8557–70.","short":"E.-A. Peste, E.B. Iofinova, A. Vladu, D.-A. Alistarh, in:, 35th Conference on Neural Information Processing Systems, Curran Associates, 2021, pp. 8557–8570."},"page":"8557-8570","ec_funded":1,"author":[{"id":"32D78294-F248-11E8-B48F-1D18A9856A87","first_name":"Elena-Alexandra","last_name":"Peste","full_name":"Peste, Elena-Alexandra"},{"full_name":"Iofinova, Eugenia B","id":"f9a17499-f6e0-11ea-865d-fdf9a3f77117","orcid":"0000-0002-7778-3221","first_name":"Eugenia B","last_name":"Iofinova"},{"last_name":"Vladu","first_name":"Adrian","full_name":"Vladu, Adrian"},{"full_name":"Alistarh, Dan-Adrian","last_name":"Alistarh","first_name":"Dan-Adrian","orcid":"0000-0003-3650-940X","id":"4A899BFC-F248-11E8-B48F-1D18A9856A87"}],"related_material":{"record":[{"relation":"dissertation_contains","status":"public","id":"13074"}]},"date_updated":"2023-06-01T12:54:45Z","date_created":"2022-06-20T12:11:53Z","volume":34,"year":"2021","acknowledgement":"This project has received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (grant agreement No 805223 ScaleML), and a CNRS PEPS grant. This research was supported by the Scientific Service Units (SSU) of IST Austria through resources provided by Scientific Computing (SciComp). 
We would also like to thank Christoph Lampert for his feedback on an earlier version of this work, as well as for providing hardware for the Transformer-XL experiments.","publication_status":"published","publisher":"Curran Associates","department":[{"_id":"GradSch"},{"_id":"DaAl"}],"month":"12","publication_identifier":{"issn":["1049-5258"],"isbn":["9781713845393"]},"conference":{"end_date":"2021-12-14","start_date":"2021-12-06","location":"Virtual, Online","name":"NeurIPS: Neural Information Processing Systems"},"acknowledged_ssus":[{"_id":"ScienComp"}],"language":[{"iso":"eng"}],"main_file_link":[{"url":"https://proceedings.neurips.cc/paper/2021/file/48000647b315f6f00f913caa757a70b3-Paper.pdf","open_access":"1"}],"external_id":{"arxiv":["2106.12379"]},"oa":1,"quality_controlled":"1","project":[{"call_identifier":"H2020","name":"Elastic Coordination for Scalable Machine Learning","_id":"268A44D6-B435-11E9-9278-68D0E5697425","grant_number":"805223"}]},{"day":"01","article_processing_charge":"No","has_accepted_license":"1","scopus_import":"1","date_published":"2021-07-01T00:00:00Z","publication":"Proceedings of the 38th International Conference on Machine Learning","citation":{"ama":"Alimisis F, Davies P, Alistarh D-A. Communication-efficient distributed optimization with quantized preconditioners. In: Proceedings of the 38th International Conference on Machine Learning. Vol 139. ML Research Press; 2021:196-206.","ista":"Alimisis F, Davies P, Alistarh D-A. 2021. Communication-efficient distributed optimization with quantized preconditioners. Proceedings of the 38th International Conference on Machine Learning. International Conference on Machine Learning vol. 139, 196–206.","ieee":"F. Alimisis, P. Davies, and D.-A. Alistarh, “Communication-efficient distributed optimization with quantized preconditioners,” in Proceedings of the 38th International Conference on Machine Learning, Virtual, 2021, vol. 139, pp. 196–206.","apa":"Alimisis, F., Davies, P., & Alistarh, D.-A. 
(2021). Communication-efficient distributed optimization with quantized preconditioners. In Proceedings of the 38th International Conference on Machine Learning (Vol. 139, pp. 196–206). Virtual: ML Research Press.","mla":"Alimisis, Foivos, et al. “Communication-Efficient Distributed Optimization with Quantized Preconditioners.” Proceedings of the 38th International Conference on Machine Learning, vol. 139, ML Research Press, 2021, pp. 196–206.","short":"F. Alimisis, P. Davies, D.-A. Alistarh, in:, Proceedings of the 38th International Conference on Machine Learning, ML Research Press, 2021, pp. 196–206.","chicago":"Alimisis, Foivos, Peter Davies, and Dan-Adrian Alistarh. “Communication-Efficient Distributed Optimization with Quantized Preconditioners.” In Proceedings of the 38th International Conference on Machine Learning, 139:196–206. ML Research Press, 2021."},"page":"196-206","abstract":[{"text":"We investigate fast and communication-efficient algorithms for the classic problem of minimizing a sum of strongly convex and smooth functions that are distributed among n\r\n different nodes, which can communicate using a limited number of bits. Most previous communication-efficient approaches for this problem are limited to first-order optimization, and therefore have \\emph{linear} dependence on the condition number in their communication complexity. We show that this dependence is not inherent: communication-efficient methods can in fact have sublinear dependence on the condition number. For this, we design and analyze the first communication-efficient distributed variants of preconditioned gradient descent for Generalized Linear Models, and for Newton’s method. Our results rely on a new technique for quantizing both the preconditioner and the descent direction at each step of the algorithms, while controlling their convergence rate. 
We also validate our findings experimentally, showing faster convergence and reduced communication relative to previous methods.","lang":"eng"}],"type":"conference","oa_version":"Published Version","file":[{"creator":"dernst","file_size":429087,"content_type":"application/pdf","file_name":"2021_PMLR_Alimisis.pdf","access_level":"open_access","date_created":"2023-06-19T10:41:05Z","date_updated":"2023-06-19T10:41:05Z","success":1,"checksum":"7ec0d59bac268b49c76bf2e036dedd7a","file_id":"13154","relation":"main_file"}],"_id":"13147","user_id":"2DF688A6-F248-11E8-B48F-1D18A9856A87","title":"Communication-efficient distributed optimization with quantized preconditioners","status":"public","ddc":["000"],"intvolume":" 139","month":"07","publication_identifier":{"isbn":["9781713845065"],"eissn":["2640-3498"]},"conference":{"name":"International Conference on Machine Learning","start_date":"2021-07-18","location":"Virtual","end_date":"2021-07-24"},"language":[{"iso":"eng"}],"external_id":{"arxiv":["2102.07214"]},"tmp":{"name":"Creative Commons Attribution 4.0 International Public License (CC-BY 4.0)","legal_code_url":"https://creativecommons.org/licenses/by/4.0/legalcode","short":"CC BY (4.0)","image":"/images/cc_by.png"},"oa":1,"quality_controlled":"1","project":[{"name":"Elastic Coordination for Scalable Machine Learning","call_identifier":"H2020","grant_number":"805223","_id":"268A44D6-B435-11E9-9278-68D0E5697425"},{"name":"ISTplus - Postdoctoral Fellowships","call_identifier":"H2020","_id":"260C2330-B435-11E9-9278-68D0E5697425","grant_number":"754411"}],"file_date_updated":"2023-06-19T10:41:05Z","ec_funded":1,"author":[{"full_name":"Alimisis, Foivos","last_name":"Alimisis","first_name":"Foivos"},{"last_name":"Davies","first_name":"Peter","orcid":"0000-0002-5646-9524","id":"11396234-BB50-11E9-B24C-90FCE5697425","full_name":"Davies, Peter"},{"full_name":"Alistarh, 
Dan-Adrian","orcid":"0000-0003-3650-940X","id":"4A899BFC-F248-11E8-B48F-1D18A9856A87","last_name":"Alistarh","first_name":"Dan-Adrian"}],"date_updated":"2023-06-19T10:44:38Z","date_created":"2023-06-18T22:00:48Z","volume":139,"acknowledgement":"The authors would like to thank Janne Korhonen, Aurélien Lucchi, Celestine Mendler-Dünner and Antonio Orvieto for helpful discussions. FA and DA were supported during this work by the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (grant agreement No 805223 ScaleML). PD was supported by the European Union’s Horizon 2020 programme under the Marie Skłodowska-Curie grant agreement No. 754411.","year":"2021","publication_status":"published","publisher":"ML Research Press","department":[{"_id":"DaAl"}]},{"issue":"7","abstract":[{"lang":"eng","text":"Deep learning at scale is dominated by communication time. Distributing samples across nodes usually yields the best performance, but poses scaling challenges due to global information dissemination and load imbalance across uneven sample lengths. State-of-the-art decentralized optimizers mitigate the problem, but require more iterations to achieve the same accuracy as their globally-communicating counterparts. We present Wait-Avoiding Group Model Averaging (WAGMA) SGD, a wait-avoiding stochastic optimizer that reduces global communication via subgroup weight exchange. The key insight is a combination of algorithmic changes to the averaging scheme and the use of a group allreduce operation. We prove the convergence of WAGMA-SGD, and empirically show that it retains convergence rates similar to Allreduce-SGD. For evaluation, we train ResNet-50 on ImageNet; Transformer for machine translation; and deep reinforcement learning for navigation at scale.
Compared with state-of-the-art decentralized SGD variants, WAGMA-SGD significantly improves training throughput (e.g., 2.1× on 1,024 GPUs for reinforcement learning), and achieves the fastest time-to-solution (e.g., the highest score using the shortest training time for Transformer)."}],"type":"journal_article","oa_version":"Preprint","_id":"8723","user_id":"4359f0d1-fa6c-11eb-b949-802e58b17ae8","intvolume":" 32","status":"public","title":"Breaking (global) barriers in parallel stochastic optimization with wait-avoiding group averaging","article_processing_charge":"No","day":"01","scopus_import":"1","date_published":"2021-07-01T00:00:00Z","citation":{"ista":"Li S, Ben-Nun T, Nadiradze G, Girolamo SD, Dryden N, Alistarh D-A, Hoefler T. 2021. Breaking (global) barriers in parallel stochastic optimization with wait-avoiding group averaging. IEEE Transactions on Parallel and Distributed Systems. 32(7), 9271898.","apa":"Li, S., Ben-Nun, T., Nadiradze, G., Girolamo, S. D., Dryden, N., Alistarh, D.-A., & Hoefler, T. (2021). Breaking (global) barriers in parallel stochastic optimization with wait-avoiding group averaging. IEEE Transactions on Parallel and Distributed Systems. IEEE. https://doi.org/10.1109/TPDS.2020.3040606","ieee":"S. Li et al., “Breaking (global) barriers in parallel stochastic optimization with wait-avoiding group averaging,” IEEE Transactions on Parallel and Distributed Systems, vol. 32, no. 7. IEEE, 2021.","ama":"Li S, Ben-Nun T, Nadiradze G, et al. Breaking (global) barriers in parallel stochastic optimization with wait-avoiding group averaging. IEEE Transactions on Parallel and Distributed Systems. 2021;32(7). doi:10.1109/TPDS.2020.3040606","chicago":"Li, Shigang, Tal Ben-Nun, Giorgi Nadiradze, Salvatore Di Girolamo, Nikoli Dryden, Dan-Adrian Alistarh, and Torsten Hoefler.
“Breaking (Global) Barriers in Parallel Stochastic Optimization with Wait-Avoiding Group Averaging.” IEEE Transactions on Parallel and Distributed Systems. IEEE, 2021. https://doi.org/10.1109/TPDS.2020.3040606.","mla":"Li, Shigang, et al. “Breaking (Global) Barriers in Parallel Stochastic Optimization with Wait-Avoiding Group Averaging.” IEEE Transactions on Parallel and Distributed Systems, vol. 32, no. 7, 9271898, IEEE, 2021, doi:10.1109/TPDS.2020.3040606.","short":"S. Li, T. Ben-Nun, G. Nadiradze, S.D. Girolamo, N. Dryden, D.-A. Alistarh, T. Hoefler, IEEE Transactions on Parallel and Distributed Systems 32 (2021)."},"publication":"IEEE Transactions on Parallel and Distributed Systems","article_type":"original","ec_funded":1,"article_number":"9271898","author":[{"full_name":"Li, Shigang","first_name":"Shigang","last_name":"Li"},{"first_name":"Tal","last_name":"Ben-Nun","full_name":"Ben-Nun, Tal"},{"id":"3279A00C-F248-11E8-B48F-1D18A9856A87","last_name":"Nadiradze","first_name":"Giorgi","full_name":"Nadiradze, Giorgi"},{"first_name":"Salvatore Di","last_name":"Girolamo","full_name":"Girolamo, Salvatore Di"},{"last_name":"Dryden","first_name":"Nikoli","full_name":"Dryden, Nikoli"},{"id":"4A899BFC-F248-11E8-B48F-1D18A9856A87","orcid":"0000-0003-3650-940X","first_name":"Dan-Adrian","last_name":"Alistarh","full_name":"Alistarh, Dan-Adrian"},{"first_name":"Torsten","last_name":"Hoefler","full_name":"Hoefler, Torsten"}],"volume":32,"date_created":"2020-11-05T15:25:43Z","date_updated":"2023-08-04T11:08:52Z","year":"2021","acknowledgement":"This project has received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 programme under Grant DAPP, Grant 678880; EPiGRAM-HS, Grant 801039; and ERC Starting Grant ScaleML, Grant 805223. The work of Tal Ben-Nun is supported by the Swiss National Science Foundation (Ambizione Project No. 185778).
The work of Nikoli Dryden is supported by the ETH Postdoctoral Fellowship. The authors would like to thank the Swiss National Supercomputing Center for providing the computing resources and technical support.","publisher":"IEEE","department":[{"_id":"DaAl"}],"publication_status":"published","publication_identifier":{"issn":["10459219"]},"month":"07","doi":"10.1109/TPDS.2020.3040606","language":[{"iso":"eng"}],"main_file_link":[{"url":"https://arxiv.org/abs/2005.00124","open_access":"1"}],"external_id":{"isi":["000621405200019"],"arxiv":["2005.00124"]},"oa":1,"project":[{"_id":"268A44D6-B435-11E9-9278-68D0E5697425","grant_number":"805223","call_identifier":"H2020","name":"Elastic Coordination for Scalable Machine Learning"}],"isi":1,"quality_controlled":"1"},{"conference":{"end_date":"2021-07-30","location":"Virtual, Italy","start_date":"2021-07-26","name":"PODC: Symposium on Principles of Distributed Computing"},"doi":"10.1145/3465084.3467915","date_published":"2021-07-21T00:00:00Z","language":[{"iso":"eng"}],"publication":"Proceedings of the 2021 ACM Symposium on Principles of Distributed Computing","external_id":{"isi":["000744439800005"]},"citation":{"chicago":"Alistarh, Dan-Adrian, Martin Töpfer, and Przemysław Uznański. “Comparison Dynamics in Population Protocols.” In Proceedings of the 2021 ACM Symposium on Principles of Distributed Computing, 55–65. Association for Computing Machinery, 2021. https://doi.org/10.1145/3465084.3467915.","short":"D.-A. Alistarh, M. Töpfer, P. Uznański, in:, Proceedings of the 2021 ACM Symposium on Principles of Distributed Computing, Association for Computing Machinery, 2021, pp. 55–65.","mla":"Alistarh, Dan-Adrian, et al. “Comparison Dynamics in Population Protocols.” Proceedings of the 2021 ACM Symposium on Principles of Distributed Computing, Association for Computing Machinery, 2021, pp. 55–65, doi:10.1145/3465084.3467915.","ieee":"D.-A. Alistarh, M. Töpfer, and P. 
Uznański, “Comparison dynamics in population protocols,” in Proceedings of the 2021 ACM Symposium on Principles of Distributed Computing, Virtual, Italy, 2021, pp. 55–65.","apa":"Alistarh, D.-A., Töpfer, M., & Uznański, P. (2021). Comparison dynamics in population protocols. In Proceedings of the 2021 ACM Symposium on Principles of Distributed Computing (pp. 55–65). Virtual, Italy: Association for Computing Machinery. https://doi.org/10.1145/3465084.3467915","ista":"Alistarh D-A, Töpfer M, Uznański P. 2021. Comparison dynamics in population protocols. Proceedings of the 2021 ACM Symposium on Principles of Distributed Computing. PODC: Symposium on Principles of Distributed Computing, 55–65.","ama":"Alistarh D-A, Töpfer M, Uznański P. Comparison dynamics in population protocols. In: Proceedings of the 2021 ACM Symposium on Principles of Distributed Computing. Association for Computing Machinery; 2021:55-65. doi:10.1145/3465084.3467915"},"isi":1,"quality_controlled":"1","page":"55-65","day":"21","month":"07","article_processing_charge":"No","publication_identifier":{"isbn":["9781450385480"]},"scopus_import":"1","author":[{"full_name":"Alistarh, Dan-Adrian","id":"4A899BFC-F248-11E8-B48F-1D18A9856A87","orcid":"0000-0003-3650-940X","first_name":"Dan-Adrian","last_name":"Alistarh"},{"full_name":"Töpfer, Martin","first_name":"Martin","last_name":"Töpfer","id":"4B865388-F248-11E8-B48F-1D18A9856A87"},{"full_name":"Uznański, Przemysław","first_name":"Przemysław","last_name":"Uznański"}],"date_updated":"2023-08-11T10:56:04Z","date_created":"2021-08-22T22:01:20Z","oa_version":"None","user_id":"4359f0d1-fa6c-11eb-b949-802e58b17ae8","_id":"9951","acknowledgement":"We would like to thank Rati Gelashvili for very useful discussions, and the PODC anonymous reviewers for their careful reading of our paper, and for their useful remarks. 
This work is partially supported by the Polish National Science Center (NCN) grant UMO2017/25/B/ST6/02010.","year":"2021","title":"Comparison dynamics in population protocols","status":"public","publication_status":"published","publisher":"Association for Computing Machinery","department":[{"_id":"DaAl"}],"abstract":[{"lang":"eng","text":"There has recently been a surge of interest in the computational and complexity properties of the population model, which assumes n anonymous, computationally-bounded nodes, interacting at random, with the goal of jointly computing global predicates. Significant work has gone towards investigating majority or consensus dynamics in this model: that is, assuming that every node is initially in one of two states X or Y, determine which state had higher initial count.\r\n\r\nIn this paper, we consider a natural generalization of majority/consensus, which we call comparison: in its simplest formulation, we are given two baseline states, X and Y, present in any initial configuration in fixed, but possibly small counts. One of these states has higher count than the other: we will assume |X_0| > C |Y_0| for some constant C > 1. The challenge is to design a protocol by which nodes can quickly and reliably decide on which of the baseline states X_0 and Y_0 has higher initial count. We begin by analyzing a simple and general dynamics solving the above comparison problem, which uses O(log n) states per node, and converges in O(log n) (parallel) time, with high probability, to a state where the whole population votes on opinions X or Y at rates proportional to the initial concentrations of |X_0| vs. |Y_0|. We then describe how this procedure can be bootstrapped to solve comparison, i.e. have every node in the population reach the \"correct\" decision, with probability 1 - o(1), at the cost of O(log log n) additional states.
Further, we prove that this dynamics is self-stabilizing, in the sense that it converges to the correct decision from arbitrary initial states, and leak-robust, in the sense that it can withstand spurious faulty reactions, which are known to occur in practical implementations of population protocols. Our analysis is based on a new martingale concentration result relating the discrete-time evolution of a population protocol to its expected (steady-state) analysis, which should be a useful tool when analyzing opinion dynamics and epidemic dissemination in the population model."}],"type":"conference"},{"user_id":"8b945eb4-e2f2-11eb-945a-df72226e66a9","_id":"10432","status":"public","title":"Elastic consistency: A practical consistency model for distributed stochastic gradient descent","intvolume":" 35","oa_version":"Published Version","type":"conference","abstract":[{"lang":"eng","text":"One key element behind the recent progress of machine learning has been the ability to train machine learning models in large-scale distributed shared-memory and message-passing environments. Most of these models are trained employing variants of stochastic gradient descent (SGD) based optimization, but most methods involve some type of consistency relaxation relative to sequential SGD, to mitigate its large communication or synchronization costs at scale. In this paper, we introduce a general consistency condition covering communication-reduced and asynchronous distributed SGD implementations. Our framework, called elastic consistency, decouples the system-specific aspects of the implementation from the SGD convergence requirements, giving a general way to obtain convergence bounds for a wide variety of distributed SGD methods used in practice. Elastic consistency can be used to re-derive or improve several previous convergence bounds in message-passing and shared-memory settings, but also to analyze new models and distribution schemes. 
As a direct application, we propose and analyze a new synchronization-avoiding scheduling scheme for distributed SGD, and show that it can be used to efficiently train deep convolutional models for image classification."}],"issue":"10","publication":"Proceedings of the AAAI Conference on Artificial Intelligence","citation":{"ama":"Nadiradze G, Markov I, Chatterjee B, Kungurtsev V, Alistarh D-A. Elastic consistency: A practical consistency model for distributed stochastic gradient descent. In: Proceedings of the AAAI Conference on Artificial Intelligence. Vol 35. ; 2021:9037-9045.","ieee":"G. Nadiradze, I. Markov, B. Chatterjee, V. Kungurtsev, and D.-A. Alistarh, “Elastic consistency: A practical consistency model for distributed stochastic gradient descent,” in Proceedings of the AAAI Conference on Artificial Intelligence, Virtual, 2021, vol. 35, no. 10, pp. 9037–9045.","apa":"Nadiradze, G., Markov, I., Chatterjee, B., Kungurtsev, V., & Alistarh, D.-A. (2021). Elastic consistency: A practical consistency model for distributed stochastic gradient descent. In Proceedings of the AAAI Conference on Artificial Intelligence (Vol. 35, pp. 9037–9045). Virtual.","ista":"Nadiradze G, Markov I, Chatterjee B, Kungurtsev V, Alistarh D-A. 2021. Elastic consistency: A practical consistency model for distributed stochastic gradient descent. Proceedings of the AAAI Conference on Artificial Intelligence. AAAI: Association for the Advancement of Artificial Intelligence vol. 35, 9037–9045.","short":"G. Nadiradze, I. Markov, B. Chatterjee, V. Kungurtsev, D.-A. Alistarh, in:, Proceedings of the AAAI Conference on Artificial Intelligence, 2021, pp. 9037–9045.","mla":"Nadiradze, Giorgi, et al. “Elastic Consistency: A Practical Consistency Model for Distributed Stochastic Gradient Descent.” Proceedings of the AAAI Conference on Artificial Intelligence, vol. 35, no. 10, 2021, pp. 
9037–45.","chicago":"Nadiradze, Giorgi, Ilia Markov, Bapi Chatterjee, Vyacheslav Kungurtsev, and Dan-Adrian Alistarh. “Elastic Consistency: A Practical Consistency Model for Distributed Stochastic Gradient Descent.” In Proceedings of the AAAI Conference on Artificial Intelligence, 35:9037–45, 2021."},"page":"9037-9045","date_published":"2021-05-18T00:00:00Z","day":"18","article_processing_charge":"No","year":"2021","acknowledgement":"We would like to thank Christopher De Sa for his feedback on an earlier draft of this paper, as well as the anonymous AAAI reviewers for their useful comments. This project has received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (grant agreement No 805223 ScaleML). Bapi Chatterjee was supported by the European Union’s Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No. 754411 (ISTPlus).","publication_status":"published","department":[{"_id":"DaAl"}],"author":[{"full_name":"Nadiradze, Giorgi","last_name":"Nadiradze","first_name":"Giorgi","orcid":"0000-0001-5634-0731","id":"3279A00C-F248-11E8-B48F-1D18A9856A87"},{"full_name":"Markov, Ilia","last_name":"Markov","first_name":"Ilia","id":"D0CF4148-C985-11E9-8066-0BDEE5697425"},{"first_name":"Bapi","last_name":"Chatterjee","id":"3C41A08A-F248-11E8-B48F-1D18A9856A87","orcid":"0000-0002-2742-4028","full_name":"Chatterjee, Bapi"},{"full_name":"Kungurtsev, Vyacheslav ","last_name":"Kungurtsev","first_name":"Vyacheslav "},{"first_name":"Dan-Adrian","last_name":"Alistarh","id":"4A899BFC-F248-11E8-B48F-1D18A9856A87","orcid":"0000-0003-3650-940X","full_name":"Alistarh, 
Dan-Adrian"}],"related_material":{"record":[{"id":"10429","status":"public","relation":"dissertation_contains"}]},"date_updated":"2023-09-07T13:31:39Z","date_created":"2021-12-09T09:21:35Z","volume":35,"ec_funded":1,"oa":1,"external_id":{"arxiv":["2001.05918"]},"main_file_link":[{"open_access":"1","url":"https://ojs.aaai.org/index.php/AAAI/article/view/17092"}],"quality_controlled":"1","project":[{"_id":"260C2330-B435-11E9-9278-68D0E5697425","grant_number":"754411","name":"ISTplus - Postdoctoral Fellowships","call_identifier":"H2020"},{"call_identifier":"H2020","name":"Elastic Coordination for Scalable Machine Learning","grant_number":"805223","_id":"268A44D6-B435-11E9-9278-68D0E5697425"}],"conference":{"end_date":"2021-02-09","start_date":"2021-02-02","location":"Virtual","name":"AAAI: Association for the Advancement of Artificial Intelligence"},"language":[{"iso":"eng"}],"month":"05"},{"publication_status":"published","title":"Asynchronous decentralized SGD with quantized and local updates","status":"public","department":[{"_id":"DaAl"}],"publisher":"Neural Information Processing Systems Foundation","acknowledgement":"We gratefully acknowledge funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (grant agreement No 805223 ScaleML). PD partly conducted this work while at IST Austria and was supported by the European Union’s Horizon 2020 programme under the Marie Skłodowska-Curie grant agreement No. 754411. SL was funded in part by European Research Council (ERC) under the European Union’s Horizon 2020 programme (grant agreement DAPP, No. 678880, and EPiGRAM-HS, No. 
801039).\r\n","_id":"10435","year":"2021","user_id":"2DF688A6-F248-11E8-B48F-1D18A9856A87","date_created":"2021-12-09T10:59:12Z","date_updated":"2023-10-17T11:48:56Z","oa_version":"Published Version","author":[{"full_name":"Nadiradze, Giorgi","orcid":"0000-0001-5634-0731","id":"3279A00C-F248-11E8-B48F-1D18A9856A87","last_name":"Nadiradze","first_name":"Giorgi"},{"first_name":"Amirmojtaba","last_name":"Sabour","id":"bcc145fd-e77f-11ea-ae8b-80d661dbff67","full_name":"Sabour, Amirmojtaba"},{"last_name":"Davies","first_name":"Peter","orcid":"0000-0002-5646-9524","id":"11396234-BB50-11E9-B24C-90FCE5697425","full_name":"Davies, Peter"},{"full_name":"Li, Shigang","last_name":"Li","first_name":"Shigang"},{"full_name":"Alistarh, Dan-Adrian","last_name":"Alistarh","first_name":"Dan-Adrian","orcid":"0000-0003-3650-940X","id":"4A899BFC-F248-11E8-B48F-1D18A9856A87"}],"related_material":{"record":[{"id":"10429","relation":"dissertation_contains","status":"public"}]},"type":"conference","abstract":[{"text":"Decentralized optimization is emerging as a viable alternative for scalable distributed machine learning, but also introduces new challenges in terms of synchronization costs. To this end, several communication-reduction techniques, such as non-blocking communication, quantization, and local steps, have been explored in the decentralized setting. Due to the complexity of analyzing optimization in such a relaxed setting, this line of work often assumes \\emph{global} communication rounds, which require additional synchronization. In this paper, we consider decentralized optimization in the simpler, but harder to analyze, \\emph{asynchronous gossip} model, in which communication occurs in discrete, randomly chosen pairings among nodes. 
Perhaps surprisingly, we show that a variant of SGD called \emph{SwarmSGD} still converges in this setting, even if \emph{non-blocking communication}, \emph{quantization}, and \emph{local steps} are all applied \emph{in conjunction}, and even if the node data distributions and underlying graph topology are both \emph{heterogeneous}. Our analysis is based on a new connection with multi-dimensional load-balancing processes. We implement this algorithm and deploy it in a super-computing environment, showing that it can outperform previous decentralized methods in terms of end-to-end training time, and that it can even rival carefully-tuned large-batch SGD for certain tasks.","lang":"eng"}],"ec_funded":1,"quality_controlled":"1","project":[{"grant_number":"754411","_id":"260C2330-B435-11E9-9278-68D0E5697425","call_identifier":"H2020","name":"ISTplus - Postdoctoral Fellowships"},{"call_identifier":"H2020","name":"Elastic Coordination for Scalable Machine Learning","grant_number":"805223","_id":"268A44D6-B435-11E9-9278-68D0E5697425"}],"publication":"35th Conference on Neural Information Processing Systems","citation":{"short":"G. Nadiradze, A. Sabour, P. Davies, S. Li, D.-A. Alistarh, in:, 35th Conference on Neural Information Processing Systems, Neural Information Processing Systems Foundation, 2021.","mla":"Nadiradze, Giorgi, et al. “Asynchronous Decentralized SGD with Quantized and Local Updates.” 35th Conference on Neural Information Processing Systems, Neural Information Processing Systems Foundation, 2021.","chicago":"Nadiradze, Giorgi, Amirmojtaba Sabour, Peter Davies, Shigang Li, and Dan-Adrian Alistarh. “Asynchronous Decentralized SGD with Quantized and Local Updates.” In 35th Conference on Neural Information Processing Systems. Neural Information Processing Systems Foundation, 2021.","ama":"Nadiradze G, Sabour A, Davies P, Li S, Alistarh D-A. Asynchronous decentralized SGD with quantized and local updates. 
In: 35th Conference on Neural Information Processing Systems. Neural Information Processing Systems Foundation; 2021.","ieee":"G. Nadiradze, A. Sabour, P. Davies, S. Li, and D.-A. Alistarh, “Asynchronous decentralized SGD with quantized and local updates,” in 35th Conference on Neural Information Processing Systems, Sydney, Australia, 2021.","apa":"Nadiradze, G., Sabour, A., Davies, P., Li, S., & Alistarh, D.-A. (2021). Asynchronous decentralized SGD with quantized and local updates. In 35th Conference on Neural Information Processing Systems. Sydney, Australia: Neural Information Processing Systems Foundation.","ista":"Nadiradze G, Sabour A, Davies P, Li S, Alistarh D-A. 2021. Asynchronous decentralized SGD with quantized and local updates. 35th Conference on Neural Information Processing Systems. NeurIPS: Neural Information Processing Systems."},"oa":1,"external_id":{"arxiv":["1910.12308"]},"main_file_link":[{"open_access":"1","url":"https://papers.nips.cc/paper/2021/hash/362c99307cdc3f2d8b410652386a9dd1-Abstract.html"}],"language":[{"iso":"eng"}],"conference":{"name":"NeurIPS: Neural Information Processing Systems","location":"Sydney, Australia","start_date":"2021-12-06","end_date":"2021-12-14"},"date_published":"2021-12-01T00:00:00Z","month":"12","day":"01","article_processing_charge":"No"},{"has_accepted_license":"1","article_processing_charge":"Yes (via OA deal)","day":"24","scopus_import":"1","date_published":"2021-12-24T00:00:00Z","article_type":"original","citation":{"ieee":"D.-A. Alistarh, G. Nadiradze, and A. Sabour, “Dynamic averaging load balancing on cycles,” Algorithmica. Springer Nature, 2021.","apa":"Alistarh, D.-A., Nadiradze, G., & Sabour, A. (2021). Dynamic averaging load balancing on cycles. Algorithmica. Virtual, Online; Germany: Springer Nature. https://doi.org/10.1007/s00453-021-00905-9","ista":"Alistarh D-A, Nadiradze G, Sabour A. 2021. Dynamic averaging load balancing on cycles. Algorithmica.","ama":"Alistarh D-A, Nadiradze G, Sabour A. 
Dynamic averaging load balancing on cycles. Algorithmica. 2021. doi:10.1007/s00453-021-00905-9","chicago":"Alistarh, Dan-Adrian, Giorgi Nadiradze, and Amirmojtaba Sabour. “Dynamic Averaging Load Balancing on Cycles.” Algorithmica. Springer Nature, 2021. https://doi.org/10.1007/s00453-021-00905-9.","short":"D.-A. Alistarh, G. Nadiradze, A. Sabour, Algorithmica (2021).","mla":"Alistarh, Dan-Adrian, et al. “Dynamic Averaging Load Balancing on Cycles.” Algorithmica, Springer Nature, 2021, doi:10.1007/s00453-021-00905-9."},"publication":"Algorithmica","abstract":[{"lang":"eng","text":"We consider the following dynamic load-balancing process: given an underlying graph G with n nodes, in each step t≥ 0, one unit of load is created, and placed at a randomly chosen graph node. In the same step, the chosen node picks a random neighbor, and the two nodes balance their loads by averaging them. We are interested in the expected gap between the minimum and maximum loads at nodes as the process progresses, and its dependence on n and on the graph structure. Variants of the above graphical balanced allocation process have been studied previously by Peres, Talwar, and Wieder [Peres et al., 2015], and by Sauerwald and Sun [Sauerwald and Sun, 2015]. These authors left as open the question of characterizing the gap in the case of cycle graphs in the dynamic case, where weights are created during the algorithm’s execution. For this case, the only known upper bound is of 𝒪(n log n), following from a majorization argument due to [Peres et al., 2015], which analyzes a related graphical allocation process. In this paper, we provide an upper bound of 𝒪 (√n log n) on the expected gap of the above process for cycles of length n. We introduce a new potential analysis technique, which enables us to bound the difference in load between k-hop neighbors on the cycle, for any k ≤ n/2. 
We complement this with a \"gap covering\" argument, which bounds the maximum value of the gap by bounding its value across all possible subsets of a certain structure, and recursively bounding the gaps within each subset. We provide analytical and experimental evidence that our upper bound on the gap is tight up to a logarithmic factor. "}],"type":"journal_article","oa_version":"Published Version","file":[{"access_level":"open_access","file_name":"2021_Algorithmica_Alistarh.pdf","file_size":525950,"content_type":"application/pdf","creator":"cchlebak","relation":"main_file","file_id":"10577","checksum":"21169b25b0c8e17b21e12af22bff9870","success":1,"date_updated":"2021-12-27T10:36:40Z","date_created":"2021-12-27T10:36:40Z"}],"status":"public","title":"Dynamic averaging load balancing on cycles","ddc":["000"],"user_id":"4359f0d1-fa6c-11eb-b949-802e58b17ae8","_id":"8286","publication_identifier":{"issn":["0178-4617"],"eissn":["1432-0541"]},"month":"12","language":[{"iso":"eng"}],"doi":"10.1007/s00453-021-00905-9","conference":{"start_date":"2020-07-08","location":"Virtual, Online; Germany","end_date":"2020-07-11","name":"ICALP: International Colloquium on Automata, Languages, and Programming "},"project":[{"grant_number":"805223","_id":"268A44D6-B435-11E9-9278-68D0E5697425","call_identifier":"H2020","name":"Elastic Coordination for Scalable Machine Learning"},{"_id":"B67AFEDC-15C9-11EA-A837-991A96BB2854","name":"IST Austria Open Access Fund"}],"isi":1,"quality_controlled":"1","tmp":{"name":"Creative Commons Attribution 4.0 International Public License (CC-BY 4.0)","legal_code_url":"https://creativecommons.org/licenses/by/4.0/legalcode","short":"CC BY 
(4.0)","image":"/images/cc_by.png"},"oa":1,"external_id":{"arxiv":["2003.09297"],"isi":["000734004600001"]},"ec_funded":1,"file_date_updated":"2021-12-27T10:36:40Z","date_created":"2020-08-24T06:24:04Z","date_updated":"2024-03-05T07:35:53Z","related_material":{"link":[{"url":"https://doi.org/10.4230/LIPIcs.ICALP.2020.7","relation":"earlier_version"}],"record":[{"id":"15077","relation":"earlier_version","status":"public"}]},"author":[{"full_name":"Alistarh, Dan-Adrian","last_name":"Alistarh","first_name":"Dan-Adrian","orcid":"0000-0003-3650-940X","id":"4A899BFC-F248-11E8-B48F-1D18A9856A87"},{"full_name":"Nadiradze, Giorgi","first_name":"Giorgi","last_name":"Nadiradze","id":"3279A00C-F248-11E8-B48F-1D18A9856A87","orcid":"0000-0001-5634-0731"},{"id":"bcc145fd-e77f-11ea-ae8b-80d661dbff67","last_name":"Sabour","first_name":"Amirmojtaba","full_name":"Sabour, Amirmojtaba"}],"publisher":"Springer Nature","department":[{"_id":"DaAl"}],"publication_status":"published","acknowledgement":"The authors sincerely thank Thomas Sauerwald and George Giakkoupis for insightful discussions, and Mohsen Ghaffari, Yuval Peres, and Udi Wieder for feedback on earlier versions of this draft. We also thank the ICALP anonymous reviewers for their very useful comments. Open access funding provided by Institute of Science and Technology (IST Austria). Funding was provided by European Research Council (Grant No. PR1042ERC01).","year":"2021"},{"type":"journal_article","abstract":[{"text":"As the size and complexity of models and datasets grow, so does the need for communication-efficient variants of stochastic gradient descent that can be deployed to perform parallel model training. One popular communication-compression method for data-parallel SGD is QSGD (Alistarh et al., 2017), which quantizes and encodes gradients to reduce communication costs. 
The baseline variant of QSGD provides strong theoretical guarantees; however, for practical purposes, the authors proposed a heuristic variant which we call QSGDinf, which demonstrated impressive empirical gains for distributed training of large neural networks. In this paper, we build on this work to propose a new gradient quantization scheme, and show that it has both stronger theoretical guarantees than QSGD, and matches and exceeds the empirical performance of the QSGDinf heuristic and of other compression methods.","lang":"eng"}],"issue":"114","_id":"9571","user_id":"2DF688A6-F248-11E8-B48F-1D18A9856A87","ddc":["000"],"title":"NUQSGD: Provably communication-efficient data-parallel SGD via nonuniform quantization","status":"public","intvolume":" 22","oa_version":"Published Version","file":[{"file_id":"9595","relation":"main_file","success":1,"checksum":"6428aa8bcb67768b6949c99b55d5281d","date_created":"2021-06-23T07:09:41Z","date_updated":"2021-06-23T07:09:41Z","access_level":"open_access","file_name":"2021_JournalOfMachineLearningResearch_Ramezani-Kebrya.pdf","creator":"asandaue","file_size":11237154,"content_type":"application/pdf"}],"scopus_import":"1","day":"01","has_accepted_license":"1","article_processing_charge":"No","publication":"Journal of Machine Learning Research","citation":{"chicago":"Ramezani-Kebrya, Ali, Fartash Faghri, Ilya Markov, Vitalii Aksenov, Dan-Adrian Alistarh, and Daniel M. Roy. “NUQSGD: Provably Communication-Efficient Data-Parallel SGD via Nonuniform Quantization.” Journal of Machine Learning Research. Journal of Machine Learning Research, 2021.","mla":"Ramezani-Kebrya, Ali, et al. “NUQSGD: Provably Communication-Efficient Data-Parallel SGD via Nonuniform Quantization.” Journal of Machine Learning Research, vol. 22, no. 114, Journal of Machine Learning Research, 2021, p. 1−43.","short":"A. Ramezani-Kebrya, F. Faghri, I. Markov, V. Aksenov, D.-A. Alistarh, D.M. 
Roy, Journal of Machine Learning Research 22 (2021) 1−43.","ista":"Ramezani-Kebrya A, Faghri F, Markov I, Aksenov V, Alistarh D-A, Roy DM. 2021. NUQSGD: Provably communication-efficient data-parallel SGD via nonuniform quantization. Journal of Machine Learning Research. 22(114), 1−43.","ieee":"A. Ramezani-Kebrya, F. Faghri, I. Markov, V. Aksenov, D.-A. Alistarh, and D. M. Roy, “NUQSGD: Provably communication-efficient data-parallel SGD via nonuniform quantization,” Journal of Machine Learning Research, vol. 22, no. 114. Journal of Machine Learning Research, p. 1−43, 2021.","apa":"Ramezani-Kebrya, A., Faghri, F., Markov, I., Aksenov, V., Alistarh, D.-A., & Roy, D. M. (2021). NUQSGD: Provably communication-efficient data-parallel SGD via nonuniform quantization. Journal of Machine Learning Research. Journal of Machine Learning Research.","ama":"Ramezani-Kebrya A, Faghri F, Markov I, Aksenov V, Alistarh D-A, Roy DM. NUQSGD: Provably communication-efficient data-parallel SGD via nonuniform quantization. Journal of Machine Learning Research. 
2021;22(114):1−43."},"article_type":"original","page":"1−43","date_published":"2021-04-01T00:00:00Z","file_date_updated":"2021-06-23T07:09:41Z","year":"2021","publication_status":"published","publisher":"Journal of Machine Learning Research","department":[{"_id":"DaAl"}],"author":[{"full_name":"Ramezani-Kebrya, Ali","last_name":"Ramezani-Kebrya","first_name":"Ali"},{"full_name":"Faghri, Fartash","first_name":"Fartash","last_name":"Faghri"},{"last_name":"Markov","first_name":"Ilya","full_name":"Markov, Ilya"},{"full_name":"Aksenov, Vitalii","first_name":"Vitalii","last_name":"Aksenov","id":"2980135A-F248-11E8-B48F-1D18A9856A87"},{"full_name":"Alistarh, Dan-Adrian","orcid":"0000-0003-3650-940X","id":"4A899BFC-F248-11E8-B48F-1D18A9856A87","last_name":"Alistarh","first_name":"Dan-Adrian"},{"full_name":"Roy, Daniel M.","last_name":"Roy","first_name":"Daniel M."}],"date_created":"2021-06-20T22:01:33Z","date_updated":"2024-03-06T12:22:07Z","volume":22,"month":"04","publication_identifier":{"issn":["15324435"],"eissn":["15337928"]},"external_id":{"arxiv":["1908.06077"]},"tmp":{"name":"Creative Commons Attribution 4.0 International Public License (CC-BY 4.0)","legal_code_url":"https://creativecommons.org/licenses/by/4.0/legalcode","short":"CC BY (4.0)","image":"/images/cc_by.png"},"main_file_link":[{"open_access":"1","url":"https://www.jmlr.org/papers/v22/20-255.html"}],"oa":1,"quality_controlled":"1","language":[{"iso":"eng"}]},{"scopus_import":"1","day":"01","has_accepted_license":"1","article_processing_charge":"No","page":"15:1-15:16","publication":"23rd International Conference on Principles of Distributed Systems","citation":{"apa":"Alistarh, D.-A., Fedorov, A., & Koval, N. (2020). In search of the fastest concurrent union-find algorithm. In 23rd International Conference on Principles of Distributed Systems (Vol. 153, p. 15:1-15:16). Neuchatal, Switzerland: Schloss Dagstuhl - Leibniz-Zentrum für Informatik. https://doi.org/10.4230/LIPIcs.OPODIS.2019.15","ieee":"D.-A. 
Alistarh, A. Fedorov, and N. Koval, “In search of the fastest concurrent union-find algorithm,” in 23rd International Conference on Principles of Distributed Systems, Neuchatal, Switzerland, 2020, vol. 153, p. 15:1-15:16.","ista":"Alistarh D-A, Fedorov A, Koval N. 2020. In search of the fastest concurrent union-find algorithm. 23rd International Conference on Principles of Distributed Systems. OPODIS: International Conference on Principles of Distributed Systems, LIPIcs, vol. 153, 15:1-15:16.","ama":"Alistarh D-A, Fedorov A, Koval N. In search of the fastest concurrent union-find algorithm. In: 23rd International Conference on Principles of Distributed Systems. Vol 153. Schloss Dagstuhl - Leibniz-Zentrum für Informatik; 2020:15:1-15:16. doi:10.4230/LIPIcs.OPODIS.2019.15","chicago":"Alistarh, Dan-Adrian, Alexander Fedorov, and Nikita Koval. “In Search of the Fastest Concurrent Union-Find Algorithm.” In 23rd International Conference on Principles of Distributed Systems, 153:15:1-15:16. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2020. https://doi.org/10.4230/LIPIcs.OPODIS.2019.15.","short":"D.-A. Alistarh, A. Fedorov, N. Koval, in:, 23rd International Conference on Principles of Distributed Systems, Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2020, p. 15:1-15:16.","mla":"Alistarh, Dan-Adrian, et al. “In Search of the Fastest Concurrent Union-Find Algorithm.” 23rd International Conference on Principles of Distributed Systems, vol. 153, Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2020, p. 15:1-15:16, doi:10.4230/LIPIcs.OPODIS.2019.15."},"date_published":"2020-02-01T00:00:00Z","alternative_title":["LIPIcs"],"type":"conference","abstract":[{"lang":"eng","text":"Union-Find (or Disjoint-Set Union) is one of the fundamental problems in computer science; it has been well-studied from both theoretical and practical perspectives in the sequential case. 
Recently, there has been mounting interest in analyzing this problem in the concurrent scenario, and several asymptotically-efficient algorithms have been proposed. Yet, to date, there is very little known about the practical performance of concurrent Union-Find. This work addresses this gap. We evaluate and analyze the performance of several concurrent Union-Find algorithms and optimization strategies across a wide range of platforms (Intel, AMD, and ARM) and workloads (social, random, and road networks, as well as integrations into more complex algorithms). We first observe that, due to the limited computational cost, the number of induced cache misses is the critical determining factor for the performance of existing algorithms. We introduce new techniques to reduce this cost by storing node priorities implicitly and by using plain reads and writes in a way that does not affect the correctness of the algorithms. Finally, we show that Union-Find implementations are an interesting application for Transactional Memory (TM): one of the fastest algorithm variants we discovered is a sequential one that uses coarse-grained locking with the lock elision optimization to reduce synchronization cost and increase scalability. 
"}],"title":"In search of the fastest concurrent union-find algorithm","ddc":["000"],"status":"public","intvolume":" 153","_id":"7605","user_id":"2DF688A6-F248-11E8-B48F-1D18A9856A87","file":[{"file_id":"7609","relation":"main_file","date_updated":"2020-07-14T12:48:01Z","date_created":"2020-03-23T09:22:48Z","checksum":"d66f07ecb609d9f02433e39f80a447e9","file_name":"2019_LIPIcs_Alistarh.pdf","access_level":"open_access","creator":"dernst","content_type":"application/pdf","file_size":13074131}],"oa_version":"Published Version","month":"02","publication_identifier":{"issn":["18688969"],"isbn":["9783959771337"]},"quality_controlled":"1","external_id":{"arxiv":["1911.06347"]},"tmp":{"legal_code_url":"https://creativecommons.org/licenses/by/3.0/legalcode","name":"Creative Commons Attribution 3.0 Unported (CC BY 3.0)","short":"CC BY (3.0)","image":"/images/cc_by.png"},"oa":1,"language":[{"iso":"eng"}],"conference":{"name":"OPODIS: International Conference on Principles of Distributed Systems","start_date":"2019-12-17","location":"Neuchatal, Switzerland","end_date":"2019-12-19"},"doi":"10.4230/LIPIcs.OPODIS.2019.15","license":"https://creativecommons.org/licenses/by/3.0/","file_date_updated":"2020-07-14T12:48:01Z","publication_status":"published","department":[{"_id":"DaAl"}],"publisher":"Schloss Dagstuhl - Leibniz-Zentrum für Informatik","year":"2020","date_updated":"2023-02-23T13:12:12Z","date_created":"2020-03-22T23:00:46Z","volume":153,"author":[{"orcid":"0000-0003-3650-940X","id":"4A899BFC-F248-11E8-B48F-1D18A9856A87","last_name":"Alistarh","first_name":"Dan-Adrian","full_name":"Alistarh, Dan-Adrian"},{"first_name":"Alexander","last_name":"Fedorov","full_name":"Fedorov, Alexander"},{"full_name":"Koval, Nikita","first_name":"Nikita","last_name":"Koval","id":"2F4DB10C-F248-11E8-B48F-1D18A9856A87"}]},{"publication_identifier":{"issn":["1868-8969"],"isbn":["9783959771689"]},"month":"08","project":[{"call_identifier":"H2020","name":"Elastic Coordination for Scalable 
Machine Learning","_id":"268A44D6-B435-11E9-9278-68D0E5697425","grant_number":"805223"}],"quality_controlled":"1","tmp":{"legal_code_url":"https://creativecommons.org/licenses/by/3.0/legalcode","name":"Creative Commons Attribution 3.0 Unported (CC BY 3.0)","short":"CC BY (3.0)","image":"/images/cc_by.png"},"external_id":{"arxiv":["2008.01009"]},"oa":1,"language":[{"iso":"eng"}],"doi":"10.4230/LIPIcs.DISC.2020.3","conference":{"name":"DISC: Symposium on Distributed Computing","end_date":"2020-10-16","start_date":"2020-10-12","location":"Freiburg, Germany"},"ec_funded":1,"file_date_updated":"2021-03-11T12:33:35Z","publisher":"Schloss Dagstuhl - Leibniz-Zentrum für Informatik","department":[{"_id":"DaAl"}],"publication_status":"published","year":"2020","acknowledgement":"Vitaly Aksenov: Government of Russian Federation (Grant 08-08).\r\nDan Alistarh: ERC Starting Grant 805223 ScaleML.","volume":179,"date_created":"2020-11-05T15:26:17Z","date_updated":"2023-02-23T13:41:40Z","author":[{"first_name":"Vitaly","last_name":"Aksenov","full_name":"Aksenov, Vitaly"},{"first_name":"Dan-Adrian","last_name":"Alistarh","id":"4A899BFC-F248-11E8-B48F-1D18A9856A87","orcid":"0000-0003-3650-940X","full_name":"Alistarh, Dan-Adrian"},{"first_name":"Alexandra","last_name":"Drozdova","full_name":"Drozdova, Alexandra"},{"first_name":"Amirkeivan","last_name":"Mohtashami","full_name":"Mohtashami, Amirkeivan"}],"series_title":"LIPIcs","has_accepted_license":"1","article_processing_charge":"No","day":"03","page":"3:1-3:18","citation":{"ista":"Aksenov V, Alistarh D-A, Drozdova A, Mohtashami A. 2020. The splay-list: A distribution-adaptive concurrent skip-list. 34th International Symposium on Distributed Computing. DISC: Symposium on Distributed ComputingLIPIcs vol. 179, 3:1-3:18.","ieee":"V. Aksenov, D.-A. Alistarh, A. Drozdova, and A. 
Mohtashami, “The splay-list: A distribution-adaptive concurrent skip-list,” in 34th International Symposium on Distributed Computing, Freiburg, Germany, 2020, vol. 179, p. 3:1-3:18.","apa":"Aksenov, V., Alistarh, D.-A., Drozdova, A., & Mohtashami, A. (2020). The splay-list: A distribution-adaptive concurrent skip-list. In 34th International Symposium on Distributed Computing (Vol. 179, p. 3:1-3:18). Freiburg, Germany: Schloss Dagstuhl - Leibniz-Zentrum für Informatik. https://doi.org/10.4230/LIPIcs.DISC.2020.3","ama":"Aksenov V, Alistarh D-A, Drozdova A, Mohtashami A. The splay-list: A distribution-adaptive concurrent skip-list. In: 34th International Symposium on Distributed Computing. Vol 179. LIPIcs. Schloss Dagstuhl - Leibniz-Zentrum für Informatik; 2020:3:1-3:18. doi:10.4230/LIPIcs.DISC.2020.3","chicago":"Aksenov, Vitaly, Dan-Adrian Alistarh, Alexandra Drozdova, and Amirkeivan Mohtashami. “The Splay-List: A Distribution-Adaptive Concurrent Skip-List.” In 34th International Symposium on Distributed Computing, 179:3:1-3:18. LIPIcs. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2020. https://doi.org/10.4230/LIPIcs.DISC.2020.3.","mla":"Aksenov, Vitaly, et al. “The Splay-List: A Distribution-Adaptive Concurrent Skip-List.” 34th International Symposium on Distributed Computing, vol. 179, Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2020, p. 3:1-3:18, doi:10.4230/LIPIcs.DISC.2020.3.","short":"V. Aksenov, D.-A. Alistarh, A. Drozdova, A. Mohtashami, in:, 34th International Symposium on Distributed Computing, Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2020, p. 3:1-3:18."},"publication":"34th International Symposium on Distributed Computing","date_published":"2020-08-03T00:00:00Z","type":"conference","abstract":[{"text":"The design and implementation of efficient concurrent data structures have\r\nseen significant attention. However, most of this work has focused on\r\nconcurrent data structures providing good \\emph{worst-case} guarantees. 
In real\r\nworkloads, objects are often accessed at different rates, since access\r\ndistributions may be non-uniform. Efficient distribution-adaptive data\r\nstructures are known in the sequential case, e.g. the splay-trees; however,\r\nthey often are hard to translate efficiently in the concurrent case.\r\n In this paper, we investigate distribution-adaptive concurrent data\r\nstructures and propose a new design called the splay-list. At a high level, the\r\nsplay-list is similar to a standard skip-list, with the key distinction that\r\nthe height of each element adapts dynamically to its access rate: popular\r\nelements ``move up,'' whereas rarely-accessed elements decrease in height. We\r\nshow that the splay-list provides order-optimal amortized complexity bounds for\r\na subset of operations while being amenable to efficient concurrent\r\nimplementation. Experimental results show that the splay-list can leverage\r\ndistribution-adaptivity to improve on the performance of classic concurrent\r\ndesigns, and can outperform the only previously-known distribution-adaptive\r\ndesign in certain settings.","lang":"eng"}],"intvolume":" 179","status":"public","ddc":["000"],"title":"The splay-list: A distribution-adaptive concurrent skip-list","_id":"8725","user_id":"2DF688A6-F248-11E8-B48F-1D18A9856A87","oa_version":"Published Version","file":[{"success":1,"checksum":"a626a9c47df52b6f6d97edd910dae4ba","date_created":"2021-03-11T12:33:35Z","date_updated":"2021-03-11T12:33:35Z","file_id":"9237","relation":"main_file","creator":"dernst","content_type":"application/pdf","file_size":740358,"access_level":"open_access","file_name":"2020_LIPIcs_Aksenov.pdf"}]},{"month":"12","publication_identifier":{"issn":["10495258"],"isbn":["9781713829546"]},"language":[{"iso":"eng"}],"conference":{"end_date":"2020-12-12","start_date":"2020-12-06","location":"Vancouver, Canada","name":"NeurIPS: Conference on Neural Information Processing 
Systems"},"quality_controlled":"1","project":[{"name":"Elastic Coordination for Scalable Machine Learning","call_identifier":"H2020","_id":"268A44D6-B435-11E9-9278-68D0E5697425","grant_number":"805223"}],"main_file_link":[{"open_access":"1","url":"https://proceedings.neurips.cc/paper/2020/hash/d1ff1ec86b62cd5f3903ff19c3a326b2-Abstract.html"}],"oa":1,"external_id":{"arxiv":["2004.14340"]},"ec_funded":1,"date_created":"2021-07-04T22:01:26Z","date_updated":"2023-02-23T14:03:06Z","volume":33,"author":[{"id":"DD138E24-D89D-11E9-9DC0-DEF6E5697425","first_name":"Sidak Pal","last_name":"Singh","full_name":"Singh, Sidak Pal"},{"full_name":"Alistarh, Dan-Adrian","last_name":"Alistarh","first_name":"Dan-Adrian","orcid":"0000-0003-3650-940X","id":"4A899BFC-F248-11E8-B48F-1D18A9856A87"}],"publication_status":"published","publisher":"Curran Associates","department":[{"_id":"DaAl"},{"_id":"ToHe"}],"year":"2020","acknowledgement":"This project has received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (grant agreement No 805223 ScaleML). Also, we would like to thank Alexander Shevchenko, Alexandra Peste, and other members of the group for fruitful discussions.","day":"06","article_processing_charge":"No","scopus_import":"1","date_published":"2020-12-06T00:00:00Z","page":"18098-18109","publication":"Advances in Neural Information Processing Systems","citation":{"mla":"Singh, Sidak Pal, and Dan-Adrian Alistarh. “WoodFisher: Efficient Second-Order Approximation for Neural Network Compression.” Advances in Neural Information Processing Systems, vol. 33, Curran Associates, 2020, pp. 18098–109.","short":"S.P. Singh, D.-A. Alistarh, in:, Advances in Neural Information Processing Systems, Curran Associates, 2020, pp. 18098–18109.","chicago":"Singh, Sidak Pal, and Dan-Adrian Alistarh. 
“WoodFisher: Efficient Second-Order Approximation for Neural Network Compression.” In Advances in Neural Information Processing Systems, 33:18098–109. Curran Associates, 2020.","ama":"Singh SP, Alistarh D-A. WoodFisher: Efficient second-order approximation for neural network compression. In: Advances in Neural Information Processing Systems. Vol 33. Curran Associates; 2020:18098-18109.","ista":"Singh SP, Alistarh D-A. 2020. WoodFisher: Efficient second-order approximation for neural network compression. Advances in Neural Information Processing Systems. NeurIPS: Conference on Neural Information Processing Systems vol. 33, 18098–18109.","ieee":"S. P. Singh and D.-A. Alistarh, “WoodFisher: Efficient second-order approximation for neural network compression,” in Advances in Neural Information Processing Systems, Vancouver, Canada, 2020, vol. 33, pp. 18098–18109.","apa":"Singh, S. P., & Alistarh, D.-A. (2020). WoodFisher: Efficient second-order approximation for neural network compression. In Advances in Neural Information Processing Systems (Vol. 33, pp. 18098–18109). Vancouver, Canada: Curran Associates."},"abstract":[{"text":"Second-order information, in the form of Hessian- or Inverse-Hessian-vector products, is a fundamental tool for solving optimization problems. Recently, there has been significant interest in utilizing this information in the context of deep\r\nneural networks; however, relatively little is known about the quality of existing approximations in this context. Our work examines this question, identifies issues with existing approaches, and proposes a method called WoodFisher to compute a faithful and efficient estimate of the inverse Hessian. Our main application is to neural network compression, where we build on the classic Optimal Brain Damage/Surgeon framework. We demonstrate that WoodFisher significantly outperforms popular state-of-the-art methods for one-shot pruning. 
Further, even when iterative, gradual pruning is allowed, our method results in a gain in test accuracy over the state-of-the-art approaches, for standard image classification datasets such as ImageNet ILSVRC. We examine how our method can be extended to take into account first-order information, as well as\r\nillustrate its ability to automatically set layer-wise pruning thresholds and perform compression in the limited-data regime. The code is available at the following link, https://github.com/IST-DASLab/WoodFisher.","lang":"eng"}],"type":"conference","oa_version":"Published Version","status":"public","title":"WoodFisher: Efficient second-order approximation for neural network compression","intvolume":" 33","user_id":"6785fbc1-c503-11eb-8a32-93094b40e1cf","_id":"9632"},{"abstract":[{"lang":"eng","text":"The ability to leverage large-scale hardware parallelism has been one of the key enablers of the accelerated recent progress in machine learning. Consequently, there has been considerable effort invested into developing efficient parallel variants of classic machine learning algorithms. However, despite the wealth of knowledge on parallelization, some classic machine learning algorithms often prove hard to parallelize efficiently while maintaining convergence. In this paper, we focus on efficient parallel algorithms for the key machine learning task of inference on graphical models, in particular on the fundamental belief propagation algorithm. We address the challenge of efficiently parallelizing this classic paradigm by showing how to leverage scalable relaxed schedulers in this context. 
We present an extensive empirical study, showing that our approach outperforms previous parallel belief propagation implementations both in terms of scalability and in terms of wall-clock convergence time, on a range of practical applications."}],"type":"conference","oa_version":"Published Version","_id":"9631","user_id":"6785fbc1-c503-11eb-8a32-93094b40e1cf","status":"public","title":"Scalable belief propagation via relaxed scheduling","intvolume":" 33","day":"06","article_processing_charge":"No","scopus_import":"1","date_published":"2020-12-06T00:00:00Z","publication":"Advances in Neural Information Processing Systems","citation":{"chicago":"Aksenov, Vitaly, Dan-Adrian Alistarh, and Janne Korhonen. “Scalable Belief Propagation via Relaxed Scheduling.” In Advances in Neural Information Processing Systems, 33:22361–72. Curran Associates, 2020.","mla":"Aksenov, Vitaly, et al. “Scalable Belief Propagation via Relaxed Scheduling.” Advances in Neural Information Processing Systems, vol. 33, Curran Associates, 2020, pp. 22361–72.","short":"V. Aksenov, D.-A. Alistarh, J. Korhonen, in:, Advances in Neural Information Processing Systems, Curran Associates, 2020, pp. 22361–22372.","ista":"Aksenov V, Alistarh D-A, Korhonen J. 2020. Scalable belief propagation via relaxed scheduling. Advances in Neural Information Processing Systems. NeurIPS: Conference on Neural Information Processing Systems vol. 33, 22361–22372.","ieee":"V. Aksenov, D.-A. Alistarh, and J. Korhonen, “Scalable belief propagation via relaxed scheduling,” in Advances in Neural Information Processing Systems, Vancouver, Canada, 2020, vol. 33, pp. 22361–22372.","apa":"Aksenov, V., Alistarh, D.-A., & Korhonen, J. (2020). Scalable belief propagation via relaxed scheduling. In Advances in Neural Information Processing Systems (Vol. 33, pp. 22361–22372). Vancouver, Canada: Curran Associates.","ama":"Aksenov V, Alistarh D-A, Korhonen J. Scalable belief propagation via relaxed scheduling. 
In: Advances in Neural Information Processing Systems. Vol 33. Curran Associates; 2020:22361-22372."},"page":"22361-22372","ec_funded":1,"author":[{"last_name":"Aksenov","first_name":"Vitaly","full_name":"Aksenov, Vitaly"},{"full_name":"Alistarh, Dan-Adrian","id":"4A899BFC-F248-11E8-B48F-1D18A9856A87","orcid":"0000-0003-3650-940X","first_name":"Dan-Adrian","last_name":"Alistarh"},{"last_name":"Korhonen","first_name":"Janne","id":"C5402D42-15BC-11E9-A202-CA2BE6697425","full_name":"Korhonen, Janne"}],"date_created":"2021-07-04T22:01:26Z","date_updated":"2023-02-23T14:03:03Z","volume":33,"acknowledgement":"We thank Marco Mondelli for discussions related to LDPC decoding, and Giorgi Nadiradze for discussions on analysis of relaxed schedulers. This project has received funding from the European Research Council (ERC) under the European\r\nUnion’s Horizon 2020 research and innovation programme (grant agreement No 805223 ScaleML).","year":"2020","publication_status":"published","publisher":"Curran Associates","department":[{"_id":"DaAl"}],"month":"12","publication_identifier":{"isbn":["9781713829546"],"issn":["10495258"]},"conference":{"name":"NeurIPS: Conference on Neural Information Processing Systems","end_date":"2020-12-12","location":"Vancouver, Canada","start_date":"2020-12-06"},"language":[{"iso":"eng"}],"oa":1,"main_file_link":[{"url":"https://proceedings.neurips.cc/paper/2020/hash/fdb2c3bab9d0701c4a050a4d8d782c7f-Abstract.html","open_access":"1"}],"external_id":{"arxiv":["2002.11505"]},"quality_controlled":"1","project":[{"grant_number":"805223","_id":"268A44D6-B435-11E9-9278-68D0E5697425","call_identifier":"H2020","name":"Elastic Coordination for Scalable Machine Learning"}]},{"publication_identifier":{"issn":["2640-3498"]},"month":"07","oa":1,"quality_controlled":"1","conference":{"start_date":"2020-07-12","location":"Online","end_date":"2020-07-18","name":"ICML: International Conference on Machine 
Learning"},"language":[{"iso":"eng"}],"file_date_updated":"2021-05-25T09:51:36Z","year":"2020","department":[{"_id":"DaAl"}],"author":[{"full_name":"Kurtz, Mark","last_name":"Kurtz","first_name":"Mark"},{"last_name":"Kopinsky","first_name":"Justin","full_name":"Kopinsky, Justin"},{"first_name":"Rati","last_name":"Gelashvili","full_name":"Gelashvili, Rati"},{"first_name":"Alexander","last_name":"Matveev","full_name":"Matveev, Alexander"},{"full_name":"Carr, John","last_name":"Carr","first_name":"John"},{"last_name":"Goin","first_name":"Michael","full_name":"Goin, Michael"},{"full_name":"Leiserson, William","first_name":"William","last_name":"Leiserson"},{"full_name":"Moore, Sage","first_name":"Sage","last_name":"Moore"},{"first_name":"Bill","last_name":"Nell","full_name":"Nell, Bill"},{"first_name":"Nir","last_name":"Shavit","full_name":"Shavit, Nir"},{"full_name":"Alistarh, Dan-Adrian","orcid":"0000-0003-3650-940X","id":"4A899BFC-F248-11E8-B48F-1D18A9856A87","last_name":"Alistarh","first_name":"Dan-Adrian"}],"volume":119,"date_created":"2021-05-23T22:01:45Z","date_updated":"2023-02-23T13:57:24Z","scopus_import":"1","has_accepted_license":"1","article_processing_charge":"No","day":"12","citation":{"apa":"Kurtz, M., Kopinsky, J., Gelashvili, R., Matveev, A., Carr, J., Goin, M., … Alistarh, D.-A. (2020). Inducing and exploiting activation sparsity for fast neural network inference. In 37th International Conference on Machine Learning, ICML 2020 (Vol. 119, pp. 5533–5543). Online.","ieee":"M. Kurtz et al., “Inducing and exploiting activation sparsity for fast neural network inference,” in 37th International Conference on Machine Learning, ICML 2020, Online, 2020, vol. 119, pp. 5533–5543.","ista":"Kurtz M, Kopinsky J, Gelashvili R, Matveev A, Carr J, Goin M, Leiserson W, Moore S, Nell B, Shavit N, Alistarh D-A. 2020. Inducing and exploiting activation sparsity for fast neural network inference. 37th International Conference on Machine Learning, ICML 2020. 
ICML: International Conference on Machine Learning vol. 119, 5533–5543.","ama":"Kurtz M, Kopinsky J, Gelashvili R, et al. Inducing and exploiting activation sparsity for fast neural network inference. In: 37th International Conference on Machine Learning, ICML 2020. Vol 119. ; 2020:5533-5543.","chicago":"Kurtz, Mark, Justin Kopinsky, Rati Gelashvili, Alexander Matveev, John Carr, Michael Goin, William Leiserson, et al. “Inducing and Exploiting Activation Sparsity for Fast Neural Network Inference.” In 37th International Conference on Machine Learning, ICML 2020, 119:5533–43, 2020.","short":"M. Kurtz, J. Kopinsky, R. Gelashvili, A. Matveev, J. Carr, M. Goin, W. Leiserson, S. Moore, B. Nell, N. Shavit, D.-A. Alistarh, in:, 37th International Conference on Machine Learning, ICML 2020, 2020, pp. 5533–5543.","mla":"Kurtz, Mark, et al. “Inducing and Exploiting Activation Sparsity for Fast Neural Network Inference.” 37th International Conference on Machine Learning, ICML 2020, vol. 119, 2020, pp. 5533–43."},"publication":"37th International Conference on Machine Learning, ICML 2020","page":"5533-5543","date_published":"2020-07-12T00:00:00Z","type":"conference","abstract":[{"text":"Optimizing convolutional neural networks for fast inference has recently become an extremely active area of research. One of the go-to solutions in this context is weight pruning, which aims to reduce computational and memory footprint by removing large subsets of the connections in a neural network. Surprisingly, much less attention has been given to exploiting sparsity in the activation maps, which tend to be naturally sparse in many settings thanks to the structure of rectified linear (ReLU) activation functions. In this paper, we present an in-depth analysis of methods for maximizing the sparsity of the activations in a trained neural network, and show that, when coupled with an efficient sparse-input convolution algorithm, we can leverage this sparsity for significant performance gains. 
To induce highly sparse activation maps without accuracy loss, we introduce a new regularization technique, coupled with a new threshold-based sparsification method based on a parameterized activation function called Forced-Activation-Threshold Rectified Linear Unit (FATReLU). We examine the impact of our methods on popular image classification models, showing that most architectures can adapt to significantly sparser activation maps without any accuracy loss. Our second contribution is showing that these compression gains can be translated into inference speedups: we provide a new algorithm to enable fast convolution operations over networks with sparse activations, and show that it can enable significant speedups for end-to-end inference on a range of popular models on the large-scale ImageNet image classification task on modern Intel CPUs, with little or no retraining cost. ","lang":"eng"}],"_id":"9415","user_id":"3E5EF7F0-F248-11E8-B48F-1D18A9856A87","intvolume":" 119","ddc":["000"],"title":"Inducing and exploiting activation sparsity for fast neural network inference","status":"public","file":[{"creator":"kschuh","file_size":741899,"content_type":"application/pdf","access_level":"open_access","file_name":"2020_PMLR_Kurtz.pdf","success":1,"checksum":"2aaaa7d7226e49161311d91627cf783b","date_created":"2021-05-25T09:51:36Z","date_updated":"2021-05-25T09:51:36Z","file_id":"9421","relation":"main_file"}],"oa_version":"Published Version"},{"doi":"10.1109/TSP.2020.3010355","language":[{"iso":"eng"}],"oa":1,"external_id":{"arxiv":["1802.04907"],"isi":["000562044500001"]},"main_file_link":[{"url":"https://arxiv.org/abs/1802.04907","open_access":"1"}],"isi":1,"quality_controlled":"1","month":"07","publication_identifier":{"eissn":["19410476"],"issn":["1053587X"]},"author":[{"full_name":"Gurel, Nezihe Merve","first_name":"Nezihe Merve","last_name":"Gurel"},{"full_name":"Kara, 
Kaan","last_name":"Kara","first_name":"Kaan"},{"last_name":"Stojanov","first_name":"Alen","full_name":"Stojanov, Alen"},{"first_name":"Tyler","last_name":"Smith","full_name":"Smith, Tyler"},{"full_name":"Lemmin, Thomas","last_name":"Lemmin","first_name":"Thomas"},{"last_name":"Alistarh","first_name":"Dan-Adrian","orcid":"0000-0003-3650-940X","id":"4A899BFC-F248-11E8-B48F-1D18A9856A87","full_name":"Alistarh, Dan-Adrian"},{"first_name":"Markus","last_name":"Puschel","full_name":"Puschel, Markus"},{"first_name":"Ce","last_name":"Zhang","full_name":"Zhang, Ce"}],"date_created":"2020-08-16T22:00:56Z","date_updated":"2023-08-22T08:40:08Z","volume":68,"acknowledgement":"The authors would like to thank Dr. Michiel Brentjens at the Netherlands Institute for Radio Astronomy (ASTRON) for providing radio interferometer data and Dr. Josip Marjanovic and Dr. Franciszek Hennel at the Magnetic Resonance Technology of ETH Zurich for providing their insights on the experiments. CZ and the DS3Lab gratefully acknowledge the support from the Swiss Data Science Center, Alibaba, Google Focused Research Awards, Huawei, MeteoSwiss, Oracle Labs, Swisscom, Zurich Insurance, Chinese Scholarship Council, and the Department of Computer Science at ETH Zurich.","year":"2020","publication_status":"published","publisher":"IEEE","department":[{"_id":"DaAl"}],"date_published":"2020-07-20T00:00:00Z","publication":"IEEE Transactions on Signal Processing","citation":{"short":"N.M. Gurel, K. Kara, A. Stojanov, T. Smith, T. Lemmin, D.-A. Alistarh, M. Puschel, C. Zhang, IEEE Transactions on Signal Processing 68 (2020) 4268–4282.","mla":"Gurel, Nezihe Merve, et al. “Compressive Sensing Using Iterative Hard Thresholding with Low Precision Data Representation: Theory and Applications.” IEEE Transactions on Signal Processing, vol. 68, IEEE, 2020, pp. 
4268–82, doi:10.1109/TSP.2020.3010355.","chicago":"Gurel, Nezihe Merve, Kaan Kara, Alen Stojanov, Tyler Smith, Thomas Lemmin, Dan-Adrian Alistarh, Markus Puschel, and Ce Zhang. “Compressive Sensing Using Iterative Hard Thresholding with Low Precision Data Representation: Theory and Applications.” IEEE Transactions on Signal Processing. IEEE, 2020. https://doi.org/10.1109/TSP.2020.3010355.","ama":"Gurel NM, Kara K, Stojanov A, et al. Compressive sensing using iterative hard thresholding with low precision data representation: Theory and applications. IEEE Transactions on Signal Processing. 2020;68:4268-4282. doi:10.1109/TSP.2020.3010355","apa":"Gurel, N. M., Kara, K., Stojanov, A., Smith, T., Lemmin, T., Alistarh, D.-A., … Zhang, C. (2020). Compressive sensing using iterative hard thresholding with low precision data representation: Theory and applications. IEEE Transactions on Signal Processing. IEEE. https://doi.org/10.1109/TSP.2020.3010355","ieee":"N. M. Gurel et al., “Compressive sensing using iterative hard thresholding with low precision data representation: Theory and applications,” IEEE Transactions on Signal Processing, vol. 68. IEEE, pp. 4268–4282, 2020.","ista":"Gurel NM, Kara K, Stojanov A, Smith T, Lemmin T, Alistarh D-A, Puschel M, Zhang C. 2020. Compressive sensing using iterative hard thresholding with low precision data representation: Theory and applications. IEEE Transactions on Signal Processing. 68, 4268–4282."},"article_type":"original","page":"4268-4282","day":"20","article_processing_charge":"No","scopus_import":"1","oa_version":"Preprint","user_id":"4359f0d1-fa6c-11eb-b949-802e58b17ae8","_id":"8268","status":"public","title":"Compressive sensing using iterative hard thresholding with low precision data representation: Theory and applications","intvolume":" 68","abstract":[{"text":"Modern scientific instruments produce vast amounts of data, which can overwhelm the processing ability of computer systems. 
Lossy compression of data is an intriguing solution, but comes with its own drawbacks, such as potential signal loss, and the need for careful optimization of the compression ratio. In this work, we focus on a setting where this problem is especially acute: compressive sensing frameworks for interferometry and medical imaging. We ask the following question: can the precision of the data representation be lowered for all inputs, with recovery guarantees and practical performance? Our first contribution is a theoretical analysis of the normalized Iterative Hard Thresholding (IHT) algorithm when all input data, meaning both the measurement matrix and the observation vector, are quantized aggressively. We present a variant of low precision normalized IHT that, under mild conditions, can still provide recovery guarantees. The second contribution is the application of our quantization framework to radio astronomy and magnetic resonance imaging. We show that lowering the precision of the data can significantly accelerate image recovery. We evaluate our approach on telescope data and samples of brain images using CPU and FPGA implementations, achieving up to a 9x speedup with negligible loss of recovery quality.","lang":"eng"}],"type":"journal_article"},{"date_published":"2020-02-01T00:00:00Z","page":"45-61","publication":"Proceedings of the 25th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming","citation":{"ieee":"S. Li, T. B.-N. Tal Ben-Nun, S. D. Girolamo, D.-A. Alistarh, and T. Hoefler, “Taming unbalanced training workloads in deep learning with partial collective operations,” in Proceedings of the 25th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, San Diego, CA, United States, 2020, pp. 45–61.","apa":"Li, S., Tal Ben-Nun, T. B.-N., Girolamo, S. D., Alistarh, D.-A., & Hoefler, T. (2020). Taming unbalanced training workloads in deep learning with partial collective operations. 
In Proceedings of the 25th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (pp. 45–61). San Diego, CA, United States: Association for Computing Machinery. https://doi.org/10.1145/3332466.3374528","ista":"Li S, Tal Ben-Nun TB-N, Girolamo SD, Alistarh D-A, Hoefler T. 2020. Taming unbalanced training workloads in deep learning with partial collective operations. Proceedings of the 25th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming. PPoPP: Symposium on Principles and Practice of Parallel Programming, 45–61.","ama":"Li S, Tal Ben-Nun TB-N, Girolamo SD, Alistarh D-A, Hoefler T. Taming unbalanced training workloads in deep learning with partial collective operations. In: Proceedings of the 25th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming. Association for Computing Machinery; 2020:45-61. doi:10.1145/3332466.3374528","chicago":"Li, Shigang, Tal Ben-Nun Tal Ben-Nun, Salvatore Di Girolamo, Dan-Adrian Alistarh, and Torsten Hoefler. “Taming Unbalanced Training Workloads in Deep Learning with Partial Collective Operations.” In Proceedings of the 25th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 45–61. Association for Computing Machinery, 2020. https://doi.org/10.1145/3332466.3374528.","short":"S. Li, T.B.-N. Tal Ben-Nun, S.D. Girolamo, D.-A. Alistarh, T. Hoefler, in:, Proceedings of the 25th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, Association for Computing Machinery, 2020, pp. 45–61.","mla":"Li, Shigang, et al. “Taming Unbalanced Training Workloads in Deep Learning with Partial Collective Operations.” Proceedings of the 25th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, Association for Computing Machinery, 2020, pp. 
45–61, doi:10.1145/3332466.3374528."},"day":"01","article_processing_charge":"No","oa_version":"Preprint","title":"Taming unbalanced training workloads in deep learning with partial collective operations","status":"public","_id":"8722","user_id":"4359f0d1-fa6c-11eb-b949-802e58b17ae8","abstract":[{"lang":"eng","text":"Load imbalance pervasively exists in distributed deep learning training systems, either caused by the inherent imbalance in learned tasks or by the system itself. Traditional synchronous Stochastic Gradient Descent (SGD)\r\nachieves good accuracy for a wide variety of tasks, but relies on global synchronization to accumulate the gradients at every training step. In this paper, we propose eager-SGD, which relaxes the global synchronization for\r\ndecentralized accumulation. To implement eager-SGD, we propose to use two partial collectives: solo and majority. With solo allreduce, the faster processes contribute their gradients eagerly without waiting for the slower processes, whereas with majority allreduce, at least half of the participants must contribute gradients before continuing, all without using a central parameter server. We theoretically prove the convergence of the algorithms and describe the partial collectives in detail. 
Experimental results on load-imbalanced environments (CIFAR-10, ImageNet, and UCF101 datasets) show\r\nthat eager-SGD achieves 1.27x speedup over the state-of-the-art synchronous SGD, without losing accuracy."}],"type":"conference","language":[{"iso":"eng"}],"conference":{"name":"PPoPP: Symposium on Principles and Practice of Parallel Programming","end_date":"2020-02-26","location":"San Diego, CA, United States","start_date":"2020-02-22"},"doi":"10.1145/3332466.3374528","quality_controlled":"1","isi":1,"project":[{"grant_number":"805223","_id":"268A44D6-B435-11E9-9278-68D0E5697425","call_identifier":"H2020","name":"Elastic Coordination for Scalable Machine Learning"}],"oa":1,"external_id":{"arxiv":["1908.04207"],"isi":["000564476500004"]},"main_file_link":[{"url":"https://arxiv.org/abs/1908.04207","open_access":"1"}],"month":"02","date_updated":"2023-08-22T12:13:48Z","date_created":"2020-11-05T15:25:30Z","author":[{"first_name":"Shigang","last_name":"Li","full_name":"Li, Shigang"},{"last_name":"Tal Ben-Nun","first_name":"Tal Ben-Nun","full_name":"Tal Ben-Nun, Tal Ben-Nun"},{"last_name":"Girolamo","first_name":"Salvatore Di","full_name":"Girolamo, Salvatore Di"},{"last_name":"Alistarh","first_name":"Dan-Adrian","orcid":"0000-0003-3650-940X","id":"4A899BFC-F248-11E8-B48F-1D18A9856A87","full_name":"Alistarh, Dan-Adrian"},{"last_name":"Hoefler","first_name":"Torsten","full_name":"Hoefler, Torsten"}],"publication_status":"published","department":[{"_id":"DaAl"}],"publisher":"Association for Computing Machinery","year":"2020","ec_funded":1},{"oa_version":"Published Version","file":[{"relation":"main_file","file_id":"9120","date_created":"2021-02-15T09:00:01Z","date_updated":"2021-02-15T09:00:01Z","checksum":"cc755d0054bc4b2be778ea7aa7884d2f","success":1,"file_name":"2020_PMLR_Konstantinov.pdf","access_level":"open_access","content_type":"application/pdf","file_size":281286,"creator":"dernst"}],"title":"On the sample complexity of adversarial multi-source PAC 
learning","status":"public","ddc":["000"],"intvolume":" 119","user_id":"3E5EF7F0-F248-11E8-B48F-1D18A9856A87","_id":"8724","abstract":[{"text":"We study the problem of learning from multiple untrusted data sources, a scenario of increasing practical relevance given the recent emergence of crowdsourcing and collaborative learning paradigms. Specifically, we analyze the situation in which a learning system obtains datasets from multiple sources, some of which might be biased or even adversarially perturbed. It is\r\nknown that in the single-source case, an adversary with the power to corrupt a fixed fraction of the training data can prevent PAC-learnability, that is, even in the limit of infinite training data, no learning system can approach the optimal test error. In this work we show that, surprisingly, the same is not true in the multi-source setting, where the adversary can arbitrarily\r\ncorrupt a fixed fraction of the data sources. Our main results are a generalization bound that provides finite-sample guarantees for this learning setting, as well as corresponding lower bounds. Besides establishing PAC-learnability, our results also show that in a cooperative learning setting sharing data with other parties has provable benefits, even if some\r\nparticipants are malicious. ","lang":"eng"}],"type":"conference","date_published":"2020-07-12T00:00:00Z","page":"5416-5425","publication":"Proceedings of the 37th International Conference on Machine Learning","citation":{"apa":"Konstantinov, N. H., Frantar, E., Alistarh, D.-A., & Lampert, C. (2020). On the sample complexity of adversarial multi-source PAC learning. In Proceedings of the 37th International Conference on Machine Learning (Vol. 119, pp. 5416–5425). Online: ML Research Press.","ieee":"N. H. Konstantinov, E. Frantar, D.-A. Alistarh, and C. 
Lampert, “On the sample complexity of adversarial multi-source PAC learning,” in Proceedings of the 37th International Conference on Machine Learning, Online, 2020, vol. 119, pp. 5416–5425.","ista":"Konstantinov NH, Frantar E, Alistarh D-A, Lampert C. 2020. On the sample complexity of adversarial multi-source PAC learning. Proceedings of the 37th International Conference on Machine Learning. ICML: International Conference on Machine Learning vol. 119, 5416–5425.","ama":"Konstantinov NH, Frantar E, Alistarh D-A, Lampert C. On the sample complexity of adversarial multi-source PAC learning. In: Proceedings of the 37th International Conference on Machine Learning. Vol 119. ML Research Press; 2020:5416-5425.","chicago":"Konstantinov, Nikola H, Elias Frantar, Dan-Adrian Alistarh, and Christoph Lampert. “On the Sample Complexity of Adversarial Multi-Source PAC Learning.” In Proceedings of the 37th International Conference on Machine Learning, 119:5416–25. ML Research Press, 2020.","short":"N.H. Konstantinov, E. Frantar, D.-A. Alistarh, C. Lampert, in:, Proceedings of the 37th International Conference on Machine Learning, ML Research Press, 2020, pp. 5416–5425.","mla":"Konstantinov, Nikola H., et al. “On the Sample Complexity of Adversarial Multi-Source PAC Learning.” Proceedings of the 37th International Conference on Machine Learning, vol. 119, ML Research Press, 2020, pp. 
5416–25."},"day":"12","article_processing_charge":"No","has_accepted_license":"1","scopus_import":"1","date_created":"2020-11-05T15:25:58Z","date_updated":"2023-09-07T13:42:08Z","volume":119,"author":[{"full_name":"Konstantinov, Nikola H","first_name":"Nikola H","last_name":"Konstantinov","id":"4B9D76E4-F248-11E8-B48F-1D18A9856A87"},{"first_name":"Elias","last_name":"Frantar","id":"09a8f98d-ec99-11ea-ae11-c063a7b7fe5f","full_name":"Frantar, Elias"},{"full_name":"Alistarh, Dan-Adrian","first_name":"Dan-Adrian","last_name":"Alistarh","id":"4A899BFC-F248-11E8-B48F-1D18A9856A87","orcid":"0000-0003-3650-940X"},{"orcid":"0000-0001-8622-7887","id":"40C20FD2-F248-11E8-B48F-1D18A9856A87","last_name":"Lampert","first_name":"Christoph","full_name":"Lampert, Christoph"}],"related_material":{"record":[{"status":"public","relation":"dissertation_contains","id":"10799"}],"link":[{"relation":"supplementary_material","url":"http://proceedings.mlr.press/v119/konstantinov20a/konstantinov20a-supp.pdf"}]},"publication_status":"published","department":[{"_id":"DaAl"},{"_id":"ChLa"}],"publisher":"ML Research Press","year":"2020","acknowledgement":"Dan Alistarh is supported in part by the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (grant agreement No 805223 ScaleML). 
This research was supported by the Scientific Service Units (SSU) of IST Austria through resources provided by Scientific Computing (SciComp).","file_date_updated":"2021-02-15T09:00:01Z","ec_funded":1,"acknowledged_ssus":[{"_id":"ScienComp"}],"language":[{"iso":"eng"}],"conference":{"end_date":"2020-07-18","start_date":"2020-07-12","location":"Online","name":"ICML: International Conference on Machine Learning"},"quality_controlled":"1","project":[{"_id":"268A44D6-B435-11E9-9278-68D0E5697425","grant_number":"805223","call_identifier":"H2020","name":"Elastic Coordination for Scalable Machine Learning"}],"oa":1,"external_id":{"arxiv":["2002.10384"]},"month":"07","publication_identifier":{"issn":["2640-3498"]}},{"day":"19","article_processing_charge":"No","scopus_import":"1","date_published":"2020-02-19T00:00:00Z","page":"276-291","publication":"Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming","citation":{"short":"T.A. Brown, A. Prokopec, D.-A. Alistarh, in:, Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, Association for Computing Machinery, 2020, pp. 276–291.","mla":"Brown, Trevor A., et al. “Non-Blocking Interpolation Search Trees with Doubly-Logarithmic Running Time.” Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, Association for Computing Machinery, 2020, pp. 276–91, doi:10.1145/3332466.3374542.","chicago":"Brown, Trevor A, Aleksandar Prokopec, and Dan-Adrian Alistarh. “Non-Blocking Interpolation Search Trees with Doubly-Logarithmic Running Time.” In Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 276–91. Association for Computing Machinery, 2020. https://doi.org/10.1145/3332466.3374542.","ama":"Brown TA, Prokopec A, Alistarh D-A. Non-blocking interpolation search trees with doubly-logarithmic running time. 
In: Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming. Association for Computing Machinery; 2020:276-291. doi:10.1145/3332466.3374542","ieee":"T. A. Brown, A. Prokopec, and D.-A. Alistarh, “Non-blocking interpolation search trees with doubly-logarithmic running time,” in Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, San Diego, CA, United States, 2020, pp. 276–291.","apa":"Brown, T. A., Prokopec, A., & Alistarh, D.-A. (2020). Non-blocking interpolation search trees with doubly-logarithmic running time. In Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (pp. 276–291). San Diego, CA, United States: Association for Computing Machinery. https://doi.org/10.1145/3332466.3374542","ista":"Brown TA, Prokopec A, Alistarh D-A. 2020. Non-blocking interpolation search trees with doubly-logarithmic running time. Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming. PPOPP: Principles and Practice of Parallel Programming, 276–291."},"abstract":[{"text":"Balanced search trees typically use key comparisons to guide their operations, and achieve logarithmic running time. By relying on numerical properties of the keys, interpolation search achieves lower search complexity and better performance. Although interpolation-based data structures were investigated in the past, their non-blocking concurrent variants have received very little attention so far.\r\nIn this paper, we propose the first non-blocking implementation of the classic interpolation search tree (IST) data structure. For arbitrary key distributions, the data structure ensures worst-case O(log n + p) amortized time for search, insertion and deletion traversals. 
When the input key distributions are smooth, lookups run in expected O(log log n + p) time, and insertion and deletion run in expected amortized O(log log n + p) time, where p is a bound on the number of threads. To improve the scalability of concurrent insertion and deletion, we propose a novel parallel rebuilding technique, which should be of independent interest.\r\nWe evaluate whether the theoretical improvements translate to practice by implementing the concurrent interpolation search tree, and benchmarking it on uniform and nonuniform key distributions, for dataset sizes in the millions to billions of keys. Relative to the state-of-the-art concurrent data structures, the concurrent interpolation search tree achieves performance improvements of up to 15% under high update rates, and of up to 50% under moderate update rates. Further, ISTs exhibit up to 2X fewer cache misses, and consume 1.2–2.6X less memory compared to the next best alternative on typical dataset sizes. We find that the results are surprisingly robust to distributional skew, which suggests that our data structure can be a promising alternative to classic concurrent search structures.","lang":"eng"}],"type":"conference","oa_version":"Published Version","status":"public","title":"Non-blocking interpolation search trees with doubly-logarithmic running time","user_id":"2DF688A6-F248-11E8-B48F-1D18A9856A87","_id":"7636","month":"02","publication_identifier":{"isbn":["9781450368186"]},"language":[{"iso":"eng"}],"conference":{"end_date":"2020-02-26","location":"San Diego, CA, United States","start_date":"2020-02-22","name":"PPOPP: Principles and Practice of Parallel Programming"},"doi":"10.1145/3332466.3374542","quality_controlled":"1","isi":1,"project":[{"call_identifier":"H2020","name":"Elastic Coordination for Scalable Machine 
Learning","_id":"268A44D6-B435-11E9-9278-68D0E5697425","grant_number":"805223"}],"external_id":{"isi":["000564476500020"]},"oa":1,"main_file_link":[{"open_access":"1","url":"https://doi.org/10.1145/3332466.3374542"}],"ec_funded":1,"date_created":"2020-04-05T22:00:49Z","date_updated":"2024-02-28T12:55:14Z","author":[{"full_name":"Brown, Trevor A","id":"3569F0A0-F248-11E8-B48F-1D18A9856A87","first_name":"Trevor A","last_name":"Brown"},{"full_name":"Prokopec, Aleksandar","first_name":"Aleksandar","last_name":"Prokopec"},{"last_name":"Alistarh","first_name":"Dan-Adrian","orcid":"0000-0003-3650-940X","id":"4A899BFC-F248-11E8-B48F-1D18A9856A87","full_name":"Alistarh, Dan-Adrian"}],"publication_status":"published","department":[{"_id":"DaAl"}],"publisher":"Association for Computing Machinery","acknowledgement":"This project has received funding from the European Research Council (ERC) under the European Union Horizon 2020 research and innovation program, grant agreement No 805223, ERC Starting Grant ScaleML. We acknowledge the support of the Natural Sciences and\r\nEngineering Research Council of Canada (NSERC). ","year":"2020"},{"type":"conference","abstract":[{"text":"There has been a significant amount of research on hardware and software support for efficient concurrent data structures; yet, the question of how to build correct, simple, and scalable data structures has not yet been definitively settled. 
In this paper, we revisit this question from a minimalist perspective, and ask: what is the smallest amount of synchronization required for correct and efficient concurrent search data structures, and how could this minimal synchronization support be provided in hardware?\r\n\r\nTo address these questions, we introduce memory tagging, a simple hardware mechanism which enables the programmer to \"tag\" a dynamic set of memory locations, at cache-line granularity, and later validate whether the memory has been concurrently modified, with the possibility of updating one of the underlying locations atomically if validation succeeds. We provide several examples showing that this mechanism can enable fast and arguably simple concurrent data structure designs, such as lists, binary search trees, balanced search trees, range queries, and Software Transactional Memory (STM) implementations. We provide an implementation of memory tags in the Graphite multi-core simulator, showing that the mechanism can be implemented entirely at the level of L1 cache, and that it can enable non-trivial speedups versus existing implementations of the above data structures.","lang":"eng"}],"issue":"7","title":"Memory tagging: Minimalist synchronization for scalable concurrent data structures","status":"public","publication_status":"published","department":[{"_id":"DaAl"}],"publisher":"Association for Computing Machinery","user_id":"2DF688A6-F248-11E8-B48F-1D18A9856A87","_id":"8191","year":"2020","date_created":"2020-08-02T22:00:58Z","date_updated":"2024-02-28T12:56:32Z","oa_version":"None","author":[{"full_name":"Alistarh, Dan-Adrian","id":"4A899BFC-F248-11E8-B48F-1D18A9856A87","orcid":"0000-0003-3650-940X","first_name":"Dan-Adrian","last_name":"Alistarh"},{"first_name":"Trevor A","last_name":"Brown","id":"3569F0A0-F248-11E8-B48F-1D18A9856A87","full_name":"Brown, Trevor A"},{"full_name":"Singhal, 
Nandini","last_name":"Singhal","first_name":"Nandini"}],"scopus_import":"1","day":"06","month":"07","publication_identifier":{"isbn":["9781450369350"]},"article_processing_charge":"No","isi":1,"quality_controlled":"1","page":"37-49","publication":"Annual ACM Symposium on Parallelism in Algorithms and Architectures","citation":{"mla":"Alistarh, Dan-Adrian, et al. “Memory Tagging: Minimalist Synchronization for Scalable Concurrent Data Structures.” Annual ACM Symposium on Parallelism in Algorithms and Architectures, no. 7, Association for Computing Machinery, 2020, pp. 37–49, doi:10.1145/3350755.3400213.","short":"D.-A. Alistarh, T.A. Brown, N. Singhal, in:, Annual ACM Symposium on Parallelism in Algorithms and Architectures, Association for Computing Machinery, 2020, pp. 37–49.","chicago":"Alistarh, Dan-Adrian, Trevor A Brown, and Nandini Singhal. “Memory Tagging: Minimalist Synchronization for Scalable Concurrent Data Structures.” In Annual ACM Symposium on Parallelism in Algorithms and Architectures, 37–49. Association for Computing Machinery, 2020. https://doi.org/10.1145/3350755.3400213.","ama":"Alistarh D-A, Brown TA, Singhal N. Memory tagging: Minimalist synchronization for scalable concurrent data structures. In: Annual ACM Symposium on Parallelism in Algorithms and Architectures. Association for Computing Machinery; 2020:37-49. doi:10.1145/3350755.3400213","ista":"Alistarh D-A, Brown TA, Singhal N. 2020. Memory tagging: Minimalist synchronization for scalable concurrent data structures. Annual ACM Symposium on Parallelism in Algorithms and Architectures. SPAA: Symposium on Parallelism in Algorithms and Architectures, 37–49.","ieee":"D.-A. Alistarh, T. A. Brown, and N. Singhal, “Memory tagging: Minimalist synchronization for scalable concurrent data structures,” in Annual ACM Symposium on Parallelism in Algorithms and Architectures, Virtual Event, United States, 2020, no. 7, pp. 37–49.","apa":"Alistarh, D.-A., Brown, T. A., & Singhal, N. (2020). 
Memory tagging: Minimalist synchronization for scalable concurrent data structures. In Annual ACM Symposium on Parallelism in Algorithms and Architectures (pp. 37–49). Virtual Event, United States: Association for Computing Machinery. https://doi.org/10.1145/3350755.3400213"},"external_id":{"isi":["000744436200004"]},"language":[{"iso":"eng"}],"conference":{"location":"Virtual Event, United States","start_date":"2020-07-15","end_date":"2020-07-17","name":"SPAA: Symposium on Parallelism in Algorithms and Architectures"},"doi":"10.1145/3350755.3400213","date_published":"2020-07-06T00:00:00Z"},{"citation":{"mla":"Koval, Nikita, et al. “Testing Concurrency on the JVM with Lincheck.” Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPOPP, Association for Computing Machinery, 2020, pp. 423–24, doi:10.1145/3332466.3374503.","short":"N. Koval, M. Sokolova, A. Fedorov, D.-A. Alistarh, D. Tsitelov, in:, Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPOPP, Association for Computing Machinery, 2020, pp. 423–424.","chicago":"Koval, Nikita, Mariia Sokolova, Alexander Fedorov, Dan-Adrian Alistarh, and Dmitry Tsitelov. “Testing Concurrency on the JVM with Lincheck.” In Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPOPP, 423–24. Association for Computing Machinery, 2020. https://doi.org/10.1145/3332466.3374503.","ama":"Koval N, Sokolova M, Fedorov A, Alistarh D-A, Tsitelov D. Testing concurrency on the JVM with Lincheck. In: Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPOPP. Association for Computing Machinery; 2020:423-424. doi:10.1145/3332466.3374503","ista":"Koval N, Sokolova M, Fedorov A, Alistarh D-A, Tsitelov D. 2020. Testing concurrency on the JVM with Lincheck. Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPOPP. 
PPOPP: Principles and Practice of Parallel Programming, 423–424.","apa":"Koval, N., Sokolova, M., Fedorov, A., Alistarh, D.-A., & Tsitelov, D. (2020). Testing concurrency on the JVM with Lincheck. In Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPOPP (pp. 423–424). San Diego, CA, United States: Association for Computing Machinery. https://doi.org/10.1145/3332466.3374503","ieee":"N. Koval, M. Sokolova, A. Fedorov, D.-A. Alistarh, and D. Tsitelov, “Testing concurrency on the JVM with Lincheck,” in Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPOPP, San Diego, CA, United States, 2020, pp. 423–424."},"publication":"Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPOPP","page":"423-424","quality_controlled":"1","doi":"10.1145/3332466.3374503","date_published":"2020-02-19T00:00:00Z","conference":{"name":"PPOPP: Principles and Practice of Parallel Programming","location":"San Diego, CA, United States","start_date":"2020-02-22","end_date":"2020-02-26"},"language":[{"iso":"eng"}],"scopus_import":"1","article_processing_charge":"No","publication_identifier":{"isbn":["9781450368186"]},"month":"02","day":"19","year":"2020","_id":"7635","user_id":"2DF688A6-F248-11E8-B48F-1D18A9856A87","department":[{"_id":"DaAl"}],"publisher":"Association for Computing Machinery","publication_status":"published","title":"Testing concurrency on the JVM with Lincheck","status":"public","author":[{"id":"2F4DB10C-F248-11E8-B48F-1D18A9856A87","last_name":"Koval","first_name":"Nikita","full_name":"Koval, Nikita"},{"full_name":"Sokolova, Mariia","id":"26217AE4-77FF-11EA-8101-AD24D49E41F4","first_name":"Mariia","last_name":"Sokolova"},{"full_name":"Fedorov, Alexander","first_name":"Alexander","last_name":"Fedorov"},{"full_name":"Alistarh, 
Dan-Adrian","last_name":"Alistarh","first_name":"Dan-Adrian","orcid":"0000-0003-3650-940X","id":"4A899BFC-F248-11E8-B48F-1D18A9856A87"},{"first_name":"Dmitry","last_name":"Tsitelov","full_name":"Tsitelov, Dmitry"}],"oa_version":"None","date_created":"2020-04-05T22:00:48Z","date_updated":"2024-02-28T12:53:46Z","type":"conference","abstract":[{"text":"Concurrent programming can be notoriously complex and error-prone. Programming bugs can arise from a variety of sources, such as operation re-reordering, or incomplete understanding of the memory model. A variety of formal and model checking methods have been developed to address this fundamental difficulty. While technically interesting, existing academic methods are still hard to apply to the large codebases typical of industrial deployments, which limits their practical impact.","lang":"eng"}]},{"language":[{"iso":"eng"}],"conference":{"name":"PODC: Principles of Distributed Computing","end_date":"2020-08-07","location":"Virtual, Italy","start_date":"2020-08-03"},"date_published":"2020-07-31T00:00:00Z","doi":"10.1145/3382734.3405743","quality_controlled":"1","page":"54-56","publication":"Proceedings of the 39th Symposium on Principles of Distributed Computing","citation":{"chicago":"Alistarh, Dan-Adrian, James Aspnes, Faith Ellen, Rati Gelashvili, and Leqi Zhu. “Brief Announcement: Why Extension-Based Proofs Fail.” In Proceedings of the 39th Symposium on Principles of Distributed Computing, 54–56. Association for Computing Machinery, 2020. https://doi.org/10.1145/3382734.3405743.","mla":"Alistarh, Dan-Adrian, et al. “Brief Announcement: Why Extension-Based Proofs Fail.” Proceedings of the 39th Symposium on Principles of Distributed Computing, Association for Computing Machinery, 2020, pp. 54–56, doi:10.1145/3382734.3405743.","short":"D.-A. Alistarh, J. Aspnes, F. Ellen, R. Gelashvili, L. Zhu, in:, Proceedings of the 39th Symposium on Principles of Distributed Computing, Association for Computing Machinery, 2020, pp. 
54–56.","ista":"Alistarh D-A, Aspnes J, Ellen F, Gelashvili R, Zhu L. 2020. Brief Announcement: Why Extension-Based Proofs Fail. Proceedings of the 39th Symposium on Principles of Distributed Computing. PODC: Principles of Distributed Computing, 54–56.","apa":"Alistarh, D.-A., Aspnes, J., Ellen, F., Gelashvili, R., & Zhu, L. (2020). Brief Announcement: Why Extension-Based Proofs Fail. In Proceedings of the 39th Symposium on Principles of Distributed Computing (pp. 54–56). Virtual, Italy: Association for Computing Machinery. https://doi.org/10.1145/3382734.3405743","ieee":"D.-A. Alistarh, J. Aspnes, F. Ellen, R. Gelashvili, and L. Zhu, “Brief Announcement: Why Extension-Based Proofs Fail,” in Proceedings of the 39th Symposium on Principles of Distributed Computing, Virtual, Italy, 2020, pp. 54–56.","ama":"Alistarh D-A, Aspnes J, Ellen F, Gelashvili R, Zhu L. Brief Announcement: Why Extension-Based Proofs Fail. In: Proceedings of the 39th Symposium on Principles of Distributed Computing. Association for Computing Machinery; 2020:54-56. 
doi:10.1145/3382734.3405743"},"month":"07","day":"31","article_processing_charge":"No","publication_identifier":{"isbn":["9781450375825"]},"scopus_import":"1","date_created":"2020-09-13T22:01:18Z","date_updated":"2024-02-28T12:54:19Z","oa_version":"None","author":[{"first_name":"Dan-Adrian","last_name":"Alistarh","id":"4A899BFC-F248-11E8-B48F-1D18A9856A87","orcid":"0000-0003-3650-940X","full_name":"Alistarh, Dan-Adrian"},{"first_name":"James","last_name":"Aspnes","full_name":"Aspnes, James"},{"full_name":"Ellen, Faith","first_name":"Faith","last_name":"Ellen"},{"first_name":"Rati","last_name":"Gelashvili","full_name":"Gelashvili, Rati"},{"last_name":"Zhu","first_name":"Leqi","full_name":"Zhu, Leqi"}],"publication_status":"published","title":"Brief Announcement: Why Extension-Based Proofs Fail","status":"public","department":[{"_id":"DaAl"}],"publisher":"Association for Computing Machinery","year":"2020","_id":"8383","user_id":"2DF688A6-F248-11E8-B48F-1D18A9856A87","abstract":[{"lang":"eng","text":"We introduce extension-based proofs, a class of impossibility proofs that includes valency arguments. They are modelled as an interaction between a prover and a protocol. Using proofs based on combinatorial topology, it has been shown that it is impossible to deterministically solve k-set agreement among n > k ≥ 2 processes in a wait-free manner. However, it was unknown whether proofs based on simpler techniques were possible. We explain why this impossibility result cannot be obtained by an extension-based proof and, hence, extension-based proofs are limited in power."}],"type":"conference"},{"alternative_title":["LIPIcs"],"type":"conference","abstract":[{"lang":"eng","text":"We consider the following dynamic load-balancing process: given an underlying graph G with n nodes, in each step t≥ 0, one unit of load is created, and placed at a randomly chosen graph node. 
In the same step, the chosen node picks a random neighbor, and the two nodes balance their loads by averaging them. We are interested in the expected gap between the minimum and maximum loads at nodes as the process progresses, and its dependence on n and on the graph structure. Variants of the above graphical balanced allocation process have been studied previously by Peres, Talwar, and Wieder [Peres et al., 2015], and by Sauerwald and Sun [Sauerwald and Sun, 2015]. These authors left as open the question of characterizing the gap in the case of cycle graphs in the dynamic case, where weights are created during the algorithm’s execution. For this case, the only known upper bound is of 𝒪(n log n), following from a majorization argument due to [Peres et al., 2015], which analyzes a related graphical allocation process. In this paper, we provide an upper bound of 𝒪 (√n log n) on the expected gap of the above process for cycles of length n. We introduce a new potential analysis technique, which enables us to bound the difference in load between k-hop neighbors on the cycle, for any k ≤ n/2. We complement this with a \"gap covering\" argument, which bounds the maximum value of the gap by bounding its value across all possible subsets of a certain structure, and recursively bounding the gaps within each subset. 
We provide analytical and experimental evidence that our upper bound on the gap is tight up to a logarithmic factor."}],"intvolume":" 168","ddc":["000"],"title":"Dynamic averaging load balancing on cycles","status":"public","_id":"15077","user_id":"2DF688A6-F248-11E8-B48F-1D18A9856A87","oa_version":"Published Version","file":[{"date_updated":"2024-03-05T07:25:15Z","date_created":"2024-03-05T07:25:15Z","success":1,"checksum":"e5eb16199f4ccfd77a321977eb3f026f","file_id":"15078","relation":"main_file","creator":"dernst","content_type":"application/pdf","file_size":782987,"file_name":"2020_LIPIcs_Alistarh.pdf","access_level":"open_access"}],"scopus_import":"1","article_processing_charge":"No","has_accepted_license":"1","day":"29","citation":{"ama":"Alistarh D-A, Nadiradze G, Sabour A. Dynamic averaging load balancing on cycles. In: 47th International Colloquium on Automata, Languages, and Programming. Vol 168. Schloss Dagstuhl - Leibniz-Zentrum für Informatik; 2020. doi:10.4230/LIPIcs.ICALP.2020.7","ieee":"D.-A. Alistarh, G. Nadiradze, and A. Sabour, “Dynamic averaging load balancing on cycles,” in 47th International Colloquium on Automata, Languages, and Programming, Saarbrücken, Germany, Virtual, 2020, vol. 168.","apa":"Alistarh, D.-A., Nadiradze, G., & Sabour, A. (2020). Dynamic averaging load balancing on cycles. In 47th International Colloquium on Automata, Languages, and Programming (Vol. 168). Saarbrücken, Germany, Virtual: Schloss Dagstuhl - Leibniz-Zentrum für Informatik. https://doi.org/10.4230/LIPIcs.ICALP.2020.7","ista":"Alistarh D-A, Nadiradze G, Sabour A. 2020. Dynamic averaging load balancing on cycles. 47th International Colloquium on Automata, Languages, and Programming. ICALP: International Colloquium on Automata, Languages, and Programming, LIPIcs, vol. 168, 7.","short":"D.-A. Alistarh, G. Nadiradze, A. 
Sabour, in:, 47th International Colloquium on Automata, Languages, and Programming, Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2020.","mla":"Alistarh, Dan-Adrian, et al. “Dynamic Averaging Load Balancing on Cycles.” 47th International Colloquium on Automata, Languages, and Programming, vol. 168, 7, Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2020, doi:10.4230/LIPIcs.ICALP.2020.7.","chicago":"Alistarh, Dan-Adrian, Giorgi Nadiradze, and Amirmojtaba Sabour. “Dynamic Averaging Load Balancing on Cycles.” In 47th International Colloquium on Automata, Languages, and Programming, Vol. 168. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2020. https://doi.org/10.4230/LIPIcs.ICALP.2020.7."},"publication":"47th International Colloquium on Automata, Languages, and Programming","date_published":"2020-06-29T00:00:00Z","article_number":"7","ec_funded":1,"file_date_updated":"2024-03-05T07:25:15Z","department":[{"_id":"DaAl"}],"publisher":"Schloss Dagstuhl - Leibniz-Zentrum für Informatik","publication_status":"published","year":"2020","acknowledgement":"The authors sincerely thank Thomas Sauerwald and George Giakkoupis for insightful discussions, and Mohsen Ghaffari, Yuval Peres, and Udi Wieder for feedback on earlier\r\nversions of this draft. 
We also thank the ICALP anonymous reviewers for their very useful comments.\r\nFunding: European Research Council funding award PR1042ERC01","volume":168,"date_created":"2024-03-05T07:25:37Z","date_updated":"2024-03-05T07:35:53Z","related_material":{"record":[{"status":"public","relation":"later_version","id":"8286"}]},"author":[{"full_name":"Alistarh, Dan-Adrian","first_name":"Dan-Adrian","last_name":"Alistarh","id":"4A899BFC-F248-11E8-B48F-1D18A9856A87","orcid":"0000-0003-3650-940X"},{"id":"3279A00C-F248-11E8-B48F-1D18A9856A87","orcid":"0000-0001-5634-0731","first_name":"Giorgi","last_name":"Nadiradze","full_name":"Nadiradze, Giorgi"},{"id":"bcc145fd-e77f-11ea-ae8b-80d661dbff67","last_name":"Sabour","first_name":"Amirmojtaba","full_name":"Sabour, Amirmojtaba"}],"month":"06","project":[{"_id":"268A44D6-B435-11E9-9278-68D0E5697425","grant_number":"805223","call_identifier":"H2020","name":"Elastic Coordination for Scalable Machine Learning"}],"quality_controlled":"1","oa":1,"tmp":{"legal_code_url":"https://creativecommons.org/licenses/by/3.0/legalcode","name":"Creative Commons Attribution 3.0 Unported (CC BY 3.0)","short":"CC BY (3.0)","image":"/images/cc_by.png"},"external_id":{"arxiv":["2003.09297"]},"language":[{"iso":"eng"}],"doi":"10.4230/LIPIcs.ICALP.2020.7","conference":{"name":"ICALP: International Colloquium on Automata, Languages, and Programming","end_date":"2020-07-11","start_date":"2020-07-08","location":"Saarbrücken, Germany, Virtual"}},{"_id":"6485","user_id":"4359f0d1-fa6c-11eb-b949-802e58b17ae8","year":"2019","publication_status":"published","title":"Lock-free channels for programming via communicating sequential processes","status":"public","publisher":"ACM Press","department":[{"_id":"DaAl"}],"author":[{"id":"2F4DB10C-F248-11E8-B48F-1D18A9856A87","first_name":"Nikita","last_name":"Koval","full_name":"Koval, Nikita"},{"full_name":"Alistarh, 
Dan-Adrian","last_name":"Alistarh","first_name":"Dan-Adrian","orcid":"0000-0003-3650-940X","id":"4A899BFC-F248-11E8-B48F-1D18A9856A87"},{"full_name":"Elizarov, Roman","first_name":"Roman","last_name":"Elizarov"}],"date_updated":"2023-08-25T10:41:20Z","date_created":"2019-05-24T10:09:12Z","oa_version":"None","type":"conference_poster","abstract":[{"text":"Traditional concurrent programming involves manipulating shared mutable state. Alternatives to this programming style are communicating sequential processes (CSP) [1] and actor [2] models, which share data via explicit communication. Rendezvous channelis the common abstraction for communication between several processes, where senders and receivers perform a rendezvous handshake as a part of their protocol (senders wait for receivers and vice versa). Additionally to this, channels support the select expression. In this work, we present the first efficient lock-free channel algorithm, and compare it against Go [3] and Kotlin [4] baseline implementations.","lang":"eng"}],"publication":"Proceedings of the 24th Symposium on Principles and Practice of Parallel Programming","external_id":{"isi":["000587604600044"]},"citation":{"chicago":"Koval, Nikita, Dan-Adrian Alistarh, and Roman Elizarov. Lock-Free Channels for Programming via Communicating Sequential Processes. Proceedings of the 24th Symposium on Principles and Practice of Parallel Programming. ACM Press, 2019. https://doi.org/10.1145/3293883.3297000.","mla":"Koval, Nikita, et al. “Lock-Free Channels for Programming via Communicating Sequential Processes.” Proceedings of the 24th Symposium on Principles and Practice of Parallel Programming, ACM Press, 2019, pp. 417–18, doi:10.1145/3293883.3297000.","short":"N. Koval, D.-A. Alistarh, R. Elizarov, Lock-Free Channels for Programming via Communicating Sequential Processes, ACM Press, 2019.","ista":"Koval N, Alistarh D-A, Elizarov R. 2019. 
Lock-free channels for programming via communicating sequential processes, ACM Press, 417–418.","apa":"Koval, N., Alistarh, D.-A., & Elizarov, R. (2019). Lock-free channels for programming via communicating sequential processes. Proceedings of the 24th Symposium on Principles and Practice of Parallel Programming (pp. 417–418). Washington, DC, United States: ACM Press. https://doi.org/10.1145/3293883.3297000","ieee":"N. Koval, D.-A. Alistarh, and R. Elizarov, Lock-free channels for programming via communicating sequential processes. ACM Press, 2019, pp. 417–418.","ama":"Koval N, Alistarh D-A, Elizarov R. Lock-Free Channels for Programming via Communicating Sequential Processes. ACM Press; 2019:417-418. doi:10.1145/3293883.3297000"},"quality_controlled":"1","isi":1,"page":"417-418","conference":{"end_date":"2019-02-20","start_date":"2019-02-16","location":"Washington, DC, United States","name":"PPoPP: Principles and Practice of Parallel Programming"},"date_published":"2019-02-01T00:00:00Z","doi":"10.1145/3293883.3297000","language":[{"iso":"eng"}],"month":"02","day":"01","publication_identifier":{"isbn":["9781450362252"]},"article_processing_charge":"No"},{"month":"01","day":"21","article_processing_charge":"No","publication_identifier":{"issn":["0743-1546"],"isbn":["9781538613955"]},"scopus_import":"1","language":[{"iso":"eng"}],"conference":{"name":"CDC: Conference on Decision and Control","end_date":"2018-12-19","location":"Miami Beach, FL, United States","start_date":"2018-12-17"},"date_published":"2019-01-21T00:00:00Z","doi":"10.1109/cdc.2018.8619625","isi":1,"quality_controlled":"1","publication":"2018 IEEE Conference on Decision and Control","citation":{"chicago":"Khirirat, Sarit, Mikael Johansson, and Dan-Adrian Alistarh. “Gradient Compression for Communication-Limited Convex Optimization.” In 2018 IEEE Conference on Decision and Control. IEEE, 2019. https://doi.org/10.1109/cdc.2018.8619625.","mla":"Khirirat, Sarit, et al. 
“Gradient Compression for Communication-Limited Convex Optimization.” 2018 IEEE Conference on Decision and Control, 8619625, IEEE, 2019, doi:10.1109/cdc.2018.8619625.","short":"S. Khirirat, M. Johansson, D.-A. Alistarh, in:, 2018 IEEE Conference on Decision and Control, IEEE, 2019.","ista":"Khirirat S, Johansson M, Alistarh D-A. 2019. Gradient compression for communication-limited convex optimization. 2018 IEEE Conference on Decision and Control. CDC: Conference on Decision and Control, 8619625.","ieee":"S. Khirirat, M. Johansson, and D.-A. Alistarh, “Gradient compression for communication-limited convex optimization,” in 2018 IEEE Conference on Decision and Control, Miami Beach, FL, United States, 2019.","apa":"Khirirat, S., Johansson, M., & Alistarh, D.-A. (2019). Gradient compression for communication-limited convex optimization. In 2018 IEEE Conference on Decision and Control. Miami Beach, FL, United States: IEEE. https://doi.org/10.1109/cdc.2018.8619625","ama":"Khirirat S, Johansson M, Alistarh D-A. Gradient compression for communication-limited convex optimization. In: 2018 IEEE Conference on Decision and Control. IEEE; 2019. doi:10.1109/cdc.2018.8619625"},"external_id":{"isi":["000458114800023"]},"abstract":[{"lang":"eng","text":"Data-rich applications in machine-learning and control have motivated an intense research on large-scale optimization. Novel algorithms have been proposed and shown to have optimal convergence rates in terms of iteration counts. However, their practical performance is severely degraded by the cost of exchanging high-dimensional gradient vectors between computing nodes. Several gradient compression heuristics have recently been proposed to reduce communications, but few theoretical results exist that quantify how they impact algorithm convergence. This paper establishes and strengthens the convergence guarantees for gradient descent under a family of gradient compression techniques. 
For convex optimization problems, we derive admissible step sizes and quantify both the number of iterations and the number of bits that need to be exchanged to reach a target accuracy. Finally, we validate the performance of different gradient compression techniques in simulations. The numerical results highlight the properties of different gradient compression algorithms and confirm that fast convergence with limited information exchange is possible."}],"article_number":"8619625","type":"conference","date_updated":"2023-09-06T11:14:55Z","date_created":"2019-11-26T15:07:49Z","oa_version":"None","author":[{"last_name":"Khirirat","first_name":"Sarit","full_name":"Khirirat, Sarit"},{"full_name":"Johansson, Mikael","first_name":"Mikael","last_name":"Johansson"},{"full_name":"Alistarh, Dan-Adrian","last_name":"Alistarh","first_name":"Dan-Adrian","orcid":"0000-0003-3650-940X","id":"4A899BFC-F248-11E8-B48F-1D18A9856A87"}],"title":"Gradient compression for communication-limited convex optimization","publication_status":"published","status":"public","publisher":"IEEE","department":[{"_id":"DaAl"}],"_id":"7122","year":"2019","user_id":"c635000d-4b10-11ee-a964-aac5a93f6ac1"},{"oa":1,"external_id":{"isi":["000545976800011"],"arxiv":["1802.08021"]},"main_file_link":[{"open_access":"1","url":"https://arxiv.org/abs/1802.08021"}],"project":[{"name":"Elastic Coordination for Scalable Machine Learning","call_identifier":"H2020","grant_number":"805223","_id":"268A44D6-B435-11E9-9278-68D0E5697425"}],"isi":1,"quality_controlled":"1","doi":"10.1145/3295500.3356222","conference":{"name":"SC: Conference for High Performance Computing, Networking, Storage and Analysis","end_date":"2019-11-19","location":"Denver, CO, United
States","start_date":"2019-11-17"},"language":[{"iso":"eng"}],"publication_identifier":{"issn":["21674329"],"eissn":["21674337"],"isbn":["9781450362290"]},"month":"11","year":"2019","department":[{"_id":"DaAl"}],"publisher":"ACM","publication_status":"published","author":[{"last_name":"Renggli","first_name":"Cedric","full_name":"Renggli, Cedric"},{"full_name":"Ashkboos, Saleh","id":"0D0A9058-257B-11EA-A937-9341C3D8BC8A","first_name":"Saleh","last_name":"Ashkboos"},{"last_name":"Aghagolzadeh","first_name":"Mehdi","full_name":"Aghagolzadeh, Mehdi"},{"id":"4A899BFC-F248-11E8-B48F-1D18A9856A87","orcid":"0000-0003-3650-940X","first_name":"Dan-Adrian","last_name":"Alistarh","full_name":"Alistarh, Dan-Adrian"},{"last_name":"Hoefler","first_name":"Torsten","full_name":"Hoefler, Torsten"}],"date_created":"2019-12-22T23:00:42Z","date_updated":"2023-09-06T14:37:55Z","article_number":"a11","ec_funded":1,"citation":{"ista":"Renggli C, Ashkboos S, Aghagolzadeh M, Alistarh D-A, Hoefler T. 2019. SparCML: High-performance sparse communication for machine learning. International Conference for High Performance Computing, Networking, Storage and Analysis, SC. SC: Conference for High Performance Computing, Networking, Storage and Analysis, a11.","ieee":"C. Renggli, S. Ashkboos, M. Aghagolzadeh, D.-A. Alistarh, and T. Hoefler, “SparCML: High-performance sparse communication for machine learning,” in International Conference for High Performance Computing, Networking, Storage and Analysis, SC, Denver, CO, United States, 2019.","apa":"Renggli, C., Ashkboos, S., Aghagolzadeh, M., Alistarh, D.-A., & Hoefler, T. (2019). SparCML: High-performance sparse communication for machine learning. In International Conference for High Performance Computing, Networking, Storage and Analysis, SC. Denver, CO, United States: ACM. https://doi.org/10.1145/3295500.3356222","ama":"Renggli C, Ashkboos S, Aghagolzadeh M, Alistarh D-A, Hoefler T. 
SparCML: High-performance sparse communication for machine learning. In: International Conference for High Performance Computing, Networking, Storage and Analysis, SC. ACM; 2019. doi:10.1145/3295500.3356222","chicago":"Renggli, Cedric, Saleh Ashkboos, Mehdi Aghagolzadeh, Dan-Adrian Alistarh, and Torsten Hoefler. “SparCML: High-Performance Sparse Communication for Machine Learning.” In International Conference for High Performance Computing, Networking, Storage and Analysis, SC. ACM, 2019. https://doi.org/10.1145/3295500.3356222.","mla":"Renggli, Cedric, et al. “SparCML: High-Performance Sparse Communication for Machine Learning.” International Conference for High Performance Computing, Networking, Storage and Analysis, SC, a11, ACM, 2019, doi:10.1145/3295500.3356222.","short":"C. Renggli, S. Ashkboos, M. Aghagolzadeh, D.-A. Alistarh, T. Hoefler, in:, International Conference for High Performance Computing, Networking, Storage and Analysis, SC, ACM, 2019."},"publication":"International Conference for High Performance Computing, Networking, Storage and Analysis, SC","date_published":"2019-11-17T00:00:00Z","scopus_import":"1","article_processing_charge":"No","day":"17","user_id":"c635000d-4b10-11ee-a964-aac5a93f6ac1","_id":"7201","status":"public","title":"SparCML: High-performance sparse communication for machine learning","oa_version":"Preprint","type":"conference","abstract":[{"text":"Applying machine learning techniques to the quickly growing data in science and industry requires highly-scalable algorithms. Large datasets are most commonly processed \"data parallel\" distributed across many nodes. Each node's contribution to the overall gradient is summed using a global allreduce. This allreduce is the single communication and thus scalability bottleneck for most machine learning workloads. We observe that frequently, many gradient values are (close to) zero, leading to sparse or sparsifyable communications. 
To exploit this insight, we analyze, design, and implement a set of communication-efficient protocols for sparse input data, in conjunction with efficient machine learning algorithms which can leverage these primitives. Our communication protocols generalize standard collective operations, by allowing processes to contribute arbitrary sparse input data vectors. Our generic communication library, SparCML, extends MPI to support additional features, such as non-blocking (asynchronous) operations and low-precision data representations. As such, SparCML and its techniques will form the basis of future highly-scalable machine learning frameworks.","lang":"eng"}]},{"abstract":[{"text":"Traditional concurrent programming involves manipulating shared mutable state. Alternatives to this programming style are communicating sequential processes (CSP) and actor models, which share data via explicit communication. These models have been known for almost half a century, and have recently started to gain significant traction among modern programming languages. The common abstraction for communication between several processes is the channel. Although channels are similar to producer-consumer data structures, they have different semantics and support additional operations, such as the select expression. Despite their growing popularity, most known implementations of channels use lock-based data structures and can be rather inefficient.\r\n\r\nIn this paper, we present the first efficient lock-free algorithm for implementing a communication channel for CSP programming. We provide implementations and experimental results in the Kotlin and Go programming languages. Our new algorithm outperforms existing implementations on many workloads, while providing a non-blocking progress guarantee. Our design can serve as an example of how to construct general communication data structures for CSP and actor models. 
","lang":"eng"}],"type":"conference","alternative_title":["LNCS"],"oa_version":"None","user_id":"c635000d-4b10-11ee-a964-aac5a93f6ac1","_id":"7228","status":"public","title":"Scalable FIFO channels for programming via communicating sequential processes","intvolume":" 11725","day":"13","article_processing_charge":"No","scopus_import":"1","date_published":"2019-08-13T00:00:00Z","publication":"25th Anniversary of Euro-Par","citation":{"ista":"Koval N, Alistarh D-A, Elizarov R. 2019. Scalable FIFO channels for programming via communicating sequential processes. 25th Anniversary of Euro-Par. Euro-Par: European Conference on Parallel Processing, LNCS, vol. 11725, 317–333.","apa":"Koval, N., Alistarh, D.-A., & Elizarov, R. (2019). Scalable FIFO channels for programming via communicating sequential processes. In 25th Anniversary of Euro-Par (Vol. 11725, pp. 317–333). Göttingen, Germany: Springer Nature. https://doi.org/10.1007/978-3-030-29400-7_23","ieee":"N. Koval, D.-A. Alistarh, and R. Elizarov, “Scalable FIFO channels for programming via communicating sequential processes,” in 25th Anniversary of Euro-Par, Göttingen, Germany, 2019, vol. 11725, pp. 317–333.","ama":"Koval N, Alistarh D-A, Elizarov R. Scalable FIFO channels for programming via communicating sequential processes. In: 25th Anniversary of Euro-Par. Vol 11725. Springer Nature; 2019:317-333. doi:10.1007/978-3-030-29400-7_23","chicago":"Koval, Nikita, Dan-Adrian Alistarh, and Roman Elizarov. “Scalable FIFO Channels for Programming via Communicating Sequential Processes.” In 25th Anniversary of Euro-Par, 11725:317–33. Springer Nature, 2019. https://doi.org/10.1007/978-3-030-29400-7_23.","mla":"Koval, Nikita, et al. “Scalable FIFO Channels for Programming via Communicating Sequential Processes.” 25th Anniversary of Euro-Par, vol. 11725, Springer Nature, 2019, pp. 317–33, doi:10.1007/978-3-030-29400-7_23.","short":"N. Koval, D.-A. Alistarh, R. Elizarov, in:, 25th Anniversary of Euro-Par, Springer Nature, 2019, pp. 
317–333."},"page":"317-333","author":[{"full_name":"Koval, Nikita","first_name":"Nikita","last_name":"Koval","id":"2F4DB10C-F248-11E8-B48F-1D18A9856A87"},{"orcid":"0000-0003-3650-940X","id":"4A899BFC-F248-11E8-B48F-1D18A9856A87","last_name":"Alistarh","first_name":"Dan-Adrian","full_name":"Alistarh, Dan-Adrian"},{"full_name":"Elizarov, Roman","first_name":"Roman","last_name":"Elizarov"}],"date_updated":"2023-09-06T14:53:59Z","date_created":"2020-01-05T23:00:46Z","volume":11725,"year":"2019","publication_status":"published","publisher":"Springer Nature","department":[{"_id":"DaAl"}],"month":"08","publication_identifier":{"issn":["0302-9743"],"isbn":["978-3-0302-9399-4"],"eissn":["1611-3349"]},"conference":{"end_date":"2019-08-30","location":"Göttingen, Germany","start_date":"2019-08-26","name":"Euro-Par: European Conference on Parallel Processing"},"doi":"10.1007/978-3-030-29400-7_23","language":[{"iso":"eng"}],"external_id":{"isi":["000851061400023"]},"isi":1,"quality_controlled":"1"},{"day":"01","article_processing_charge":"No","scopus_import":"1","date_published":"2019-06-01T00:00:00Z","page":"12481-12512","publication":"36th International Conference on Machine Learning, ICML 2019","citation":{"mla":"Yu, Chen, et al. “Distributed Learning over Unreliable Networks.” 36th International Conference on Machine Learning, ICML 2019, vol. 2019–June, IMLS, 2019, pp. 12481–512.","short":"C. Yu, H. Tang, C. Renggli, S. Kassing, A. Singla, D.-A. Alistarh, C. Zhang, J. Liu, in:, 36th International Conference on Machine Learning, ICML 2019, IMLS, 2019, pp. 12481–12512.","chicago":"Yu, Chen, Hanlin Tang, Cedric Renggli, Simon Kassing, Ankit Singla, Dan-Adrian Alistarh, Ce Zhang, and Ji Liu. “Distributed Learning over Unreliable Networks.” In 36th International Conference on Machine Learning, ICML 2019, 2019–June:12481–512. IMLS, 2019.","ama":"Yu C, Tang H, Renggli C, et al. Distributed learning over unreliable networks. 
In: 36th International Conference on Machine Learning, ICML 2019. Vol 2019-June. IMLS; 2019:12481-12512.","ista":"Yu C, Tang H, Renggli C, Kassing S, Singla A, Alistarh D-A, Zhang C, Liu J. 2019. Distributed learning over unreliable networks. 36th International Conference on Machine Learning, ICML 2019. ICML: International Conference on Machine Learning vol. 2019–June, 12481–12512.","ieee":"C. Yu et al., “Distributed learning over unreliable networks,” in 36th International Conference on Machine Learning, ICML 2019, Long Beach, CA, United States, 2019, vol. 2019–June, pp. 12481–12512.","apa":"Yu, C., Tang, H., Renggli, C., Kassing, S., Singla, A., Alistarh, D.-A., … Liu, J. (2019). Distributed learning over unreliable networks. In 36th International Conference on Machine Learning, ICML 2019 (Vol. 2019–June, pp. 12481–12512). Long Beach, CA, United States: IMLS."},"abstract":[{"text":"Most of today's distributed machine learning systems assume reliable networks: whenever two machines exchange information (e.g., gradients or models), the network should guarantee the delivery of the message. At the same time, recent work exhibits the impressive tolerance of machine learning algorithms to errors or noise arising from relaxed communication or synchronization. In this paper, we connect these two trends, and consider the following question: Can we design machine learning systems that are tolerant to network unreliability during training? With this motivation, we focus on a theoretical problem of independent interest: given a standard distributed parameter server architecture, if every communication between the worker and the server has a non-zero probability p of being dropped, does there exist an algorithm that still converges, and at what speed? 
The technical contribution of this paper is a novel theoretical analysis proving that distributed learning over an unreliable network can achieve a convergence rate comparable to centralized or distributed learning over reliable networks. Further, we prove that the influence of the packet drop rate diminishes with the growth of the number of parameter servers. We map this theoretical result onto a real-world scenario, training deep neural networks over an unreliable network layer, and conduct network simulations to validate the system improvement by allowing the networks to be unreliable.","lang":"eng"}],"type":"conference","oa_version":"Preprint","title":"Distributed learning over unreliable networks","status":"public","user_id":"c635000d-4b10-11ee-a964-aac5a93f6ac1","_id":"7437","month":"06","publication_identifier":{"isbn":["9781510886988"]},"language":[{"iso":"eng"}],"conference":{"end_date":"2019-06-15","start_date":"2019-06-10","location":"Long Beach, CA, United States","name":"ICML: International Conference on Machine Learning"},"isi":1,"quality_controlled":"1","main_file_link":[{"open_access":"1","url":"https://arxiv.org/abs/1810.07766"}],"external_id":{"arxiv":["1810.07766"],"isi":["000684034307036"]},"oa":1,"date_created":"2020-02-02T23:01:06Z","date_updated":"2023-09-06T15:21:48Z","volume":"2019-June","author":[{"full_name":"Yu, Chen","last_name":"Yu","first_name":"Chen"},{"full_name":"Tang, Hanlin","first_name":"Hanlin","last_name":"Tang"},{"full_name":"Renggli, Cedric","first_name":"Cedric","last_name":"Renggli"},{"first_name":"Simon","last_name":"Kassing","full_name":"Kassing, Simon"},{"full_name":"Singla, Ankit","last_name":"Singla","first_name":"Ankit"},{"orcid":"0000-0003-3650-940X","id":"4A899BFC-F248-11E8-B48F-1D18A9856A87","last_name":"Alistarh","first_name":"Dan-Adrian","full_name":"Alistarh, Dan-Adrian"},{"last_name":"Zhang","first_name":"Ce","full_name":"Zhang, Ce"},{"first_name":"Ji","last_name":"Liu","full_name":"Liu, 
Ji"}],"publication_status":"published","department":[{"_id":"DaAl"}],"publisher":"IMLS","year":"2019"},{"publication_identifier":{"isbn":["9781450361842"]},"month":"06","project":[{"call_identifier":"H2020","name":"Elastic Coordination for Scalable Machine Learning","_id":"268A44D6-B435-11E9-9278-68D0E5697425","grant_number":"805223"}],"isi":1,"quality_controlled":"1","external_id":{"arxiv":["2003.09363"],"isi":["000507618500018"]},"oa":1,"main_file_link":[{"url":"https://arxiv.org/abs/2003.09363","open_access":"1"}],"language":[{"iso":"eng"}],"doi":"10.1145/3323165.3323201","conference":{"start_date":"2019-06-22","location":"Phoenix, AZ, United States","end_date":"2019-06-24","name":"SPAA: Symposium on Parallelism in Algorithms and Architectures"},"ec_funded":1,"department":[{"_id":"DaAl"}],"publisher":"ACM Press","publication_status":"published","year":"2019","date_updated":"2023-09-07T13:31:39Z","date_created":"2019-07-24T08:59:36Z","related_material":{"record":[{"id":"10429","status":"public","relation":"dissertation_contains"}]},"author":[{"full_name":"Alistarh, Dan-Adrian","last_name":"Alistarh","first_name":"Dan-Adrian","orcid":"0000-0003-3650-940X","id":"4A899BFC-F248-11E8-B48F-1D18A9856A87"},{"orcid":"0000-0001-5634-0731","id":"3279A00C-F248-11E8-B48F-1D18A9856A87","last_name":"Nadiradze","first_name":"Giorgi","full_name":"Nadiradze, Giorgi"},{"full_name":"Koval, Nikita","id":"2F4DB10C-F248-11E8-B48F-1D18A9856A87","last_name":"Koval","first_name":"Nikita"}],"scopus_import":"1","article_processing_charge":"No","day":"01","page":"145-154","citation":{"ista":"Alistarh D-A, Nadiradze G, Koval N. 2019. Efficiency guarantees for parallel incremental algorithms under relaxed schedulers. 31st ACM Symposium on Parallelism in Algorithms and Architectures. SPAA: Symposium on Parallelism in Algorithms and Architectures, 145–154.","apa":"Alistarh, D.-A., Nadiradze, G., & Koval, N. (2019). 
Efficiency guarantees for parallel incremental algorithms under relaxed schedulers. In 31st ACM Symposium on Parallelism in Algorithms and Architectures (pp. 145–154). Phoenix, AZ, United States: ACM Press. https://doi.org/10.1145/3323165.3323201","ieee":"D.-A. Alistarh, G. Nadiradze, and N. Koval, “Efficiency guarantees for parallel incremental algorithms under relaxed schedulers,” in 31st ACM Symposium on Parallelism in Algorithms and Architectures, Phoenix, AZ, United States, 2019, pp. 145–154.","ama":"Alistarh D-A, Nadiradze G, Koval N. Efficiency guarantees for parallel incremental algorithms under relaxed schedulers. In: 31st ACM Symposium on Parallelism in Algorithms and Architectures. ACM Press; 2019:145-154. doi:10.1145/3323165.3323201","chicago":"Alistarh, Dan-Adrian, Giorgi Nadiradze, and Nikita Koval. “Efficiency Guarantees for Parallel Incremental Algorithms under Relaxed Schedulers.” In 31st ACM Symposium on Parallelism in Algorithms and Architectures, 145–54. ACM Press, 2019. https://doi.org/10.1145/3323165.3323201.","mla":"Alistarh, Dan-Adrian, et al. “Efficiency Guarantees for Parallel Incremental Algorithms under Relaxed Schedulers.” 31st ACM Symposium on Parallelism in Algorithms and Architectures, ACM Press, 2019, pp. 145–54, doi:10.1145/3323165.3323201.","short":"D.-A. Alistarh, G. Nadiradze, N. Koval, in:, 31st ACM Symposium on Parallelism in Algorithms and Architectures, ACM Press, 2019, pp. 145–154."},"publication":"31st ACM Symposium on Parallelism in Algorithms and Architectures","date_published":"2019-06-01T00:00:00Z","type":"conference","abstract":[{"text":"Several classic problems in graph processing and computational geometry are solved via incremental algorithms, which split computation into a series of small tasks acting on shared state, which gets updated progressively. 
While the sequential variant of such algorithms usually specifies a fixed (but sometimes random) order in which the tasks should be performed, a standard approach to parallelizing such algorithms is to relax this constraint to allow for out-of-order parallel execution. This is the case for parallel implementations of Dijkstra's single-source shortest-paths (SSSP) algorithm, and for parallel Delaunay mesh triangulation. While many software frameworks parallelize incremental computation in this way, it is still not well understood whether this relaxed ordering approach can still provide any complexity guarantees. In this paper, we address this problem, and analyze the efficiency guarantees provided by a range of incremental algorithms when parallelized via relaxed schedulers. We show that, for algorithms such as Delaunay mesh triangulation and sorting by insertion, schedulers with a maximum relaxation factor of k in terms of the maximum priority inversion allowed will introduce a maximum amount of wasted work of O(log n poly(k)), where n is the number of tasks to be executed. For SSSP, we show that the additional work is O(poly(k) dmax / wmin), where dmax is the maximum distance between two nodes, and wmin is the minimum such distance. In practical settings where n >> k, this suggests that the overheads of relaxation will be outweighed by the improved scalability of the relaxed scheduler. 
On the negative side, we provide lower bounds showing that certain algorithms will inherently incur a non-trivial amount of wasted work due to scheduler relaxation, even for relatively benign relaxed schedulers.","lang":"eng"}],"status":"public","title":"Efficiency guarantees for parallel incremental algorithms under relaxed schedulers","_id":"6673","user_id":"4359f0d1-fa6c-11eb-b949-802e58b17ae8","oa_version":"Preprint"},{"ec_funded":1,"author":[{"full_name":"Wendler, Chris","last_name":"Wendler","first_name":"Chris"},{"first_name":"Dan-Adrian","last_name":"Alistarh","id":"4A899BFC-F248-11E8-B48F-1D18A9856A87","orcid":"0000-0003-3650-940X","full_name":"Alistarh, Dan-Adrian"},{"last_name":"Püschel","first_name":"Markus","full_name":"Püschel, Markus"}],"date_updated":"2023-09-08T11:13:52Z","date_created":"2020-02-28T10:03:24Z","volume":32,"year":"2019","publication_status":"published","department":[{"_id":"DaAl"}],"publisher":"Neural Information Processing Systems Foundation","month":"12","publication_identifier":{"issn":["1049-5258"]},"conference":{"name":"NIPS: Conference on Neural Information Processing Systems","end_date":"2019-12-14","location":"Vancouver, Canada","start_date":"2019-12-08"},"language":[{"iso":"eng"}],"external_id":{"arxiv":["1909.02253"],"isi":["000534424300084"]},"main_file_link":[{"open_access":"1","url":"http://papers.nips.cc/paper/8379-powerset-convolutional-neural-networks"}],"oa":1,"isi":1,"quality_controlled":"1","project":[{"_id":"268A44D6-B435-11E9-9278-68D0E5697425","grant_number":"805223","call_identifier":"H2020","name":"Elastic Coordination for Scalable Machine Learning"}],"abstract":[{"lang":"eng","text":"We present a novel class of convolutional neural networks (CNNs) for set functions, i.e., data indexed with the powerset of a finite set. 
The convolutions are derived as linear, shift-equivariant functions for various notions of shifts on set functions. The framework is fundamentally different from graph convolutions based on the Laplacian, as it provides not one but several basic shifts, one for each element in the ground set. Prototypical experiments with several set function classification tasks on synthetic datasets and on datasets derived from real-world hypergraphs demonstrate the potential of our new powerset CNNs."}],"type":"conference","oa_version":"Published Version","_id":"7542","user_id":"c635000d-4b10-11ee-a964-aac5a93f6ac1","status":"public","title":"Powerset convolutional neural networks","intvolume":" 32","day":"01","article_processing_charge":"No","date_published":"2019-12-01T00:00:00Z","citation":{"chicago":"Wendler, Chris, Dan-Adrian Alistarh, and Markus Püschel. “Powerset Convolutional Neural Networks,” 32:927–38. Neural Information Processing Systems Foundation, 2019.","mla":"Wendler, Chris, et al. Powerset Convolutional Neural Networks. Vol. 32, Neural Information Processing Systems Foundation, 2019, pp. 927–38.","short":"C. Wendler, D.-A. Alistarh, M. Püschel, in:, Neural Information Processing Systems Foundation, 2019, pp. 927–938.","ista":"Wendler C, Alistarh D-A, Püschel M. 2019. Powerset convolutional neural networks. NIPS: Conference on Neural Information Processing Systems vol. 32, 927–938.","ieee":"C. Wendler, D.-A. Alistarh, and M. Püschel, “Powerset convolutional neural networks,” presented at the NIPS: Conference on Neural Information Processing Systems, Vancouver, Canada, 2019, vol. 32, pp. 927–938.","apa":"Wendler, C., Alistarh, D.-A., & Püschel, M. (2019). Powerset convolutional neural networks (Vol. 32, pp. 927–938). Presented at the NIPS: Conference on Neural Information Processing Systems, Vancouver, Canada: Neural Information Processing Systems Foundation.","ama":"Wendler C, Alistarh D-A, Püschel M. Powerset convolutional neural networks. In: Vol 32. 
Neural Information Processing Systems Foundation; 2019:927-938."},"page":"927-938"},{"month":"06","publication_identifier":{"isbn":["9781450367059"]},"language":[{"iso":"eng"}],"conference":{"location":"Phoenix, AZ, United States","start_date":"2019-06-23","end_date":"2019-06-26","name":"STOC: Symposium on Theory of Computing"},"doi":"10.1145/3313276.3316407","quality_controlled":"1","isi":1,"main_file_link":[{"open_access":"1","url":"https://arxiv.org/abs/1811.01421"}],"external_id":{"isi":["000523199100089"],"arxiv":["1811.01421"]},"oa":1,"date_created":"2019-07-24T09:13:05Z","date_updated":"2023-12-13T12:28:28Z","author":[{"full_name":"Alistarh, Dan-Adrian","orcid":"0000-0003-3650-940X","id":"4A899BFC-F248-11E8-B48F-1D18A9856A87","last_name":"Alistarh","first_name":"Dan-Adrian"},{"full_name":"Aspnes, James","last_name":"Aspnes","first_name":"James"},{"last_name":"Ellen","first_name":"Faith","full_name":"Ellen, Faith"},{"full_name":"Gelashvili, Rati","first_name":"Rati","last_name":"Gelashvili"},{"full_name":"Zhu, Leqi","last_name":"Zhu","first_name":"Leqi"}],"related_material":{"record":[{"status":"public","relation":"later_version","id":"14364"}]},"publication_status":"published","department":[{"_id":"DaAl"}],"publisher":"ACM Press","year":"2019","day":"01","article_processing_charge":"No","scopus_import":"1","date_published":"2019-06-01T00:00:00Z","page":"986-996","publication":"Proceedings of the 51st Annual ACM SIGACT Symposium on Theory of Computing","citation":{"chicago":"Alistarh, Dan-Adrian, James Aspnes, Faith Ellen, Rati Gelashvili, and Leqi Zhu. “Why Extension-Based Proofs Fail.” In Proceedings of the 51st Annual ACM SIGACT Symposium on Theory of Computing, 986–96. ACM Press, 2019. https://doi.org/10.1145/3313276.3316407.","short":"D.-A. Alistarh, J. Aspnes, F. Ellen, R. Gelashvili, L. Zhu, in:, Proceedings of the 51st Annual ACM SIGACT Symposium on Theory of Computing, ACM Press, 2019, pp. 986–996.","mla":"Alistarh, Dan-Adrian, et al. 
“Why Extension-Based Proofs Fail.” Proceedings of the 51st Annual ACM SIGACT Symposium on Theory of Computing, ACM Press, 2019, pp. 986–96, doi:10.1145/3313276.3316407.","apa":"Alistarh, D.-A., Aspnes, J., Ellen, F., Gelashvili, R., & Zhu, L. (2019). Why extension-based proofs fail. In Proceedings of the 51st Annual ACM SIGACT Symposium on Theory of Computing (pp. 986–996). Phoenix, AZ, United States: ACM Press. https://doi.org/10.1145/3313276.3316407","ieee":"D.-A. Alistarh, J. Aspnes, F. Ellen, R. Gelashvili, and L. Zhu, “Why extension-based proofs fail,” in Proceedings of the 51st Annual ACM SIGACT Symposium on Theory of Computing, Phoenix, AZ, United States, 2019, pp. 986–996.","ista":"Alistarh D-A, Aspnes J, Ellen F, Gelashvili R, Zhu L. 2019. Why extension-based proofs fail. Proceedings of the 51st Annual ACM SIGACT Symposium on Theory of Computing. STOC: Symposium on Theory of Computing, 986–996.","ama":"Alistarh D-A, Aspnes J, Ellen F, Gelashvili R, Zhu L. Why extension-based proofs fail. In: Proceedings of the 51st Annual ACM SIGACT Symposium on Theory of Computing. ACM Press; 2019:986-996. doi:10.1145/3313276.3316407"},"abstract":[{"lang":"eng","text":"It is impossible to deterministically solve wait-free consensus in an asynchronous system. The classic proof uses a valency argument, which constructs an infinite execution by repeatedly extending a finite execution. We introduce extension-based proofs, a class of impossibility proofs that are modelled as an interaction between a prover and a protocol and that include valency arguments.\r\n\r\nUsing proofs based on combinatorial topology, it has been shown that it is impossible to deterministically solve k-set agreement among n > k ≥ 2 processes in a wait-free manner. However, it was unknown whether proofs based on simpler techniques were possible. 
We show that this impossibility result cannot be obtained by an extension-based proof and, hence, extension-based proofs are limited in power."}],"type":"conference","oa_version":"Preprint","title":"Why extension-based proofs fail","status":"public","_id":"6676","user_id":"4359f0d1-fa6c-11eb-b949-802e58b17ae8"},{"department":[{"_id":"DaAl"}],"publisher":"Springer","publication_status":"published","year":"2018","volume":31,"date_created":"2018-12-11T11:47:01Z","date_updated":"2023-02-23T12:23:25Z","author":[{"id":"4A899BFC-F248-11E8-B48F-1D18A9856A87","orcid":"0000-0003-3650-940X","first_name":"Dan-Adrian","last_name":"Alistarh","full_name":"Alistarh, Dan-Adrian"},{"last_name":"Aspnes","first_name":"James","full_name":"Aspnes, James"},{"last_name":"King","first_name":"Valerie","full_name":"King, Valerie"},{"full_name":"Saia, Jared","last_name":"Saia","first_name":"Jared"}],"publist_id":"7281","file_date_updated":"2020-07-14T12:46:38Z","project":[{"_id":"B67AFEDC-15C9-11EA-A837-991A96BB2854","name":"IST Austria Open Access Fund"}],"quality_controlled":"1","oa":1,"tmp":{"name":"Creative Commons Attribution 4.0 International Public License (CC-BY 4.0)","legal_code_url":"https://creativecommons.org/licenses/by/4.0/legalcode","short":"CC BY (4.0)","image":"/images/cc_by.png"},"language":[{"iso":"eng"}],"doi":"10.1007/s00446-017-0315-1","publication_identifier":{"issn":["01782770"]},"month":"11","intvolume":" 31","title":"Communication-efficient randomized consensus","ddc":["000"],"status":"public","_id":"536","user_id":"3E5EF7F0-F248-11E8-B48F-1D18A9856A87","file":[{"file_name":"2017_DistribComp_Alistarh.pdf","access_level":"open_access","creator":"dernst","file_size":595707,"content_type":"application/pdf","file_id":"5867","relation":"main_file","date_created":"2019-01-22T07:25:51Z","date_updated":"2020-07-14T12:46:38Z","checksum":"69b46e537acdcac745237ddb853fcbb5"}],"oa_version":"Published 
Version","type":"journal_article","issue":"6","abstract":[{"lang":"eng","text":"We consider the problem of consensus in the challenging classic model. In this model, the adversary is adaptive; it can choose which processors crash at any point during the course of the algorithm. Further, communication is via asynchronous message passing: there is no known upper bound on the time to send a message from one processor to another, and all messages and coin flips are seen by the adversary. We describe a new randomized consensus protocol with expected message complexity O(n² log² n) when fewer than n / 2 processes may fail by crashing. This is an almost-linear improvement over the best previously known protocol, and within logarithmic factors of a known Ω(n²) message lower bound. The protocol further ensures that no process sends more than O(n log³ n) messages in expectation, which is again within logarithmic factors of optimal. We also present a generalization of the algorithm to an arbitrary number of failures t, which uses expected O(nt + t² log² t) total messages. Our approach is to build a message-efficient, resilient mechanism for aggregating individual processor votes, implementing the message-passing equivalent of a weak shared coin. Roughly, in our protocol, a processor first announces its votes to small groups, then propagates them to increasingly larger groups as it generates more and more votes. To bound the number of messages that an individual process might have to send or receive, the protocol progressively increases the weight of generated votes. The main technical challenge is bounding the impact of votes that are still “in flight” (generated, but not fully propagated) on the final outcome of the shared coin, especially since such votes might have different weights. We achieve this by leveraging the structure of the algorithm, and a technical argument based on martingale concentration bounds. 
Overall, we show that it is possible to build an efficient message-passing implementation of a shared coin, and in the process (almost-optimally) solve the classic consensus problem in the asynchronous message-passing model."}],"page":"489-501","citation":{"ista":"Alistarh D-A, Aspnes J, King V, Saia J. 2018. Communication-efficient randomized consensus. Distributed Computing. 31(6), 489–501.","apa":"Alistarh, D.-A., Aspnes, J., King, V., & Saia, J. (2018). Communication-efficient randomized consensus. Distributed Computing. Springer. https://doi.org/10.1007/s00446-017-0315-1","ieee":"D.-A. Alistarh, J. Aspnes, V. King, and J. Saia, “Communication-efficient randomized consensus,” Distributed Computing, vol. 31, no. 6. Springer, pp. 489–501, 2018.","ama":"Alistarh D-A, Aspnes J, King V, Saia J. Communication-efficient randomized consensus. Distributed Computing. 2018;31(6):489-501. doi:10.1007/s00446-017-0315-1","chicago":"Alistarh, Dan-Adrian, James Aspnes, Valerie King, and Jared Saia. “Communication-Efficient Randomized Consensus.” Distributed Computing. Springer, 2018. https://doi.org/10.1007/s00446-017-0315-1.","mla":"Alistarh, Dan-Adrian, et al. “Communication-Efficient Randomized Consensus.” Distributed Computing, vol. 31, no. 6, Springer, 2018, pp. 489–501, doi:10.1007/s00446-017-0315-1.","short":"D.-A. Alistarh, J. Aspnes, V. King, J. 
Saia, Distributed Computing 31 (2018) 489–501."},"publication":"Distributed Computing","date_published":"2018-11-01T00:00:00Z","scopus_import":1,"has_accepted_license":"1","article_processing_charge":"Yes (via OA deal)","day":"01"},{"doi":"10.5441/002/EDBT.2018.14","conference":{"location":"Vienna, Austria","start_date":"2018-03-26","end_date":"2018-03-29","name":"EDBT: Conference on Extending Database Technology"},"language":[{"iso":"eng"}],"tmp":{"name":"Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0)","legal_code_url":"https://creativecommons.org/licenses/by-nc-nd/4.0/legalcode","short":"CC BY-NC-ND (4.0)","image":"/images/cc_by_nc_nd.png"},"oa":1,"quality_controlled":"1","publication_identifier":{"isbn":["9783893180783"],"issn":["2367-2005"]},"month":"03","author":[{"full_name":"Grubic, Demjan","last_name":"Grubic","first_name":"Demjan"},{"first_name":"Leo","last_name":"Tam","full_name":"Tam, Leo"},{"full_name":"Alistarh, Dan-Adrian","first_name":"Dan-Adrian","last_name":"Alistarh","id":"4A899BFC-F248-11E8-B48F-1D18A9856A87","orcid":"0000-0003-3650-940X"},{"last_name":"Zhang","first_name":"Ce","full_name":"Zhang, Ce"}],"date_created":"2019-11-26T14:19:11Z","date_updated":"2023-02-23T12:59:17Z","year":"2018","publisher":"OpenProceedings","department":[{"_id":"DaAl"}],"publication_status":"published","file_date_updated":"2020-07-14T12:47:49Z","license":"https://creativecommons.org/licenses/by-nc-nd/4.0/","date_published":"2018-03-26T00:00:00Z","citation":{"chicago":"Grubic, Demjan, Leo Tam, Dan-Adrian Alistarh, and Ce Zhang. “Synchronous Multi-GPU Training for Deep Learning with Low-Precision Communications: An Empirical Study.” In Proceedings of the 21st International Conference on Extending Database Technology, 145–56. OpenProceedings, 2018. https://doi.org/10.5441/002/EDBT.2018.14.","short":"D. Grubic, L. Tam, D.-A. Alistarh, C. 
Zhang, in:, Proceedings of the 21st International Conference on Extending Database Technology, OpenProceedings, 2018, pp. 145–156.","mla":"Grubic, Demjan, et al. “Synchronous Multi-GPU Training for Deep Learning with Low-Precision Communications: An Empirical Study.” Proceedings of the 21st International Conference on Extending Database Technology, OpenProceedings, 2018, pp. 145–56, doi:10.5441/002/EDBT.2018.14.","apa":"Grubic, D., Tam, L., Alistarh, D.-A., & Zhang, C. (2018). Synchronous multi-GPU training for deep learning with low-precision communications: An empirical study. In Proceedings of the 21st International Conference on Extending Database Technology (pp. 145–156). Vienna, Austria: OpenProceedings. https://doi.org/10.5441/002/EDBT.2018.14","ieee":"D. Grubic, L. Tam, D.-A. Alistarh, and C. Zhang, “Synchronous multi-GPU training for deep learning with low-precision communications: An empirical study,” in Proceedings of the 21st International Conference on Extending Database Technology, Vienna, Austria, 2018, pp. 145–156.","ista":"Grubic D, Tam L, Alistarh D-A, Zhang C. 2018. Synchronous multi-GPU training for deep learning with low-precision communications: An empirical study. Proceedings of the 21st International Conference on Extending Database Technology. EDBT: Conference on Extending Database Technology, 145–156.","ama":"Grubic D, Tam L, Alistarh D-A, Zhang C. Synchronous multi-GPU training for deep learning with low-precision communications: An empirical study. In: Proceedings of the 21st International Conference on Extending Database Technology. OpenProceedings; 2018:145-156. 
doi:10.5441/002/EDBT.2018.14"},"publication":"Proceedings of the 21st International Conference on Extending Database Technology","page":"145-156","has_accepted_license":"1","article_processing_charge":"No","day":"26","scopus_import":1,"file":[{"date_created":"2019-11-26T14:23:04Z","date_updated":"2020-07-14T12:47:49Z","checksum":"ec979b56abc71016d6e6adfdadbb4afe","file_id":"7118","relation":"main_file","creator":"dernst","file_size":1603204,"content_type":"application/pdf","file_name":"2018_OpenProceedings_Grubic.pdf","access_level":"open_access"}],"oa_version":"Published Version","_id":"7116","user_id":"2DF688A6-F248-11E8-B48F-1D18A9856A87","title":"Synchronous multi-GPU training for deep learning with low-precision communications: An empirical study","ddc":["000"],"status":"public","abstract":[{"text":"Training deep learning models has received tremendous research interest recently. In particular, there has been intensive research on reducing the communication cost of training when using multiple computational devices, through reducing the precision of the underlying data representation. Naturally, such methods induce system trade-offs—lowering communication precision could decrease communication overheads and improve scalability; but, on the other hand, it can also reduce the accuracy of training. In this paper, we study this trade-off space, and ask: Can low-precision communication consistently improve the end-to-end performance of training modern neural networks, with no accuracy loss? From the performance point of view, the answer to this question may appear deceptively easy: compressing communication through low precision should help when the ratio between communication and computation is high. However, this answer is less straightforward when we try to generalize this principle across various neural network architectures (e.g., AlexNet vs. ResNet), number of GPUs (e.g., 2 vs. 8 GPUs), machine configurations (e.g., EC2 instances vs. 
NVIDIA DGX-1), communication primitives (e.g., MPI vs. NCCL), and even different GPU architectures (e.g., Kepler vs. Pascal). Currently, it is not clear how a realistic realization of all these factors maps to the speedup provided by low-precision communication. In this paper, we conduct an empirical study to answer this question and report the insights.","lang":"eng"}],"type":"conference"},{"scopus_import":1,"month":"09","day":"01","publication_identifier":{"issn":["2329-4949"]},"quality_controlled":"1","publication":"ACM Transactions on Parallel Computing","citation":{"mla":"Alistarh, Dan-Adrian, et al. “ThreadScan: Automatic and Scalable Memory Reclamation.” ACM Transactions on Parallel Computing, vol. 4, no. 4, 18, Association for Computing Machinery, 2018, doi:10.1145/3201897.","short":"D.-A. Alistarh, W. Leiserson, A. Matveev, N. Shavit, ACM Transactions on Parallel Computing 4 (2018).","chicago":"Alistarh, Dan-Adrian, William Leiserson, Alexander Matveev, and Nir Shavit. “ThreadScan: Automatic and Scalable Memory Reclamation.” ACM Transactions on Parallel Computing. Association for Computing Machinery, 2018. https://doi.org/10.1145/3201897.","ama":"Alistarh D-A, Leiserson W, Matveev A, Shavit N. ThreadScan: Automatic and scalable memory reclamation. ACM Transactions on Parallel Computing. 2018;4(4). doi:10.1145/3201897","ista":"Alistarh D-A, Leiserson W, Matveev A, Shavit N. 2018. ThreadScan: Automatic and scalable memory reclamation. ACM Transactions on Parallel Computing. 4(4), 18.","apa":"Alistarh, D.-A., Leiserson, W., Matveev, A., & Shavit, N. (2018). ThreadScan: Automatic and scalable memory reclamation. ACM Transactions on Parallel Computing. Association for Computing Machinery. https://doi.org/10.1145/3201897","ieee":"D.-A. Alistarh, W. Leiserson, A. Matveev, and N. Shavit, “ThreadScan: Automatic and scalable memory reclamation,” ACM Transactions on Parallel Computing, vol. 4, no. 4. 
Association for Computing Machinery, 2018."},"language":[{"iso":"eng"}],"doi":"10.1145/3201897","date_published":"2018-09-01T00:00:00Z","article_number":"18","type":"journal_article","abstract":[{"lang":"eng","text":"The concurrent memory reclamation problem is that of devising a way for a deallocating thread to verify that no other concurrent threads hold references to a memory block being deallocated. To date, in the absence of automatic garbage collection, there is no satisfactory solution to this problem; existing tracking methods like hazard pointers, reference counters, or epoch-based techniques like RCU are either prohibitively expensive or require significant programming expertise to the extent that implementing them efficiently can be worthy of a publication. None of the existing techniques are automatic or even semi-automated.\r\nIn this article, we take a new approach to concurrent memory reclamation. Instead of manually tracking access to memory locations as done in techniques like hazard pointers, or restricting shared accesses to specific epoch boundaries as in RCU, our algorithm, called ThreadScan, leverages operating system signaling to automatically detect which memory locations are being accessed by concurrent threads.\r\nInitial empirical evidence shows that ThreadScan scales surprisingly well and requires negligible programming effort beyond the standard use of Malloc and Free."}],"issue":"4","publication_status":"published","title":"ThreadScan: Automatic and scalable memory reclamation","status":"public","intvolume":" 4","department":[{"_id":"DaAl"}],"publisher":"Association for Computing Machinery","year":"2018","_id":"6001","user_id":"3E5EF7F0-F248-11E8-B48F-1D18A9856A87","date_updated":"2023-02-23T13:17:54Z","date_created":"2019-02-14T13:24:11Z","volume":4,"oa_version":"None","author":[{"full_name":"Alistarh, 
Dan-Adrian","orcid":"0000-0003-3650-940X","id":"4A899BFC-F248-11E8-B48F-1D18A9856A87","last_name":"Alistarh","first_name":"Dan-Adrian"},{"full_name":"Leiserson, William","first_name":"William","last_name":"Leiserson"},{"full_name":"Matveev, Alexander","first_name":"Alexander","last_name":"Matveev"},{"last_name":"Shavit","first_name":"Nir","full_name":"Shavit, Nir"}],"related_material":{"record":[{"id":"779","relation":"earlier_version","status":"public"}]}},{"month":"05","day":"01","article_processing_charge":"No","has_accepted_license":"1","scopus_import":1,"conference":{"end_date":"2018-05-03","start_date":"2018-04-30","location":"Vancouver, Canada","name":"ICLR: International Conference on Learning Representations"},"date_published":"2018-05-01T00:00:00Z","language":[{"iso":"eng"}],"publication":"6th International Conference on Learning Representations","oa":1,"external_id":{"arxiv":["1802.05668"]},"citation":{"apa":"Polino, A., Pascanu, R., & Alistarh, D.-A. (2018). Model compression via distillation and quantization. In 6th International Conference on Learning Representations. Vancouver, Canada.","ieee":"A. Polino, R. Pascanu, and D.-A. Alistarh, “Model compression via distillation and quantization,” in 6th International Conference on Learning Representations, Vancouver, Canada, 2018.","ista":"Polino A, Pascanu R, Alistarh D-A. 2018. Model compression via distillation and quantization. 6th International Conference on Learning Representations. ICLR: International Conference on Learning Representations.","ama":"Polino A, Pascanu R, Alistarh D-A. Model compression via distillation and quantization. In: 6th International Conference on Learning Representations. ; 2018.","chicago":"Polino, Antonio, Razvan Pascanu, and Dan-Adrian Alistarh. “Model Compression via Distillation and Quantization.” In 6th International Conference on Learning Representations, 2018.","short":"A. Polino, R. Pascanu, D.-A. 
Alistarh, in:, 6th International Conference on Learning Representations, 2018.","mla":"Polino, Antonio, et al. “Model Compression via Distillation and Quantization.” 6th International Conference on Learning Representations, 2018."},"quality_controlled":"1","file_date_updated":"2020-07-14T12:48:03Z","abstract":[{"lang":"eng","text":"Deep neural networks (DNNs) continue to make significant advances, solving tasks from image classification to translation or reinforcement learning. One aspect of the field receiving considerable attention is efficiently executing deep models in resource-constrained environments, such as mobile or embedded devices. This paper focuses on this problem, and proposes two new compression methods, which jointly leverage weight quantization and distillation of larger teacher networks into smaller student networks. The first method we propose is called quantized distillation and leverages distillation during the training process, by incorporating distillation loss, expressed with respect to the teacher, into the training of a student network whose weights are quantized to a limited set of levels. The second method, differentiable quantization, optimizes the location of quantization points through stochastic gradient descent, to better fit the behavior of the teacher model. We validate both methods through experiments on convolutional and recurrent architectures. We show that quantized shallow students can reach similar accuracy levels to full-precision teacher models, while providing order of magnitude compression, and inference speedup that is linear in the depth reduction. 
In sum, our results enable DNNs for resource-constrained environments to leverage architecture and accuracy advances developed on more powerful devices."}],"type":"conference","author":[{"last_name":"Polino","first_name":"Antonio","full_name":"Polino, Antonio"},{"last_name":"Pascanu","first_name":"Razvan","full_name":"Pascanu, Razvan"},{"first_name":"Dan-Adrian","last_name":"Alistarh","id":"4A899BFC-F248-11E8-B48F-1D18A9856A87","orcid":"0000-0003-3650-940X","full_name":"Alistarh, Dan-Adrian"}],"date_created":"2020-05-10T22:00:51Z","date_updated":"2023-02-23T13:18:41Z","file":[{"checksum":"a4336c167978e81891970e4e4517a8c3","date_updated":"2020-07-14T12:48:03Z","date_created":"2020-05-26T13:02:00Z","file_id":"7894","relation":"main_file","creator":"dernst","content_type":"application/pdf","file_size":308339,"access_level":"open_access","file_name":"2018_ICLR_Polino.pdf"}],"oa_version":"Published Version","_id":"7812","year":"2018","user_id":"2DF688A6-F248-11E8-B48F-1D18A9856A87","status":"public","ddc":["000"],"publication_status":"published","title":"Model compression via distillation and quantization","department":[{"_id":"DaAl"}]},{"oa_version":"Preprint","status":"public","title":"The convergence of stochastic gradient descent in asynchronous shared memory","user_id":"c635000d-4b10-11ee-a964-aac5a93f6ac1","_id":"5962","abstract":[{"lang":"eng","text":"Stochastic Gradient Descent (SGD) is a fundamental algorithm in machine learning, representing the optimization backbone for training several classic models, from regression to neural networks. Given the recent practical focus on distributed machine learning, significant work has been dedicated to the convergence properties of this algorithm under the inconsistent and noisy updates arising from execution in a distributed environment. However, surprisingly, the convergence properties of this classic algorithm in the standard shared-memory model are still not well-understood. 
In this work, we address this gap, and provide new convergence bounds for lock-free concurrent stochastic gradient descent, executing in the classic asynchronous shared memory model, against a strong adaptive adversary. Our results give improved upper and lower bounds on the \"price of asynchrony\" when executing the fundamental SGD algorithm in a concurrent setting. They show that this classic optimization tool can converge faster and with a wider range of parameters than previously known under asynchronous iterations. At the same time, we exhibit a fundamental trade-off between the maximum delay in the system and the rate at which SGD can converge, which governs the set of parameters under which this algorithm can still work efficiently."}],"type":"conference","date_published":"2018-07-23T00:00:00Z","page":"169-178","publication":"Proceedings of the 2018 ACM Symposium on Principles of Distributed Computing - PODC '18","citation":{"ama":"Alistarh D-A, De Sa C, Konstantinov NH. The convergence of stochastic gradient descent in asynchronous shared memory. In: Proceedings of the 2018 ACM Symposium on Principles of Distributed Computing - PODC ’18. ACM Press; 2018:169-178. doi:10.1145/3212734.3212763","ieee":"D.-A. Alistarh, C. De Sa, and N. H. Konstantinov, “The convergence of stochastic gradient descent in asynchronous shared memory,” in Proceedings of the 2018 ACM Symposium on Principles of Distributed Computing - PODC ’18, Egham, United Kingdom, 2018, pp. 169–178.","apa":"Alistarh, D.-A., De Sa, C., & Konstantinov, N. H. (2018). The convergence of stochastic gradient descent in asynchronous shared memory. In Proceedings of the 2018 ACM Symposium on Principles of Distributed Computing - PODC ’18 (pp. 169–178). Egham, United Kingdom: ACM Press. https://doi.org/10.1145/3212734.3212763","ista":"Alistarh D-A, De Sa C, Konstantinov NH. 2018. The convergence of stochastic gradient descent in asynchronous shared memory. 
Proceedings of the 2018 ACM Symposium on Principles of Distributed Computing - PODC ’18. PODC: Principles of Distributed Computing, 169–178.","short":"D.-A. Alistarh, C. De Sa, N.H. Konstantinov, in:, Proceedings of the 2018 ACM Symposium on Principles of Distributed Computing - PODC ’18, ACM Press, 2018, pp. 169–178.","mla":"Alistarh, Dan-Adrian, et al. “The Convergence of Stochastic Gradient Descent in Asynchronous Shared Memory.” Proceedings of the 2018 ACM Symposium on Principles of Distributed Computing - PODC ’18, ACM Press, 2018, pp. 169–78, doi:10.1145/3212734.3212763.","chicago":"Alistarh, Dan-Adrian, Christopher De Sa, and Nikola H Konstantinov. “The Convergence of Stochastic Gradient Descent in Asynchronous Shared Memory.” In Proceedings of the 2018 ACM Symposium on Principles of Distributed Computing - PODC ’18, 169–78. ACM Press, 2018. https://doi.org/10.1145/3212734.3212763."},"day":"23","article_processing_charge":"No","scopus_import":"1","date_updated":"2023-09-19T10:42:53Z","date_created":"2019-02-13T09:58:58Z","author":[{"full_name":"Alistarh, Dan-Adrian","last_name":"Alistarh","first_name":"Dan-Adrian","orcid":"0000-0003-3650-940X","id":"4A899BFC-F248-11E8-B48F-1D18A9856A87"},{"full_name":"De Sa, Christopher","last_name":"De Sa","first_name":"Christopher"},{"full_name":"Konstantinov, Nikola H","last_name":"Konstantinov","first_name":"Nikola H","id":"4B9D76E4-F248-11E8-B48F-1D18A9856A87"}],"publication_status":"published","department":[{"_id":"DaAl"}],"publisher":"ACM Press","year":"2018","language":[{"iso":"eng"}],"conference":{"name":"PODC: Principles of Distributed Computing","end_date":"2018-07-27","start_date":"2018-07-23","location":"Egham, United 
Kingdom"},"doi":"10.1145/3212734.3212763","quality_controlled":"1","isi":1,"oa":1,"external_id":{"arxiv":["1803.08841"],"isi":["000458186900022"]},"main_file_link":[{"url":"https://arxiv.org/abs/1803.08841","open_access":"1"}],"month":"07","publication_identifier":{"isbn":["9781450357951"]}},{"publication_status":"published","status":"public","title":"A brief tutorial on distributed and concurrent machine learning","department":[{"_id":"DaAl"}],"publisher":"ACM Press","_id":"5961","user_id":"c635000d-4b10-11ee-a964-aac5a93f6ac1","year":"2018","date_created":"2019-02-13T09:48:55Z","date_updated":"2023-09-19T10:42:28Z","oa_version":"None","author":[{"full_name":"Alistarh, Dan-Adrian","first_name":"Dan-Adrian","last_name":"Alistarh","id":"4A899BFC-F248-11E8-B48F-1D18A9856A87","orcid":"0000-0003-3650-940X"}],"type":"conference","abstract":[{"text":"The area of machine learning has made considerable progress over the past decade, enabled by the widespread availability of large datasets, as well as by improved algorithms and models. 
Given the large computational demands of machine learning workloads, parallelism, implemented either through single-node concurrency or through multi-node distribution, has been a third key ingredient to advances in machine learning.\r\nThe goal of this tutorial is to provide the audience with an overview of standard distribution techniques in machine learning, with an eye towards the intriguing trade-offs between synchronization and communication costs of distributed machine learning algorithms, on the one hand, and their convergence, on the other. The tutorial will focus on parallelization strategies for the fundamental stochastic gradient descent (SGD) algorithm, which is a key tool when training machine learning models, from classical instances such as linear regression, to state-of-the-art neural network architectures.\r\nThe tutorial will describe the guarantees provided by this algorithm in the sequential case, and then move on to cover both shared-memory and message-passing parallelization strategies, together with the guarantees they provide, and corresponding trade-offs. The presentation will conclude with a broad overview of ongoing research in distributed and concurrent machine learning. The tutorial will assume no prior knowledge beyond familiarity with basic concepts in algebra and analysis.\r\n","lang":"eng"}],"quality_controlled":"1","isi":1,"page":"487-488","publication":"Proceedings of the 2018 ACM Symposium on Principles of Distributed Computing - PODC '18","external_id":{"isi":["000458186900063"]},"citation":{"apa":"Alistarh, D.-A. (2018). A brief tutorial on distributed and concurrent machine learning. In Proceedings of the 2018 ACM Symposium on Principles of Distributed Computing - PODC ’18 (pp. 487–488). Egham, United Kingdom: ACM Press. https://doi.org/10.1145/3212734.3212798","ieee":"D.-A. 
Alistarh, “A brief tutorial on distributed and concurrent machine learning,” in Proceedings of the 2018 ACM Symposium on Principles of Distributed Computing - PODC ’18, Egham, United Kingdom, 2018, pp. 487–488.","ista":"Alistarh D-A. 2018. A brief tutorial on distributed and concurrent machine learning. Proceedings of the 2018 ACM Symposium on Principles of Distributed Computing - PODC ’18. PODC: Principles of Distributed Computing, 487–488.","ama":"Alistarh D-A. A brief tutorial on distributed and concurrent machine learning. In: Proceedings of the 2018 ACM Symposium on Principles of Distributed Computing - PODC ’18. ACM Press; 2018:487-488. doi:10.1145/3212734.3212798","chicago":"Alistarh, Dan-Adrian. “A Brief Tutorial on Distributed and Concurrent Machine Learning.” In Proceedings of the 2018 ACM Symposium on Principles of Distributed Computing - PODC ’18, 487–88. ACM Press, 2018. https://doi.org/10.1145/3212734.3212798.","short":"D.-A. Alistarh, in:, Proceedings of the 2018 ACM Symposium on Principles of Distributed Computing - PODC ’18, ACM Press, 2018, pp. 487–488.","mla":"Alistarh, Dan-Adrian. “A Brief Tutorial on Distributed and Concurrent Machine Learning.” Proceedings of the 2018 ACM Symposium on Principles of Distributed Computing - PODC ’18, ACM Press, 2018, pp. 
487–88, doi:10.1145/3212734.3212798."},"language":[{"iso":"eng"}],"conference":{"end_date":"2018-07-27","location":"Egham, United Kingdom","start_date":"2018-07-23","name":"PODC: Principles of Distributed Computing"},"date_published":"2018-07-27T00:00:00Z","doi":"10.1145/3212734.3212798","scopus_import":"1","month":"07","day":"27","article_processing_charge":"No","publication_identifier":{"isbn":["9781450357951"]}},{"publication_identifier":{"isbn":["9781450357951"]},"month":"07","language":[{"iso":"eng"}],"doi":"10.1145/3212734.3212756","conference":{"name":"PODC: Principles of Distributed Computing","end_date":"2018-07-27","location":"Egham, United Kingdom","start_date":"2018-07-23"},"isi":1,"quality_controlled":"1","external_id":{"arxiv":["1808.04155"],"isi":["000458186900048"]},"main_file_link":[{"open_access":"1","url":"https://arxiv.org/abs/1808.04155"}],"oa":1,"date_updated":"2023-09-19T10:43:21Z","date_created":"2019-02-13T10:03:25Z","author":[{"orcid":"0000-0003-3650-940X","id":"4A899BFC-F248-11E8-B48F-1D18A9856A87","last_name":"Alistarh","first_name":"Dan-Adrian","full_name":"Alistarh, Dan-Adrian"},{"full_name":"Brown, Trevor A","id":"3569F0A0-F248-11E8-B48F-1D18A9856A87","first_name":"Trevor A","last_name":"Brown"},{"full_name":"Kopinsky, Justin","first_name":"Justin","last_name":"Kopinsky"},{"full_name":"Nadiradze, Giorgi","first_name":"Giorgi","last_name":"Nadiradze"}],"department":[{"_id":"DaAl"}],"publisher":"ACM Press","publication_status":"published","year":"2018","article_processing_charge":"No","day":"23","scopus_import":"1","date_published":"2018-07-23T00:00:00Z","page":"377-386","citation":{"chicago":"Alistarh, Dan-Adrian, Trevor A Brown, Justin Kopinsky, and Giorgi Nadiradze. “Relaxed Schedulers Can Efficiently Parallelize Iterative Algorithms.” In Proceedings of the 2018 ACM Symposium on Principles of Distributed Computing - PODC ’18, 377–86. ACM Press, 2018. https://doi.org/10.1145/3212734.3212756.","mla":"Alistarh, Dan-Adrian, et al. 
“Relaxed Schedulers Can Efficiently Parallelize Iterative Algorithms.” Proceedings of the 2018 ACM Symposium on Principles of Distributed Computing - PODC ’18, ACM Press, 2018, pp. 377–86, doi:10.1145/3212734.3212756.","short":"D.-A. Alistarh, T.A. Brown, J. Kopinsky, G. Nadiradze, in:, Proceedings of the 2018 ACM Symposium on Principles of Distributed Computing - PODC ’18, ACM Press, 2018, pp. 377–386.","ista":"Alistarh D-A, Brown TA, Kopinsky J, Nadiradze G. 2018. Relaxed schedulers can efficiently parallelize iterative algorithms. Proceedings of the 2018 ACM Symposium on Principles of Distributed Computing - PODC ’18. PODC: Principles of Distributed Computing, 377–386.","apa":"Alistarh, D.-A., Brown, T. A., Kopinsky, J., & Nadiradze, G. (2018). Relaxed schedulers can efficiently parallelize iterative algorithms. In Proceedings of the 2018 ACM Symposium on Principles of Distributed Computing - PODC ’18 (pp. 377–386). Egham, United Kingdom: ACM Press. https://doi.org/10.1145/3212734.3212756","ieee":"D.-A. Alistarh, T. A. Brown, J. Kopinsky, and G. Nadiradze, “Relaxed schedulers can efficiently parallelize iterative algorithms,” in Proceedings of the 2018 ACM Symposium on Principles of Distributed Computing - PODC ’18, Egham, United Kingdom, 2018, pp. 377–386.","ama":"Alistarh D-A, Brown TA, Kopinsky J, Nadiradze G. Relaxed schedulers can efficiently parallelize iterative algorithms. In: Proceedings of the 2018 ACM Symposium on Principles of Distributed Computing - PODC ’18. ACM Press; 2018:377-386. 
doi:10.1145/3212734.3212756"},"publication":"Proceedings of the 2018 ACM Symposium on Principles of Distributed Computing - PODC '18","abstract":[{"lang":"eng","text":"There has been significant progress in understanding the parallelism inherent to iterative sequential algorithms: for many classic algorithms, the depth of the dependence structure is now well understood, and scheduling techniques have been developed to exploit this shallow dependence structure for efficient parallel implementations. A related, applied research strand has studied methods by which certain iterative task-based algorithms can be efficiently parallelized via relaxed concurrent priority schedulers. These allow for high concurrency when inserting and removing tasks, at the cost of executing superfluous work due to the relaxed semantics of the scheduler. In this work, we take a step towards unifying these two research directions, by showing that there exists a family of relaxed priority schedulers that can efficiently and deterministically execute classic iterative algorithms such as greedy maximal independent set (MIS) and matching. Our primary result shows that, given a randomized scheduler with an expected relaxation factor of k in terms of the maximum allowed priority inversions on a task, and any graph on n vertices, the scheduler is able to execute greedy MIS with only an additive factor of poly(k) expected additional iterations compared to an exact (but not scalable) scheduler. This counter-intuitive result demonstrates that the overhead of relaxation when computing MIS is not dependent on the input size or structure of the input graph. Experimental results show that this overhead can be clearly offset by the gain in performance due to the highly scalable scheduler. 
In sum, we present an efficient method to deterministically parallelize iterative sequential algorithms, with provable runtime guarantees in terms of the number of executed tasks to completion."}],"type":"conference","oa_version":"Preprint","status":"public","title":"Relaxed schedulers can efficiently parallelize iterative algorithms","_id":"5963","user_id":"c635000d-4b10-11ee-a964-aac5a93f6ac1"},{"scopus_import":"1","article_processing_charge":"No","day":"16","page":"133-142","citation":{"short":"D.-A. Alistarh, T.A. Brown, J. Kopinsky, J.Z. Li, G. Nadiradze, in:, Proceedings of the 30th on Symposium on Parallelism in Algorithms and Architectures - SPAA ’18, ACM Press, 2018, pp. 133–142.","mla":"Alistarh, Dan-Adrian, et al. “Distributionally Linearizable Data Structures.” Proceedings of the 30th on Symposium on Parallelism in Algorithms and Architectures - SPAA ’18, ACM Press, 2018, pp. 133–42, doi:10.1145/3210377.3210411.","chicago":"Alistarh, Dan-Adrian, Trevor A Brown, Justin Kopinsky, Jerry Z. Li, and Giorgi Nadiradze. “Distributionally Linearizable Data Structures.” In Proceedings of the 30th on Symposium on Parallelism in Algorithms and Architectures - SPAA ’18, 133–42. ACM Press, 2018. https://doi.org/10.1145/3210377.3210411.","ama":"Alistarh D-A, Brown TA, Kopinsky J, Li JZ, Nadiradze G. Distributionally linearizable data structures. In: Proceedings of the 30th on Symposium on Parallelism in Algorithms and Architectures - SPAA ’18. ACM Press; 2018:133-142. doi:10.1145/3210377.3210411","apa":"Alistarh, D.-A., Brown, T. A., Kopinsky, J., Li, J. Z., & Nadiradze, G. (2018). Distributionally linearizable data structures. In Proceedings of the 30th on Symposium on Parallelism in Algorithms and Architectures - SPAA ’18 (pp. 133–142). Vienna, Austria: ACM Press. https://doi.org/10.1145/3210377.3210411","ieee":"D.-A. Alistarh, T. A. Brown, J. Kopinsky, J. Z. Li, and G. 
Nadiradze, “Distributionally linearizable data structures,” in Proceedings of the 30th on Symposium on Parallelism in Algorithms and Architectures - SPAA ’18, Vienna, Austria, 2018, pp. 133–142.","ista":"Alistarh D-A, Brown TA, Kopinsky J, Li JZ, Nadiradze G. 2018. Distributionally linearizable data structures. Proceedings of the 30th on Symposium on Parallelism in Algorithms and Architectures - SPAA ’18. SPAA: Symposium on Parallelism in Algorithms and Architectures, 133–142."},"publication":"Proceedings of the 30th on Symposium on Parallelism in Algorithms and Architectures - SPAA '18","date_published":"2018-07-16T00:00:00Z","type":"conference","abstract":[{"lang":"eng","text":"Relaxed concurrent data structures have become increasingly popular, due to their scalability in graph processing and machine learning applications (\\cite{Nguyen13, gonzalez2012powergraph}). Despite considerable interest, there exist families of natural, high performing randomized relaxed concurrent data structures, such as the popular MultiQueue~\\cite{MQ} pattern for implementing relaxed priority queue data structures, for which no guarantees are known in the concurrent setting~\\cite{AKLN17}. Our main contribution is in showing for the first time that, under a set of analytic assumptions, a family of relaxed concurrent data structures, including variants of MultiQueues, but also a new approximate counting algorithm we call the MultiCounter, provides strong probabilistic guarantees on the degree of relaxation with respect to the sequential specification, in arbitrary concurrent executions. We formalize these guarantees via a new correctness condition called distributional linearizability, tailored to concurrent implementations with randomized relaxations. 
Our result is based on a new analysis of an asynchronous variant of the classic power-of-two-choices load balancing algorithm, in which placement choices can be based on inconsistent, outdated information (this result may be of independent interest). We validate our results empirically, showing that the MultiCounter algorithm can implement scalable relaxed timestamps."}],"title":"Distributionally linearizable data structures","status":"public","_id":"5965","user_id":"c635000d-4b10-11ee-a964-aac5a93f6ac1","oa_version":"Preprint","publication_identifier":{"isbn":["9781450357999"]},"month":"07","isi":1,"quality_controlled":"1","external_id":{"isi":["000545269600016"],"arxiv":["1804.01018"]},"main_file_link":[{"open_access":"1","url":"https://arxiv.org/abs/1804.01018"}],"oa":1,"language":[{"iso":"eng"}],"doi":"10.1145/3210377.3210411","conference":{"name":"SPAA: Symposium on Parallelism in Algorithms and Architectures","location":"Vienna, Austria","start_date":"2018-07-16","end_date":"2018-07-18"},"department":[{"_id":"DaAl"}],"publisher":"ACM Press","publication_status":"published","year":"2018","date_updated":"2023-09-19T10:44:13Z","date_created":"2019-02-13T10:17:19Z","related_material":{"record":[{"relation":"dissertation_contains","status":"public","id":"10429"}]},"author":[{"full_name":"Alistarh, Dan-Adrian","first_name":"Dan-Adrian","last_name":"Alistarh","id":"4A899BFC-F248-11E8-B48F-1D18A9856A87","orcid":"0000-0003-3650-940X"},{"full_name":"Brown, Trevor A","last_name":"Brown","first_name":"Trevor A","id":"3569F0A0-F248-11E8-B48F-1D18A9856A87"},{"full_name":"Kopinsky, Justin","last_name":"Kopinsky","first_name":"Justin"},{"first_name":"Jerry Z.","last_name":"Li","full_name":"Li, Jerry Z."},{"full_name":"Nadiradze, Giorgi","first_name":"Giorgi","last_name":"Nadiradze"}]},{"page":"383-392","publication":"Proceedings of the 30th on Symposium on Parallelism in Algorithms and Architectures - SPAA '18","citation":{"ama":"Alistarh D-A, Haider SK, Kübler R, Nadiradze 
G. The transactional conflict problem. In: Proceedings of the 30th on Symposium on Parallelism in Algorithms and Architectures - SPAA ’18. ACM Press; 2018:383-392. doi:10.1145/3210377.3210406","ista":"Alistarh D-A, Haider SK, Kübler R, Nadiradze G. 2018. The transactional conflict problem. Proceedings of the 30th on Symposium on Parallelism in Algorithms and Architectures - SPAA ’18. SPAA: Symposium on Parallelism in Algorithms and Architectures, 383–392.","ieee":"D.-A. Alistarh, S. K. Haider, R. Kübler, and G. Nadiradze, “The transactional conflict problem,” in Proceedings of the 30th on Symposium on Parallelism in Algorithms and Architectures - SPAA ’18, Vienna, Austria, 2018, pp. 383–392.","apa":"Alistarh, D.-A., Haider, S. K., Kübler, R., & Nadiradze, G. (2018). The transactional conflict problem. In Proceedings of the 30th on Symposium on Parallelism in Algorithms and Architectures - SPAA ’18 (pp. 383–392). Vienna, Austria: ACM Press. https://doi.org/10.1145/3210377.3210406","mla":"Alistarh, Dan-Adrian, et al. “The Transactional Conflict Problem.” Proceedings of the 30th on Symposium on Parallelism in Algorithms and Architectures - SPAA ’18, ACM Press, 2018, pp. 383–92, doi:10.1145/3210377.3210406.","short":"D.-A. Alistarh, S.K. Haider, R. Kübler, G. Nadiradze, in:, Proceedings of the 30th on Symposium on Parallelism in Algorithms and Architectures - SPAA ’18, ACM Press, 2018, pp. 383–392.","chicago":"Alistarh, Dan-Adrian, Syed Kamran Haider, Raphael Kübler, and Giorgi Nadiradze. “The Transactional Conflict Problem.” In Proceedings of the 30th on Symposium on Parallelism in Algorithms and Architectures - SPAA ’18, 383–92. ACM Press, 2018. 
https://doi.org/10.1145/3210377.3210406."},"date_published":"2018-07-16T00:00:00Z","scopus_import":"1","day":"16","article_processing_charge":"No","status":"public","title":"The transactional conflict problem","user_id":"c635000d-4b10-11ee-a964-aac5a93f6ac1","_id":"5966","oa_version":"Preprint","type":"conference","abstract":[{"text":"The transactional conflict problem arises in transactional systems whenever two or more concurrent transactions clash on a data item. While the standard solution to such conflicts is to immediately abort one of the transactions, some practical systems consider the alternative of delaying conflict resolution for a short interval, which may allow one of the transactions to commit. The challenge in the transactional conflict problem is to choose the optimal length of this delay interval so as to minimize the overall running time penalty for the conflicting transactions. In this paper, we propose a family of optimal online algorithms for the transactional conflict problem. Specifically, we consider variants of this problem which arise in different implementations of transactional systems, namely \"requestor wins\" and \"requestor aborts\" implementations: in the former, the recipient of a coherence request is aborted, whereas in the latter, it is the requestor which has to abort. Both strategies are implemented by real systems. We show that the requestor aborts case can be reduced to a classic instance of the ski rental problem, while the requestor wins case leads to a new version of this classical problem, for which we derive optimal deterministic and randomized algorithms. Moreover, we prove that, under a simplified adversarial model, our algorithms are constant-competitive with the offline optimum in terms of throughput. 
We validate our algorithmic results empirically through a hardware simulation of hardware transactional memory (HTM), showing that our algorithms can lead to non-trivial performance improvements for classic concurrent data structures.","lang":"eng"}],"isi":1,"quality_controlled":"1","external_id":{"arxiv":["1804.00947"],"isi":["000545269600046"]},"oa":1,"main_file_link":[{"open_access":"1","url":"https://arxiv.org/abs/1804.00947"}],"language":[{"iso":"eng"}],"conference":{"start_date":"2018-07-16","location":"Vienna, Austria","end_date":"2018-07-18","name":"SPAA: Symposium on Parallelism in Algorithms and Architectures"},"doi":"10.1145/3210377.3210406","month":"07","publication_identifier":{"isbn":["9781450357999"]},"publication_status":"published","department":[{"_id":"DaAl"}],"publisher":"ACM Press","year":"2018","date_created":"2019-02-13T10:26:07Z","date_updated":"2023-09-19T10:44:49Z","author":[{"id":"4A899BFC-F248-11E8-B48F-1D18A9856A87","orcid":"0000-0003-3650-940X","first_name":"Dan-Adrian","last_name":"Alistarh","full_name":"Alistarh, Dan-Adrian"},{"first_name":"Syed Kamran","last_name":"Haider","full_name":"Haider, Syed Kamran"},{"full_name":"Kübler, Raphael","last_name":"Kübler","first_name":"Raphael"},{"first_name":"Giorgi","last_name":"Nadiradze","full_name":"Nadiradze, Giorgi"}]},{"month":"07","publication_identifier":{"isbn":["9781450357951"]},"external_id":{"isi":["000458186900052"]},"main_file_link":[{"open_access":"1","url":"https://hal-univ-lyon3.archives-ouvertes.fr/INRIA/hal-01887733v1"}],"oa":1,"isi":1,"quality_controlled":"1","conference":{"name":"PODC: Principles of Distributed Computing","end_date":"2018-07-27","location":"Egham, United Kingdom","start_date":"2018-07-23"},"doi":"10.1145/3212734.3212785","language":[{"iso":"eng"}],"year":"2018","publication_status":"published","publisher":"ACM Press","department":[{"_id":"DaAl"}],"author":[{"last_name":"Aksenov","first_name":"Vitaly","full_name":"Aksenov, 
Vitaly"},{"orcid":"0000-0003-3650-940X","id":"4A899BFC-F248-11E8-B48F-1D18A9856A87","last_name":"Alistarh","first_name":"Dan-Adrian","full_name":"Alistarh, Dan-Adrian"},{"first_name":"Petr","last_name":"Kuznetsov","full_name":"Kuznetsov, Petr"}],"date_created":"2019-02-13T10:08:19Z","date_updated":"2023-09-19T10:43:45Z","scopus_import":"1","day":"23","article_processing_charge":"No","publication":"Proceedings of the 2018 ACM Symposium on Principles of Distributed Computing - PODC '18","citation":{"chicago":"Aksenov, Vitaly, Dan-Adrian Alistarh, and Petr Kuznetsov. “Brief Announcement: Performance Prediction for Coarse-Grained Locking.” In Proceedings of the 2018 ACM Symposium on Principles of Distributed Computing - PODC ’18, 411–13. ACM Press, 2018. https://doi.org/10.1145/3212734.3212785.","short":"V. Aksenov, D.-A. Alistarh, P. Kuznetsov, in:, Proceedings of the 2018 ACM Symposium on Principles of Distributed Computing - PODC ’18, ACM Press, 2018, pp. 411–413.","mla":"Aksenov, Vitaly, et al. “Brief Announcement: Performance Prediction for Coarse-Grained Locking.” Proceedings of the 2018 ACM Symposium on Principles of Distributed Computing - PODC ’18, ACM Press, 2018, pp. 411–13, doi:10.1145/3212734.3212785.","apa":"Aksenov, V., Alistarh, D.-A., & Kuznetsov, P. (2018). Brief Announcement: Performance prediction for coarse-grained locking. In Proceedings of the 2018 ACM Symposium on Principles of Distributed Computing - PODC ’18 (pp. 411–413). Egham, United Kingdom: ACM Press. https://doi.org/10.1145/3212734.3212785","ieee":"V. Aksenov, D.-A. Alistarh, and P. Kuznetsov, “Brief Announcement: Performance prediction for coarse-grained locking,” in Proceedings of the 2018 ACM Symposium on Principles of Distributed Computing - PODC ’18, Egham, United Kingdom, 2018, pp. 411–413.","ista":"Aksenov V, Alistarh D-A, Kuznetsov P. 2018. Brief Announcement: Performance prediction for coarse-grained locking. 
Proceedings of the 2018 ACM Symposium on Principles of Distributed Computing - PODC ’18. PODC: Principles of Distributed Computing, 411–413.","ama":"Aksenov V, Alistarh D-A, Kuznetsov P. Brief Announcement: Performance prediction for coarse-grained locking. In: Proceedings of the 2018 ACM Symposium on Principles of Distributed Computing - PODC ’18. ACM Press; 2018:411-413. doi:10.1145/3212734.3212785"},"page":"411-413","date_published":"2018-07-23T00:00:00Z","type":"conference","abstract":[{"text":"A standard design pattern found in many concurrent data structures, such as hash tables or ordered containers, is an alternation of parallelizable sections that incur no data conflicts and critical sections that must run sequentially and are protected with locks. A lock can be viewed as a queue that arbitrates the order in which the critical sections are executed, and a natural question is whether we can use stochastic analysis to predict the resulting throughput. As a preliminary evidence to the affirmative, we describe a simple model that can be used to predict the throughput of coarse-grained lock-based algorithms. We show that our model works well for CLH lock, and we expect it to work for other popular lock designs such as TTAS, MCS, etc.","lang":"eng"}],"user_id":"c635000d-4b10-11ee-a964-aac5a93f6ac1","_id":"5964","title":"Brief Announcement: Performance prediction for coarse-grained locking","status":"public","oa_version":"Submitted Version"},{"publication":"2018 IEEE International Workshop on Signal Processing Systems","citation":{"ista":"Stojanov A, Smith TM, Alistarh D-A, Puschel M. 2018. Fast quantized arithmetic on x86: Trading compute for data movement. 2018 IEEE International Workshop on Signal Processing Systems. SiPS: Workshop on Signal Processing Systems vol. 2018–October, 8598402.","ieee":"A. Stojanov, T. M. Smith, D.-A. Alistarh, and M. 
Puschel, “Fast quantized arithmetic on x86: Trading compute for data movement,” in 2018 IEEE International Workshop on Signal Processing Systems, Cape Town, South Africa, 2018, vol. 2018–October.","apa":"Stojanov, A., Smith, T. M., Alistarh, D.-A., & Puschel, M. (2018). Fast quantized arithmetic on x86: Trading compute for data movement. In 2018 IEEE International Workshop on Signal Processing Systems (Vol. 2018–October). Cape Town, South Africa: IEEE. https://doi.org/10.1109/SiPS.2018.8598402","ama":"Stojanov A, Smith TM, Alistarh D-A, Puschel M. Fast quantized arithmetic on x86: Trading compute for data movement. In: 2018 IEEE International Workshop on Signal Processing Systems. Vol 2018-October. IEEE; 2018. doi:10.1109/SiPS.2018.8598402","chicago":"Stojanov, Alen, Tyler Michael Smith, Dan-Adrian Alistarh, and Markus Puschel. “Fast Quantized Arithmetic on X86: Trading Compute for Data Movement.” In 2018 IEEE International Workshop on Signal Processing Systems, Vol. 2018–October. IEEE, 2018. https://doi.org/10.1109/SiPS.2018.8598402.","mla":"Stojanov, Alen, et al. “Fast Quantized Arithmetic on X86: Trading Compute for Data Movement.” 2018 IEEE International Workshop on Signal Processing Systems, vol. 2018–October, 8598402, IEEE, 2018, doi:10.1109/SiPS.2018.8598402.","short":"A. Stojanov, T.M. Smith, D.-A. Alistarh, M. 
Puschel, in:, 2018 IEEE International Workshop on Signal Processing Systems, IEEE, 2018."},"external_id":{"isi":["000465106800060"]},"isi":1,"quality_controlled":"1","conference":{"location":"Cape Town, South Africa","start_date":"2018-10-21","end_date":"2018-10-24","name":"SiPS: Workshop on Signal Processing Systems"},"date_published":"2018-12-31T00:00:00Z","doi":"10.1109/SiPS.2018.8598402","language":[{"iso":"eng"}],"scopus_import":"1","day":"31","month":"12","article_processing_charge":"No","user_id":"c635000d-4b10-11ee-a964-aac5a93f6ac1","_id":"6031","year":"2018","title":"Fast quantized arithmetic on x86: Trading compute for data movement","status":"public","publication_status":"published","publisher":"IEEE","department":[{"_id":"DaAl"}],"author":[{"full_name":"Stojanov, Alen","last_name":"Stojanov","first_name":"Alen"},{"full_name":"Smith, Tyler Michael","first_name":"Tyler Michael","last_name":"Smith"},{"full_name":"Alistarh, Dan-Adrian","last_name":"Alistarh","first_name":"Dan-Adrian","orcid":"0000-0003-3650-940X","id":"4A899BFC-F248-11E8-B48F-1D18A9856A87"},{"last_name":"Puschel","first_name":"Markus","full_name":"Puschel, Markus"}],"date_updated":"2023-09-19T14:41:51Z","date_created":"2019-02-17T22:59:25Z","volume":"2018-October","oa_version":"None","article_number":"8598402","type":"conference","abstract":[{"text":"We introduce Clover, a new library for efficient computation using low-precision data, providing mathematical routines required by fundamental methods in optimization and sparse recovery. Our library faithfully implements variants of stochastic quantization that guarantee convergence at low precision, and supports data formats from 4-bit quantized to 32-bit IEEE-754 on current Intel processors. In particular, we show that 4-bit can be implemented efficiently using Intel AVX despite the lack of native support for this data format. 
Experimental results with dot product, matrix-vector multiplication (MVM), gradient descent (GD), and iterative hard thresholding (IHT) demonstrate that the attainable speedups are in many cases close to linear with respect to the reduction of precision due to reduced data movement. Finally, for GD and IHT, we show examples of absolute speedup achieved by 4-bit versus 32-bit, by iterating until a given target error is achieved.","lang":"eng"}]},{"day":"30","month":"01","publication_identifier":{"isbn":["9781611975031"]},"article_processing_charge":"No","isi":1,"quality_controlled":"1","page":"2221-2239","publication":"Proceedings of the 29th Annual ACM-SIAM Symposium on Discrete Algorithms","citation":{"chicago":"Alistarh, Dan-Adrian, James Aspnes, and Rati Gelashvili. “Space-Optimal Majority in Population Protocols.” In Proceedings of the 29th Annual ACM-SIAM Symposium on Discrete Algorithms, 2221–39. ACM, 2018. https://doi.org/10.1137/1.9781611975031.144.","mla":"Alistarh, Dan-Adrian, et al. “Space-Optimal Majority in Population Protocols.” Proceedings of the 29th Annual ACM-SIAM Symposium on Discrete Algorithms, ACM, 2018, pp. 2221–39, doi:10.1137/1.9781611975031.144.","short":"D.-A. Alistarh, J. Aspnes, R. Gelashvili, in:, Proceedings of the 29th Annual ACM-SIAM Symposium on Discrete Algorithms, ACM, 2018, pp. 2221–2239.","ista":"Alistarh D-A, Aspnes J, Gelashvili R. 2018. Space-optimal majority in population protocols. Proceedings of the 29th Annual ACM-SIAM Symposium on Discrete Algorithms. SODA: Symposium on Discrete Algorithms, 2221–2239.","apa":"Alistarh, D.-A., Aspnes, J., & Gelashvili, R. (2018). Space-optimal majority in population protocols. In Proceedings of the 29th Annual ACM-SIAM Symposium on Discrete Algorithms (pp. 2221–2239). New Orleans, LA, United States: ACM. https://doi.org/10.1137/1.9781611975031.144","ieee":"D.-A. Alistarh, J. Aspnes, and R. 
Gelashvili, “Space-optimal majority in population protocols,” in Proceedings of the 29th Annual ACM-SIAM Symposium on Discrete Algorithms, New Orleans, LA, United States, 2018, pp. 2221–2239.","ama":"Alistarh D-A, Aspnes J, Gelashvili R. Space-optimal majority in population protocols. In: Proceedings of the 29th Annual ACM-SIAM Symposium on Discrete Algorithms. ACM; 2018:2221-2239. doi:10.1137/1.9781611975031.144"},"oa":1,"main_file_link":[{"open_access":"1","url":"https://arxiv.org/abs/1704.04947"}],"external_id":{"arxiv":["1704.04947"],"isi":["000483921200145"]},"language":[{"iso":"eng"}],"conference":{"name":"SODA: Symposium on Discrete Algorithms","end_date":"2018-01-10","start_date":"2018-01-07","location":"New Orleans, LA, United States"},"doi":"10.1137/1.9781611975031.144","date_published":"2018-01-30T00:00:00Z","type":"conference","abstract":[{"lang":"eng","text":"Population protocols are a popular model of distributed computing, in which n agents with limited local state interact randomly, and cooperate to collectively compute global predicates. Inspired by recent developments in DNA programming, an extensive series of papers, across different communities, has examined the computability and complexity characteristics of this model. Majority, or consensus, is a central task in this model, in which agents need to collectively reach a decision as to which one of two states A or B had a higher initial count. Two metrics are important: the time that a protocol requires to stabilize to an output decision, and the state space size that each agent requires to do so. It is known that majority requires Ω(log log n) states per agent to allow for fast (poly-logarithmic time) stabilization, and that O(log² n) states are sufficient. Thus, there is an exponential gap between the space upper and lower bounds for this problem.
This paper addresses this question.\r\n\r\nOn the negative side, we provide a new lower bound of Ω(log n) states for any protocol which stabilizes in O(n^(1-c)) expected time, for any constant c > 0. This result is conditional on monotonicity and output assumptions, satisfied by all known protocols. Technically, it represents a departure from previous lower bounds, in that it does not rely on the existence of dense configurations. Instead, we introduce a new generalized surgery technique to prove the existence of incorrect executions for any algorithm which would contradict the lower bound. Subsequently, our lower bound also applies to general initial configurations, including ones with a leader. On the positive side, we give a new algorithm for majority which uses O(log n) states, and stabilizes in O(log² n) expected time. Central to the algorithm is a new leaderless phase clock technique, which allows agents to synchronize in phases of Θ(n log n) consecutive interactions using O(log n) states per agent, exploiting a new connection between population protocols and power-of-two-choices load balancing mechanisms.
We also employ our phase clock to build a leader election algorithm with a state space of size O(log n), which stabilizes in O(log² n) expected time."}],"publication_status":"published","status":"public","title":"Space-optimal majority in population protocols","publisher":"ACM","department":[{"_id":"DaAl"}],"_id":"7123","user_id":"c635000d-4b10-11ee-a964-aac5a93f6ac1","year":"2018","date_created":"2019-11-26T15:10:55Z","date_updated":"2023-09-19T15:03:16Z","oa_version":"Preprint","author":[{"full_name":"Alistarh, Dan-Adrian","last_name":"Alistarh","first_name":"Dan-Adrian","orcid":"0000-0003-3650-940X","id":"4A899BFC-F248-11E8-B48F-1D18A9856A87"},{"last_name":"Aspnes","first_name":"James","full_name":"Aspnes, James"},{"first_name":"Rati","last_name":"Gelashvili","full_name":"Gelashvili, Rati"}]},{"year":"2018","publication_status":"published","publisher":"Neural Information Processing Systems Foundation","department":[{"_id":"DaAl"}],"author":[{"last_name":"Alistarh","first_name":"Dan-Adrian","orcid":"0000-0003-3650-940X","id":"4A899BFC-F248-11E8-B48F-1D18A9856A87","full_name":"Alistarh, Dan-Adrian"},{"first_name":"Zeyuan","last_name":"Allen-Zhu","full_name":"Allen-Zhu, Zeyuan"},{"first_name":"Jerry","last_name":"Li","full_name":"Li, Jerry"}],"date_updated":"2023-09-19T15:12:45Z","date_created":"2019-06-13T08:22:37Z","volume":2018,"external_id":{"isi":["000461823304061"],"arxiv":["1803.08917"]},"oa":1,"main_file_link":[{"open_access":"1","url":"https://arxiv.org/abs/1803.08917"}],"isi":1,"quality_controlled":"1","conference":{"name":"NeurIPS: Conference on Neural Information Processing Systems","location":"Montreal, Canada","start_date":"2018-12-02","end_date":"2018-12-08"},"language":[{"iso":"eng"}],"month":"12","_id":"6558","user_id":"c635000d-4b10-11ee-a964-aac5a93f6ac1","status":"public","title":"Byzantine stochastic gradient descent","intvolume":" 2018","oa_version":"Published Version","type":"conference","abstract":[{"text":"This paper studies the problem of
distributed stochastic optimization in an adversarial setting where, out of m machines which allegedly compute stochastic gradients every iteration, an α-fraction are Byzantine, and may behave adversarially. Our main result is a variant of stochastic gradient descent (SGD) which finds ε-approximate minimizers of convex functions in T=Õ(1/ε²m+α²/ε²) iterations. In contrast, traditional mini-batch SGD needs T=O(1/ε²m) iterations, but cannot tolerate Byzantine failures. Further, we provide a lower bound showing that, up to logarithmic factors, our algorithm is information-theoretically optimal both in terms of sample complexity and time complexity.","lang":"eng"}],"publication":"Advances in Neural Information Processing Systems","citation":{"short":"D.-A. Alistarh, Z. Allen-Zhu, J. Li, in:, Advances in Neural Information Processing Systems, Neural Information Processing Systems Foundation, 2018, pp. 4613–4623.","mla":"Alistarh, Dan-Adrian, et al. “Byzantine Stochastic Gradient Descent.” Advances in Neural Information Processing Systems, vol. 2018, Neural Information Processing Systems Foundation, 2018, pp. 4613–23.","chicago":"Alistarh, Dan-Adrian, Zeyuan Allen-Zhu, and Jerry Li. “Byzantine Stochastic Gradient Descent.” In Advances in Neural Information Processing Systems, 2018:4613–23. Neural Information Processing Systems Foundation, 2018.","ama":"Alistarh D-A, Allen-Zhu Z, Li J. Byzantine stochastic gradient descent. In: Advances in Neural Information Processing Systems. Vol 2018. Neural Information Processing Systems Foundation; 2018:4613-4623.","apa":"Alistarh, D.-A., Allen-Zhu, Z., & Li, J. (2018). Byzantine stochastic gradient descent. In Advances in Neural Information Processing Systems (Vol. 2018, pp. 4613–4623). Montreal, Canada: Neural Information Processing Systems Foundation.","ieee":"D.-A. Alistarh, Z. Allen-Zhu, and J. Li, “Byzantine stochastic gradient descent,” in Advances in Neural Information Processing Systems, Montreal, Canada, 2018, vol.
2018, pp. 4613–4623.","ista":"Alistarh D-A, Allen-Zhu Z, Li J. 2018. Byzantine stochastic gradient descent. Advances in Neural Information Processing Systems. NeurIPS: Conference on Neural Information Processing Systems vol. 2018, 4613–4623."},"page":"4613-4623","date_published":"2018-12-01T00:00:00Z","scopus_import":"1","day":"01","article_processing_charge":"No"},{"quality_controlled":"1","isi":1,"project":[{"grant_number":"665385","_id":"2564DBCA-B435-11E9-9278-68D0E5697425","call_identifier":"H2020","name":"International IST Doctoral Program"}],"external_id":{"isi":["000461852000047"],"arxiv":["1809.10505"]},"oa":1,"main_file_link":[{"url":"https://arxiv.org/abs/1809.10505","open_access":"1"}],"language":[{"iso":"eng"}],"conference":{"name":"NeurIPS: Conference on Neural Information Processing Systems","location":"Montreal, Canada","start_date":"2018-12-02","end_date":"2018-12-08"},"month":"12","publication_status":"published","department":[{"_id":"DaAl"},{"_id":"ChLa"}],"publisher":"Neural Information Processing Systems Foundation","year":"2018","date_created":"2019-06-27T09:32:55Z","date_updated":"2023-10-17T11:47:20Z","volume":"Volume 2018","author":[{"full_name":"Alistarh, Dan-Adrian","first_name":"Dan-Adrian","last_name":"Alistarh","id":"4A899BFC-F248-11E8-B48F-1D18A9856A87","orcid":"0000-0003-3650-940X"},{"last_name":"Hoefler","first_name":"Torsten","full_name":"Hoefler, Torsten"},{"full_name":"Johansson, Mikael","first_name":"Mikael","last_name":"Johansson"},{"full_name":"Konstantinov, Nikola H","last_name":"Konstantinov","first_name":"Nikola H","id":"4B9D76E4-F248-11E8-B48F-1D18A9856A87"},{"last_name":"Khirirat","first_name":"Sarit","full_name":"Khirirat, Sarit"},{"first_name":"Cedric","last_name":"Renggli","full_name":"Renggli, Cedric"}],"ec_funded":1,"page":"5973-5983","publication":"Advances in Neural Information Processing Systems 31","citation":{"ama":"Alistarh D-A, Hoefler T, Johansson M, Konstantinov NH, Khirirat S, Renggli C. 
The convergence of sparsified gradient methods. In: Advances in Neural Information Processing Systems 31. Vol Volume 2018. Neural Information Processing Systems Foundation; 2018:5973-5983.","ista":"Alistarh D-A, Hoefler T, Johansson M, Konstantinov NH, Khirirat S, Renggli C. 2018. The convergence of sparsified gradient methods. Advances in Neural Information Processing Systems 31. NeurIPS: Conference on Neural Information Processing Systems vol. Volume 2018, 5973–5983.","ieee":"D.-A. Alistarh, T. Hoefler, M. Johansson, N. H. Konstantinov, S. Khirirat, and C. Renggli, “The convergence of sparsified gradient methods,” in Advances in Neural Information Processing Systems 31, Montreal, Canada, 2018, vol. Volume 2018, pp. 5973–5983.","apa":"Alistarh, D.-A., Hoefler, T., Johansson, M., Konstantinov, N. H., Khirirat, S., & Renggli, C. (2018). The convergence of sparsified gradient methods. In Advances in Neural Information Processing Systems 31 (Vol. Volume 2018, pp. 5973–5983). Montreal, Canada: Neural Information Processing Systems Foundation.","mla":"Alistarh, Dan-Adrian, et al. “The Convergence of Sparsified Gradient Methods.” Advances in Neural Information Processing Systems 31, vol. Volume 2018, Neural Information Processing Systems Foundation, 2018, pp. 5973–83.","short":"D.-A. Alistarh, T. Hoefler, M. Johansson, N.H. Konstantinov, S. Khirirat, C. Renggli, in:, Advances in Neural Information Processing Systems 31, Neural Information Processing Systems Foundation, 2018, pp. 5973–5983.","chicago":"Alistarh, Dan-Adrian, Torsten Hoefler, Mikael Johansson, Nikola H Konstantinov, Sarit Khirirat, and Cedric Renggli. “The Convergence of Sparsified Gradient Methods.” In Advances in Neural Information Processing Systems 31, Volume 2018:5973–83. 
Neural Information Processing Systems Foundation, 2018."},"date_published":"2018-12-01T00:00:00Z","scopus_import":"1","day":"01","article_processing_charge":"No","title":"The convergence of sparsified gradient methods","status":"public","_id":"6589","user_id":"2DF688A6-F248-11E8-B48F-1D18A9856A87","oa_version":"Preprint","type":"conference","abstract":[{"lang":"eng","text":"Distributed training of massive machine learning models, in particular deep neural networks, via Stochastic Gradient Descent (SGD) is becoming commonplace. Several families of communication-reduction methods, such as quantization, large-batch methods, and gradient sparsification, have been proposed. To date, gradient sparsification methods--where each node sorts gradients by magnitude, and only communicates a subset of the components, accumulating the rest locally--are known to yield some of the largest practical gains. Such methods can reduce the amount of communication per step by up to \\emph{three orders of magnitude}, while preserving model accuracy. Yet, this family of methods currently has no theoretical justification. This is the question we address in this paper. We prove that, under analytic assumptions, sparsifying gradients by magnitude with local error correction provides convergence guarantees, for both convex and non-convex smooth objectives, for data-parallel SGD. The main insight is that sparsification methods implicitly maintain bounds on the maximum impact of stale updates, thanks to selection by magnitude. 
Our analysis and empirical validation also reveal that these methods do require analytical conditions to converge well, justifying existing heuristics."}]},{"publication_identifier":{"isbn":["978-145035422-6"]},"month":"11","day":"28","scopus_import":1,"doi":"10.1145/3143361.3143367","date_published":"2017-11-28T00:00:00Z","conference":{"name":"CoNEXT: Conference on emerging Networking EXperiments and Technologies","start_date":"2017-12-12","location":"Incheon, South Korea","end_date":"2017-12-15"},"language":[{"iso":"eng"}],"citation":{"chicago":"Baig, Ghufran, Bozidar Radunovic, Dan-Adrian Alistarh, Matthew Balkwill, Thomas Karagiannis, and Lili Qiu. “Towards Unlicensed Cellular Networks in TV White Spaces.” In Proceedings of the 2017 13th International Conference on Emerging Networking EXperiments and Technologies, 2–14. ACM, 2017. https://doi.org/10.1145/3143361.3143367.","short":"G. Baig, B. Radunovic, D.-A. Alistarh, M. Balkwill, T. Karagiannis, L. Qiu, in:, Proceedings of the 2017 13th International Conference on Emerging Networking EXperiments and Technologies, ACM, 2017, pp. 2–14.","mla":"Baig, Ghufran, et al. “Towards Unlicensed Cellular Networks in TV White Spaces.” Proceedings of the 2017 13th International Conference on Emerging Networking EXperiments and Technologies, ACM, 2017, pp. 2–14, doi:10.1145/3143361.3143367.","ieee":"G. Baig, B. Radunovic, D.-A. Alistarh, M. Balkwill, T. Karagiannis, and L. Qiu, “Towards unlicensed cellular networks in TV white spaces,” in Proceedings of the 2017 13th International Conference on emerging Networking EXperiments and Technologies, Incheon, South Korea, 2017, pp. 2–14.","apa":"Baig, G., Radunovic, B., Alistarh, D.-A., Balkwill, M., Karagiannis, T., & Qiu, L. (2017). Towards unlicensed cellular networks in TV white spaces. In Proceedings of the 2017 13th International Conference on emerging Networking EXperiments and Technologies (pp. 2–14). Incheon, South Korea: ACM. 
https://doi.org/10.1145/3143361.3143367","ista":"Baig G, Radunovic B, Alistarh D-A, Balkwill M, Karagiannis T, Qiu L. 2017. Towards unlicensed cellular networks in TV white spaces. Proceedings of the 2017 13th International Conference on emerging Networking EXperiments and Technologies. CoNEXT: Conference on emerging Networking EXperiments and Technologies, 2–14.","ama":"Baig G, Radunovic B, Alistarh D-A, Balkwill M, Karagiannis T, Qiu L. Towards unlicensed cellular networks in TV white spaces. In: Proceedings of the 2017 13th International Conference on Emerging Networking EXperiments and Technologies. ACM; 2017:2-14. doi:10.1145/3143361.3143367"},"publication":"Proceedings of the 2017 13th International Conference on emerging Networking EXperiments and Technologies","page":"2 - 14","quality_controlled":"1","publist_id":"7333","abstract":[{"text":"In this paper we study network architecture for unlicensed cellular networking for outdoor coverage in TV white spaces. The main technology proposed for TV white spaces is 802.11af, a Wi-Fi variant adapted for TV frequencies. However, 802.11af is originally designed for improved indoor propagation. We show that long links, typical for outdoor use, exacerbate known Wi-Fi issues, such as hidden and exposed terminal, and significantly reduce its efficiency. Instead, we propose CellFi, an alternative architecture based on LTE. LTE is designed for long-range coverage and throughput efficiency, but it is also designed to operate in tightly controlled and centrally managed networks. CellFi overcomes these problems by designing an LTE-compatible spectrum database component, mandatory for TV white space networking, and introducing an interference management component for distributed coordination. CellFi interference management is compatible with existing LTE mechanisms, requires no explicit communication between base stations, and is more efficient than CSMA for long links. 
We evaluate our design through an extensive real-world evaluation on off-the-shelf LTE equipment and simulations. We show that, compared to 802.11af, it increases coverage by 40% and reduces median flow completion times by 2.3x.","lang":"eng"}],"type":"conference","author":[{"full_name":"Baig, Ghufran","last_name":"Baig","first_name":"Ghufran"},{"full_name":"Radunovic, Bozidar","last_name":"Radunovic","first_name":"Bozidar"},{"id":"4A899BFC-F248-11E8-B48F-1D18A9856A87","orcid":"0000-0003-3650-940X","first_name":"Dan-Adrian","last_name":"Alistarh","full_name":"Alistarh, Dan-Adrian"},{"full_name":"Balkwill, Matthew","first_name":"Matthew","last_name":"Balkwill"},{"full_name":"Karagiannis, Thomas","first_name":"Thomas","last_name":"Karagiannis"},{"full_name":"Qiu, Lili","last_name":"Qiu","first_name":"Lili"}],"oa_version":"None","date_updated":"2023-02-23T12:21:11Z","date_created":"2018-12-11T11:46:45Z","user_id":"2DF688A6-F248-11E8-B48F-1D18A9856A87","_id":"487","year":"2017","publisher":"ACM","department":[{"_id":"DaAl"}],"publication_status":"published","title":"Towards unlicensed cellular networks in TV white spaces","status":"public"},{"month":"01","main_file_link":[{"open_access":"1","url":"https://arxiv.org/abs/1706.09937"}],"oa":1,"external_id":{"arxiv":["1706.09937"]},"quality_controlled":"1","doi":"10.1007/978-3-319-66799-7_11","conference":{"name":"DNA Computing and Molecular Programming"},"language":[{"iso":"eng"}],"publist_id":"6868","extern":"1","acknowledgement":"D. Alistarh - Supported by an SNF Ambizione Fellowship. A. Kosowski — Supported by Inria project GANG, ANR project DESCARTES, and\r\nNCN grant 2015/17/B/ST6/01897. D.
Soloveichik — Supported by NSF grants CCF-1618895 and CCF-1652824.\r\n\r\n","year":"2017","publisher":"Springer","publication_status":"published","author":[{"full_name":"Alistarh, Dan-Adrian","id":"4A899BFC-F248-11E8-B48F-1D18A9856A87","orcid":"0000-0003-3650-940X","first_name":"Dan-Adrian","last_name":"Alistarh"},{"full_name":"Dudek, Bartłomiej","first_name":"Bartłomiej","last_name":"Dudek"},{"last_name":"Kosowski","first_name":"Adrian","full_name":"Kosowski, Adrian"},{"full_name":"Soloveichik, David","last_name":"Soloveichik","first_name":"David"},{"full_name":"Uznański, Przemysław","first_name":"Przemysław","last_name":"Uznański"}],"volume":"10467 LNCS","date_created":"2018-12-11T11:48:30Z","date_updated":"2022-03-18T12:48:02Z","scopus_import":"1","article_processing_charge":"No","day":"01","citation":{"chicago":"Alistarh, Dan-Adrian, Bartłomiej Dudek, Adrian Kosowski, David Soloveichik, and Przemysław Uznański. “Robust Detection in Leak-Prone Population Protocols,” 10467 LNCS:155–71. Springer, 2017. https://doi.org/10.1007/978-3-319-66799-7_11.","short":"D.-A. Alistarh, B. Dudek, A. Kosowski, D. Soloveichik, P. Uznański, in:, Springer, 2017, pp. 155–171.","mla":"Alistarh, Dan-Adrian, et al. Robust Detection in Leak-Prone Population Protocols. Vol. 10467 LNCS, Springer, 2017, pp. 155–71, doi:10.1007/978-3-319-66799-7_11.","apa":"Alistarh, D.-A., Dudek, B., Kosowski, A., Soloveichik, D., & Uznański, P. (2017). Robust detection in leak-prone population protocols (Vol. 10467 LNCS, pp. 155–171). Presented at the DNA Computing and Molecular Programming, Springer. https://doi.org/10.1007/978-3-319-66799-7_11","ieee":"D.-A. Alistarh, B. Dudek, A. Kosowski, D. Soloveichik, and P. Uznański, “Robust detection in leak-prone population protocols,” presented at the DNA Computing and Molecular Programming, 2017, vol. 10467 LNCS, pp. 155–171.","ista":"Alistarh D-A, Dudek B, Kosowski A, Soloveichik D, Uznański P. 2017. Robust detection in leak-prone population protocols. 
DNA Computing and Molecular Programming, LNCS, vol. 10467 LNCS, 155–171.","ama":"Alistarh D-A, Dudek B, Kosowski A, Soloveichik D, Uznański P. Robust detection in leak-prone population protocols. In: Vol 10467 LNCS. Springer; 2017:155-171. doi:10.1007/978-3-319-66799-7_11"},"page":"155 - 171","date_published":"2017-01-01T00:00:00Z","type":"conference","alternative_title":["LNCS"],"abstract":[{"lang":"eng","text":"In contrast to electronic computation, chemical computation is noisy and susceptible to a variety of sources of error, which has prevented the construction of robust complex systems. To be effective, chemical algorithms must be designed with an appropriate error model in mind. Here we consider the model of chemical reaction networks that preserve molecular count (population protocols), and ask whether computation can be made robust to a natural model of unintended “leak” reactions. Our definition of leak is motivated by both the particular spurious behavior seen when implementing chemical reaction networks with DNA strand displacement cascades, as well as the unavoidable side reactions in any implementation due to the basic laws of chemistry. We develop a new “Robust Detection” algorithm for the problem of fast (logarithmic time) single molecule detection, and prove that it is robust to this general model of leaks. Besides potential applications in single molecule detection, the error-correction ideas developed here might enable a new class of robust-by-design chemical algorithms. 
Our analysis is based on a non-standard hybrid argument, combining ideas from discrete analysis of population protocols with classic Markov chain techniques."}],"_id":"788","user_id":"2DF688A6-F248-11E8-B48F-1D18A9856A87","status":"public","title":"Robust detection in leak-prone population protocols","oa_version":"None"},{"day":"01","month":"01","language":[{"iso":"eng"}],"conference":{"name":"SODA: Symposium on Discrete Algorithms"},"doi":"10.1137/1.9781611974782.169","date_published":"2017-01-01T00:00:00Z","page":"2560 - 2579","main_file_link":[{"open_access":"1","url":"https://arxiv.org/abs/1602.08032"}],"oa":1,"citation":{"chicago":"Alistarh, Dan-Adrian, James Aspnes, David Eisenstat, Ronald Rivest, and Rati Gelashvili. “Time-Space Trade-Offs in Population Protocols,” 2560–79. SIAM, 2017. https://doi.org/10.1137/1.9781611974782.169.","short":"D.-A. Alistarh, J. Aspnes, D. Eisenstat, R. Rivest, R. Gelashvili, in:, SIAM, 2017, pp. 2560–2579.","mla":"Alistarh, Dan-Adrian, et al. Time-Space Trade-Offs in Population Protocols. SIAM, 2017, pp. 2560–79, doi:10.1137/1.9781611974782.169.","ieee":"D.-A. Alistarh, J. Aspnes, D. Eisenstat, R. Rivest, and R. Gelashvili, “Time-space trade-offs in population protocols,” presented at the SODA: Symposium on Discrete Algorithms, 2017, pp. 2560–2579.","apa":"Alistarh, D.-A., Aspnes, J., Eisenstat, D., Rivest, R., & Gelashvili, R. (2017). Time-space trade-offs in population protocols (pp. 2560–2579). Presented at the SODA: Symposium on Discrete Algorithms, SIAM. https://doi.org/10.1137/1.9781611974782.169","ista":"Alistarh D-A, Aspnes J, Eisenstat D, Rivest R, Gelashvili R. 2017. Time-space trade-offs in population protocols. SODA: Symposium on Discrete Algorithms, 2560–2579.","ama":"Alistarh D-A, Aspnes J, Eisenstat D, Rivest R, Gelashvili R. Time-space trade-offs in population protocols. In: SIAM; 2017:2560-2579.
doi:10.1137/1.9781611974782.169"},"extern":"1","abstract":[{"lang":"eng","text":"Population protocols are a popular model of distributed computing, in which randomly-interacting agents with little computational power cooperate to jointly perform computational tasks. Inspired by developments in molecular computation, and in particular DNA computing, recent algorithmic work has focused on the complexity of solving simple yet fundamental tasks in the population model, such as leader election (which requires convergence to a single agent in a special \"leader\" state), and majority (in which agents must converge to a decision as to which of two possible initial states had higher initial count). Known results point towards an inherent trade-off between the time complexity of such algorithms, and the space complexity, i.e. size of the memory available to each agent. In this paper, we explore this trade-off and provide new upper and lower bounds for majority and leader election. First, we prove a unified lower bound, which relates the space available per node with the time complexity achievable by a protocol: for instance, our result implies that any protocol solving either of these tasks for n agents using O(log log n) states must take Ω(n/polylog n) expected time. This is the first result to characterize time complexity for protocols which employ a super-constant number of states per node, and proves that fast, poly-logarithmic running times require protocols to have relatively large space costs. On the positive side, we give algorithms showing that fast, poly-logarithmic convergence time can be achieved using O(log² n) space per node, in the case of both tasks.
Overall, our results highlight a time complexity separation between O(log log n) and Θ(log² n) state space size for both majority and leader election in population protocols, and introduce new techniques, which should be applicable more broadly."}],"publist_id":"6869","type":"conference","date_updated":"2023-02-23T13:19:13Z","date_created":"2018-12-11T11:48:30Z","oa_version":"None","author":[{"full_name":"Alistarh, Dan-Adrian","last_name":"Alistarh","first_name":"Dan-Adrian","orcid":"0000-0003-3650-940X","id":"4A899BFC-F248-11E8-B48F-1D18A9856A87"},{"first_name":"James","last_name":"Aspnes","full_name":"Aspnes, James"},{"full_name":"Eisenstat, David","last_name":"Eisenstat","first_name":"David"},{"last_name":"Rivest","first_name":"Ronald","full_name":"Rivest, Ronald"},{"full_name":"Gelashvili, Rati","first_name":"Rati","last_name":"Gelashvili"}],"publication_status":"published","title":"Time-space trade-offs in population protocols","status":"public","publisher":"SIAM","acknowledgement":"Dan Alistarh was supported by a Swiss National Science\r\nFoundation Ambizione Fellowship. James Aspnes was supported by the National Science Foundation under grants\r\nCCF-1637385 and CCF-1650596.
Rati Gelashvili was supported by the National Science Foundation under grants\r\nCCF-1217921, CCF-1301926, and IIS-1447786, the Department of Energy under grant ER26116/DE-SC0008923, and\r\nOracle and Intel corporations.\r\nThe authors would like to thank David Doty, David\r\nSoloveichik, and Milan Vojnovic for insightful discussions\r\nand feedback during the development of this work.","_id":"787","user_id":"2DF688A6-F248-11E8-B48F-1D18A9856A87","year":"2017"},{"author":[{"full_name":"Alistarh, Dan-Adrian","first_name":"Dan-Adrian","last_name":"Alistarh","id":"4A899BFC-F248-11E8-B48F-1D18A9856A87","orcid":"0000-0003-3650-940X"},{"full_name":"Leiserson, William","last_name":"Leiserson","first_name":"William"},{"full_name":"Matveev, Alexander","first_name":"Alexander","last_name":"Matveev"},{"first_name":"Nir","last_name":"Shavit","full_name":"Shavit, Nir"}],"date_updated":"2023-02-23T13:19:44Z","date_created":"2018-12-11T11:48:30Z","oa_version":"None","user_id":"2DF688A6-F248-11E8-B48F-1D18A9856A87","_id":"789","year":"2017","acknowledgement":"William Leiserson, Alexander Matveev, and Nir Shavit were supported by the NSF under grants IIS-1447786 and CCF-1563880, and Dan Alistarh was supported by a Swiss National Fund Ambizione Fellowship.","title":"Forkscan: Conservative memory reclamation for modern operating systems","publication_status":"published","status":"public","publisher":"ACM","abstract":[{"text":"The problem of efficient concurrent memory reclamation in unmanaged languages such as C or C++ is one of the major challenges facing the parallelization of billions of lines of legacy code. Garbage collectors for C/C++ can be inefficient; thus, programmers are often forced to use finely-crafted concurrent memory reclamation techniques. These techniques can provide good performance, but require considerable programming effort to deploy, and have strict requirements, allowing the programmer very little room for error. 
In this work, we present Forkscan, a new conservative concurrent memory reclamation scheme which is fully automatic and surprisingly scalable. Forkscan's semantics place it between automatic garbage collectors (it requires the programmer to explicitly retire nodes before they can be reclaimed), and concurrent memory reclamation techniques (as it does not assume that nodes are completely unlinked from the data structure for correctness). Forkscan's implementation exploits these new semantics for efficiency: we leverage parallelism and optimized implementations of signaling and copy-on-write in modern operating systems to efficiently obtain and process consistent snapshots of memory that can be scanned concurrently with the normal program operation. Empirical evaluation on a range of classical concurrent data structure microbenchmarks shows that Forkscan can preserve the scalability of the original code, while maintaining an order of magnitude lower latency than automatic garbage collection, and demonstrating competitive performance with finely crafted memory reclamation techniques.","lang":"eng"}],"publist_id":"6867","extern":"1","type":"conference","conference":{"name":"EuroSys: European Conference on Computer Systems"},"date_published":"2017-01-01T00:00:00Z","doi":"10.1145/3064176.3064214","language":[{"iso":"eng"}],"citation":{"ieee":"D.-A. Alistarh, W. Leiserson, A. Matveev, and N. Shavit, “Forkscan: Conservative memory reclamation for modern operating systems,” presented at the EuroSys: European Conference on Computer Systems, 2017, pp. 483–498.","apa":"Alistarh, D.-A., Leiserson, W., Matveev, A., & Shavit, N. (2017). Forkscan: Conservative memory reclamation for modern operating systems (pp. 483–498). Presented at the EuroSys: European Conference on Computer Systems, ACM. https://doi.org/10.1145/3064176.3064214","ista":"Alistarh D-A, Leiserson W, Matveev A, Shavit N. 2017. Forkscan: Conservative memory reclamation for modern operating systems. 
EuroSys: European Conference on Computer Systems, 483–498.","ama":"Alistarh D-A, Leiserson W, Matveev A, Shavit N. Forkscan: Conservative memory reclamation for modern operating systems. In: ACM; 2017:483-498. doi:10.1145/3064176.3064214","chicago":"Alistarh, Dan-Adrian, William Leiserson, Alexander Matveev, and Nir Shavit. “Forkscan: Conservative Memory Reclamation for Modern Operating Systems,” 483–98. ACM, 2017. https://doi.org/10.1145/3064176.3064214.","short":"D.-A. Alistarh, W. Leiserson, A. Matveev, N. Shavit, in:, ACM, 2017, pp. 483–498.","mla":"Alistarh, Dan-Adrian, et al. Forkscan: Conservative Memory Reclamation for Modern Operating Systems. ACM, 2017, pp. 483–98, doi:10.1145/3064176.3064214."},"page":"483 - 498","month":"01","day":"01","article_processing_charge":"No"},{"publist_id":"6865","abstract":[{"lang":"eng","text":"Stochastic gradient descent (SGD) is a commonly used algorithm for training linear machine learning models. Based on vector algebra, it benefits from the inherent parallelism available in an FPGA. In this paper, we first present a single-precision floating-point SGD implementation on an FPGA that provides similar performance as a 10-core CPU. We then adapt the design to make it capable of processing low-precision data. The low-precision data is obtained from a novel compression scheme - called stochastic quantization, specifically designed for machine learning applications. We test both full-precision and low-precision designs on various regression and classification data sets. We achieve up to an order of magnitude training speedup when using low-precision data compared to a full-precision SGD on the same FPGA and a state-of-the-art multi-core solution, while maintaining the quality of training. 
We open source the designs presented in this paper."}],"extern":"1","type":"conference","author":[{"last_name":"Kara","first_name":"Kaan","full_name":"Kara, Kaan"},{"last_name":"Alistarh","first_name":"Dan-Adrian","orcid":"0000-0003-3650-940X","id":"4A899BFC-F248-11E8-B48F-1D18A9856A87","full_name":"Alistarh, Dan-Adrian"},{"first_name":"Gustavo","last_name":"Alonso","full_name":"Alonso, Gustavo"},{"full_name":"Mutlu, Onur","first_name":"Onur","last_name":"Mutlu"},{"last_name":"Zhang","first_name":"Ce","full_name":"Zhang, Ce"}],"oa_version":"None","date_updated":"2023-02-23T13:19:52Z","date_created":"2018-12-11T11:48:31Z","_id":"790","user_id":"2DF688A6-F248-11E8-B48F-1D18A9856A87","year":"2017","publisher":"IEEE","status":"public","title":"FPGA-accelerated dense linear machine learning: A precision-convergence trade-off","publication_status":"published","article_processing_charge":"No","month":"06","day":"30","date_published":"2017-06-30T00:00:00Z","doi":"10.1109/FCCM.2017.39","conference":{"name":"FCCM: Field-Programmable Custom Computing Machines"},"language":[{"iso":"eng"}],"citation":{"chicago":"Kara, Kaan, Dan-Adrian Alistarh, Gustavo Alonso, Onur Mutlu, and Ce Zhang. “FPGA-Accelerated Dense Linear Machine Learning: A Precision-Convergence Trade-Off,” 160–67. IEEE, 2017. https://doi.org/10.1109/FCCM.2017.39.","short":"K. Kara, D.-A. Alistarh, G. Alonso, O. Mutlu, C. Zhang, in:, IEEE, 2017, pp. 160–167.","mla":"Kara, Kaan, et al. FPGA-Accelerated Dense Linear Machine Learning: A Precision-Convergence Trade-Off. IEEE, 2017, pp. 160–67, doi:10.1109/FCCM.2017.39.","apa":"Kara, K., Alistarh, D.-A., Alonso, G., Mutlu, O., & Zhang, C. (2017). FPGA-accelerated dense linear machine learning: A precision-convergence trade-off (pp. 160–167). Presented at the FCCM: Field-Programmable Custom Computing Machines, IEEE. https://doi.org/10.1109/FCCM.2017.39","ieee":"K. Kara, D.-A. Alistarh, G. Alonso, O. Mutlu, and C. 
Zhang, “FPGA-accelerated dense linear machine learning: A precision-convergence trade-off,” presented at the FCCM: Field-Programmable Custom Computing Machines, 2017, pp. 160–167.","ista":"Kara K, Alistarh D-A, Alonso G, Mutlu O, Zhang C. 2017. FPGA-accelerated dense linear machine learning: A precision-convergence trade-off. FCCM: Field-Programmable Custom Computing Machines, 160–167.","ama":"Kara K, Alistarh D-A, Alonso G, Mutlu O, Zhang C. FPGA-accelerated dense linear machine learning: A precision-convergence trade-off. In: IEEE; 2017:160-167. doi:10.1109/FCCM.2017.39"},"page":"160 - 167"},{"type":"conference","abstract":[{"text":"Consider the following random process: we are given n queues, into which elements of increasing labels are inserted uniformly at random. To remove an element, we pick two queues at random, and remove the element of lower label (higher priority) among the two. The cost of a removal is the rank of the label removed, among labels still present in any of the queues, that is, the distance from the optimal choice at each step. Variants of this strategy are prevalent in state-of-the-art concurrent priority queue implementations. Nonetheless, it is not known whether such implementations provide any rank guarantees, even in a sequential model. We answer this question, showing that this strategy provides surprisingly strong guarantees: Although the single-choice process, where we always insert and remove from a single randomly chosen queue, has degrading cost, going to infinity as we increase the number of steps, in the two choice process, the expected rank of a removed element is O(n) while the expected worst-case cost is O(n log n). These bounds are tight, and hold irrespective of the number of steps for which we run the process. The argument is based on a new technical connection between \"heavily loaded\" balls-into-bins processes and priority scheduling.
Our analytic results inspire a new concurrent priority queue implementation, which improves upon the state of the art in terms of practical performance.","lang":"eng"}],"_id":"791","user_id":"c635000d-4b10-11ee-a964-aac5a93f6ac1","status":"public","title":"The power of choice in priority scheduling","oa_version":"Submitted Version","scopus_import":"1","day":"26","article_processing_charge":"No","publication":"Proceedings of the ACM Symposium on Principles of Distributed Computing","citation":{"chicago":"Alistarh, Dan-Adrian, Justin Kopinsky, Jerry Li, and Giorgi Nadiradze. “The Power of Choice in Priority Scheduling.” In Proceedings of the ACM Symposium on Principles of Distributed Computing, Part F129314:283–92. ACM, 2017. https://doi.org/10.1145/3087801.3087810.","mla":"Alistarh, Dan-Adrian, et al. “The Power of Choice in Priority Scheduling.” Proceedings of the ACM Symposium on Principles of Distributed Computing, vol. Part F129314, ACM, 2017, pp. 283–92, doi:10.1145/3087801.3087810.","short":"D.-A. Alistarh, J. Kopinsky, J. Li, G. Nadiradze, in:, Proceedings of the ACM Symposium on Principles of Distributed Computing, ACM, 2017, pp. 283–292.","ista":"Alistarh D-A, Kopinsky J, Li J, Nadiradze G. 2017. The power of choice in priority scheduling. Proceedings of the ACM Symposium on Principles of Distributed Computing. PODC: Principles of Distributed Computing vol. Part F129314, 283–292.","ieee":"D.-A. Alistarh, J. Kopinsky, J. Li, and G. Nadiradze, “The power of choice in priority scheduling,” in Proceedings of the ACM Symposium on Principles of Distributed Computing, Washington, WA, USA, 2017, vol. Part F129314, pp. 283–292.","apa":"Alistarh, D.-A., Kopinsky, J., Li, J., & Nadiradze, G. (2017). The power of choice in priority scheduling. In Proceedings of the ACM Symposium on Principles of Distributed Computing (Vol. Part F129314, pp. 283–292). Washington, WA, USA: ACM. https://doi.org/10.1145/3087801.3087810","ama":"Alistarh D-A, Kopinsky J, Li J, Nadiradze G. 
The power of choice in priority scheduling. In: Proceedings of the ACM Symposium on Principles of Distributed Computing. Vol Part F129314. ACM; 2017:283-292. doi:10.1145/3087801.3087810"},"page":"283 - 292","date_published":"2017-07-26T00:00:00Z","publist_id":"6864","year":"2017","publication_status":"published","department":[{"_id":"DaAl"}],"publisher":"ACM","author":[{"orcid":"0000-0003-3650-940X","id":"4A899BFC-F248-11E8-B48F-1D18A9856A87","last_name":"Alistarh","first_name":"Dan-Adrian","full_name":"Alistarh, Dan-Adrian"},{"first_name":"Justin","last_name":"Kopinsky","full_name":"Kopinsky, Justin"},{"first_name":"Jerry","last_name":"Li","full_name":"Li, Jerry"},{"full_name":"Nadiradze, Giorgi","id":"3279A00C-F248-11E8-B48F-1D18A9856A87","orcid":"0000-0001-5634-0731","first_name":"Giorgi","last_name":"Nadiradze"}],"date_updated":"2023-09-27T12:17:59Z","date_created":"2018-12-11T11:48:31Z","volume":"Part F129314","month":"07","publication_identifier":{"isbn":["978-145034992-5"]},"external_id":{"isi":["000462995000035"]},"main_file_link":[{"url":"https://arxiv.org/abs/1706.04178","open_access":"1"}],"oa":1,"quality_controlled":"1","isi":1,"conference":{"name":"PODC: Principles of Distributed Computing","start_date":"2017-07-25","location":"Washington, WA, USA","end_date":"2017-07-27"},"doi":"10.1145/3087801.3087810","language":[{"iso":"eng"}]},{"date_published":"2017-01-01T00:00:00Z","citation":{"ieee":"D.-A. Alistarh, D. Grubic, J. Li, R. Tomioka, and M. Vojnović, “QSGD: Communication-efficient SGD via gradient quantization and encoding,” presented at the NIPS: Neural Information Processing System, Long Beach, CA, United States, 2017, vol. 2017, pp. 1710–1721.","apa":"Alistarh, D.-A., Grubic, D., Li, J., Tomioka, R., & Vojnović, M. (2017). QSGD: Communication-efficient SGD via gradient quantization and encoding (Vol. 2017, pp. 1710–1721). 
Presented at the NIPS: Neural Information Processing System, Long Beach, CA, United States: Neural Information Processing Systems Foundation.","ista":"Alistarh D-A, Grubic D, Li J, Tomioka R, Vojnović M. 2017. QSGD: Communication-efficient SGD via gradient quantization and encoding. NIPS: Neural Information Processing System, Advances in Neural Information Processing Systems, vol. 2017, 1710–1721.","ama":"Alistarh D-A, Grubic D, Li J, Tomioka R, Vojnović M. QSGD: Communication-efficient SGD via gradient quantization and encoding. In: Vol 2017. Neural Information Processing Systems Foundation; 2017:1710-1721.","chicago":"Alistarh, Dan-Adrian, Demjan Grubic, Jerry Li, Ryota Tomioka, and Milan Vojnović. “QSGD: Communication-Efficient SGD via Gradient Quantization and Encoding,” 2017:1710–21. Neural Information Processing Systems Foundation, 2017.","short":"D.-A. Alistarh, D. Grubic, J. Li, R. Tomioka, M. Vojnović, in:, Neural Information Processing Systems Foundation, 2017, pp. 1710–1721.","mla":"Alistarh, Dan-Adrian, et al. QSGD: Communication-Efficient SGD via Gradient Quantization and Encoding. Vol. 2017, Neural Information Processing Systems Foundation, 2017, pp. 1710–21."},"page":"1710-1721","article_processing_charge":"No","day":"01","oa_version":"Submitted Version","user_id":"2DF688A6-F248-11E8-B48F-1D18A9856A87","_id":"431","intvolume":" 2017","title":"QSGD: Communication-efficient SGD via gradient quantization and encoding","status":"public","abstract":[{"text":"Parallel implementations of stochastic gradient descent (SGD) have received significant research attention, thanks to its excellent scalability properties. A fundamental barrier when parallelizing SGD is the high bandwidth cost of communicating gradient updates between nodes; consequently, several lossy compression heuristics have been proposed, by which nodes only communicate quantized gradients. Although effective in practice, these heuristics do not always converge.
In this paper, we propose Quantized SGD (QSGD), a family of compression schemes with convergence guarantees and good practical performance. QSGD allows the user to smoothly trade off communication bandwidth and convergence time: nodes can adjust the number of bits sent per iteration, at the cost of possibly higher variance. We show that this trade-off is inherent, in the sense that improving it past some threshold would violate information-theoretic lower bounds. QSGD guarantees convergence for convex and non-convex objectives, under asynchrony, and can be extended to stochastic variance-reduced techniques. When applied to training deep neural networks for image classification and automated speech recognition, QSGD leads to significant reductions in end-to-end training time. For instance, on 16 GPUs, we can train the ResNet-152 network to full accuracy on ImageNet 1.8 × faster than the full-precision variant. ","lang":"eng"}],"type":"conference","alternative_title":["Advances in Neural Information Processing Systems"],"conference":{"name":"NIPS: Neural Information Processing System","location":"Long Beach, CA, United States","start_date":"2017-12-04","end_date":"2017-12-09"},"language":[{"iso":"eng"}],"oa":1,"external_id":{"arxiv":["1610.02132"]},"main_file_link":[{"open_access":"1","url":"https://arxiv.org/abs/1610.02132"}],"quality_controlled":"1","publication_identifier":{"issn":["10495258"]},"month":"01","author":[{"full_name":"Alistarh, Dan-Adrian","first_name":"Dan-Adrian","last_name":"Alistarh","id":"4A899BFC-F248-11E8-B48F-1D18A9856A87","orcid":"0000-0003-3650-940X"},{"full_name":"Grubic, Demjan","last_name":"Grubic","first_name":"Demjan"},{"full_name":"Li, Jerry","first_name":"Jerry","last_name":"Li"},{"full_name":"Tomioka, Ryota","last_name":"Tomioka","first_name":"Ryota"},{"first_name":"Milan","last_name":"Vojnović","full_name":"Vojnović, 
Milan"}],"volume":2017,"date_updated":"2023-10-17T11:48:03Z","date_created":"2018-12-11T11:46:26Z","year":"2017","publisher":"Neural Information Processing Systems Foundation","department":[{"_id":"DaAl"}],"publication_status":"published","publist_id":"7392"},{"month":"01","publication_identifier":{"isbn":["978-151085514-4"]},"quality_controlled":"1","oa":1,"language":[{"iso":"eng"}],"conference":{"name":"ICML: International Conference on Machine Learning","location":"Sydney, Australia","start_date":"2017-08-06","end_date":"2017-08-11"},"file_date_updated":"2020-07-14T12:46:26Z","publist_id":"7391","publication_status":"published","department":[{"_id":"DaAl"}],"publisher":"ML Research Press","year":"2017","date_updated":"2023-10-17T12:31:15Z","date_created":"2018-12-11T11:46:26Z","volume":" 70","author":[{"first_name":"Hantian","last_name":"Zhang","full_name":"Zhang, Hantian"},{"last_name":"Li","first_name":"Jerry","full_name":"Li, Jerry"},{"full_name":"Kara, Kaan","first_name":"Kaan","last_name":"Kara"},{"full_name":"Alistarh, Dan-Adrian","last_name":"Alistarh","first_name":"Dan-Adrian","orcid":"0000-0003-3650-940X","id":"4A899BFC-F248-11E8-B48F-1D18A9856A87"},{"full_name":"Liu, Ji","last_name":"Liu","first_name":"Ji"},{"full_name":"Zhang, Ce","first_name":"Ce","last_name":"Zhang"}],"scopus_import":"1","day":"01","has_accepted_license":"1","article_processing_charge":"No","page":"4035 - 4043","publication":"Proceedings of Machine Learning Research","citation":{"short":"H. Zhang, J. Li, K. Kara, D.-A. Alistarh, J. Liu, C. Zhang, in:, Proceedings of Machine Learning Research, ML Research Press, 2017, pp. 4035–4043.","mla":"Zhang, Hantian, et al. “ZipML: Training Linear Models with End-to-End Low Precision, and a Little Bit of Deep Learning.” Proceedings of Machine Learning Research, vol. 70, ML Research Press, 2017, pp. 4035–43.","chicago":"Zhang, Hantian, Jerry Li, Kaan Kara, Dan-Adrian Alistarh, Ji Liu, and Ce Zhang. 
“ZipML: Training Linear Models with End-to-End Low Precision, and a Little Bit of Deep Learning.” In Proceedings of Machine Learning Research, 70:4035–43. ML Research Press, 2017.","ama":"Zhang H, Li J, Kara K, Alistarh D-A, Liu J, Zhang C. ZipML: Training linear models with end-to-end low precision, and a little bit of deep learning. In: Proceedings of Machine Learning Research. Vol 70. ML Research Press; 2017:4035-4043.","apa":"Zhang, H., Li, J., Kara, K., Alistarh, D.-A., Liu, J., & Zhang, C. (2017). ZipML: Training linear models with end-to-end low precision, and a little bit of deep learning. In Proceedings of Machine Learning Research (Vol. 70, pp. 4035–4043). Sydney, Australia: ML Research Press.","ieee":"H. Zhang, J. Li, K. Kara, D.-A. Alistarh, J. Liu, and C. Zhang, “ZipML: Training linear models with end-to-end low precision, and a little bit of deep learning,” in Proceedings of Machine Learning Research, Sydney, Australia, 2017, vol. 70, pp. 4035–4043.","ista":"Zhang H, Li J, Kara K, Alistarh D-A, Liu J, Zhang C. 2017. ZipML: Training linear models with end-to-end low precision, and a little bit of deep learning. Proceedings of Machine Learning Research. ICML: International Conference on Machine Learning, PMLR Press, vol. 70, 4035–4043."},"date_published":"2017-01-01T00:00:00Z","alternative_title":["PMLR Press"],"type":"conference","abstract":[{"text":"Recently there has been significant interest in training machine-learning models at low precision: by reducing precision, one can reduce computation and communication by one order of magnitude. We examine training at reduced precision, both from a theoretical and practical perspective, and ask: is it possible to train models at end-to-end low precision with provable guarantees? Can this lead to consistent order-of-magnitude speedups? We mainly focus on linear models, and the answer is yes for linear models. 
We develop a simple framework called ZipML based on one simple but novel strategy called double sampling. Our ZipML framework is able to execute training at low precision with no bias, guaranteeing convergence, whereas naive quantization would introduce significant bias. We validate our framework across a range of applications, and show that it enables an FPGA prototype that is up to 6.5 × faster than an implementation using full 32-bit precision. We further develop a variance-optimal stochastic quantization strategy and show that it can make a significant difference in a variety of settings. When applied to linear models together with double sampling, we save up to another 1.7 × in data movement compared with uniform quantization. When training deep networks with quantized models, we achieve higher accuracy than the state-of-the-art XNOR-Net. ","lang":"eng"}],"title":"ZipML: Training linear models with end-to-end low precision, and a little bit of deep learning","status":"public","ddc":["000"],"user_id":"2DF688A6-F248-11E8-B48F-1D18A9856A87","_id":"432","file":[{"access_level":"open_access","file_name":"2017_ICML_Zhang.pdf","creator":"dernst","content_type":"application/pdf","file_size":849345,"file_id":"5869","relation":"main_file","checksum":"86156ba7f4318e47cef3eb9092593c10","date_updated":"2020-07-14T12:46:26Z","date_created":"2019-01-22T08:23:58Z"}],"oa_version":"Submitted Version"},{"extern":"1","publist_id":"6870","volume":63,"date_created":"2018-12-11T11:48:29Z","date_updated":"2023-02-23T13:19:04Z","author":[{"full_name":"Alistarh, Dan-Adrian","orcid":"0000-0003-3650-940X","id":"4A899BFC-F248-11E8-B48F-1D18A9856A87","last_name":"Alistarh","first_name":"Dan-Adrian"},{"full_name":"Censor Hillel, Keren","first_name":"Keren","last_name":"Censor Hillel"},{"full_name":"Shavit, Nir","first_name":"Nir","last_name":"Shavit"}],"publisher":"ACM","publication_status":"published","year":"2016","acknowledgement":"Part of this work was performed while the 
first author was a postdoctoral associate at MIT CSAIL, where he was supported by the SNF Postdoctoral Fellows Program, NSF grant CCF-1217921, DoE ASCR grant ER26116/DE-SC0008923, and by grants from the Oracle and Intel corporations. The second author was supported in part by ISF grant 1696/14. The third author was supported in part by NSF grants CCF-1217921, CCF-1301926, IIS-1447786, and CCF-1561807, and the U.S. Department of Energy under grant DE-SC0008923, and by equipment grants from Intel Corporation.","month":"09","language":[{"iso":"eng"}],"doi":"10.1145/2903136","quality_controlled":"1","oa":1,"external_id":{"arxiv":["1311.3200"]},"main_file_link":[{"open_access":"1","url":"https://arxiv.org/abs/1311.3200"}],"issue":"4","abstract":[{"text":"Lock-free concurrent algorithms guarantee that some concurrent operation will always make progress in a finite number of steps. Yet programmers prefer to treat concurrent code as if it were wait-free, guaranteeing that all operations always make progress. Unfortunately, designing wait-free algorithms is generally a very complex task, and the resulting algorithms are not always efficient. Although obtaining efficient wait-free algorithms has been a long-time goal for the theory community, most nonblocking commercial code is only lock-free. This article suggests a simple solution to this problem. We show that for a large class of lock-free algorithms, under scheduling conditions that approximate those found in commercial hardware architectures, lock-free algorithms behave as if they are wait-free. In other words, programmers can continue to design simple lock-free algorithms instead of complex wait-free ones, and in practice, they will get wait-free progress. Our main contribution is a new way of analyzing a general class of lock-free algorithms under a stochastic scheduler. 
Our analysis relates the individual performance of processes to the global performance of the system using Markov chain lifting between a complex per-process chain and a simpler system progress chain. We show that lock-free algorithms are not only wait-free with probability 1 but that in fact a general subset of lock-free algorithms can be closely bounded in terms of the average number of steps required until an operation completes. To the best of our knowledge, this is the first attempt to analyze progress conditions, typically stated in relation to a worst-case adversary, in a stochastic model capturing their expected asymptotic behavior.","lang":"eng"}],"type":"journal_article","oa_version":"Preprint","intvolume":" 63","title":"Are lock free concurrent algorithms practically wait free ","status":"public","user_id":"2DF688A6-F248-11E8-B48F-1D18A9856A87","_id":"786","article_processing_charge":"No","day":"01","date_published":"2016-09-01T00:00:00Z","citation":{"short":"D.-A. Alistarh, K. Censor Hillel, N. Shavit, Journal of the ACM 63 (2016).","mla":"Alistarh, Dan-Adrian, et al. “Are Lock Free Concurrent Algorithms Practically Wait Free .” Journal of the ACM, vol. 63, no. 4, ACM, 2016, doi:10.1145/2903136.","chicago":"Alistarh, Dan-Adrian, Keren Censor Hillel, and Nir Shavit. “Are Lock Free Concurrent Algorithms Practically Wait Free .” Journal of the ACM. ACM, 2016. https://doi.org/10.1145/2903136.","ama":"Alistarh D-A, Censor Hillel K, Shavit N. Are lock free concurrent algorithms practically wait free . Journal of the ACM. 2016;63(4). doi:10.1145/2903136","ieee":"D.-A. Alistarh, K. Censor Hillel, and N. Shavit, “Are lock free concurrent algorithms practically wait free ,” Journal of the ACM, vol. 63, no. 4. ACM, 2016.","apa":"Alistarh, D.-A., Censor Hillel, K., & Shavit, N. (2016). Are lock free concurrent algorithms practically wait free . Journal of the ACM. ACM. https://doi.org/10.1145/2903136","ista":"Alistarh D-A, Censor Hillel K, Shavit N. 2016. 
Are lock free concurrent algorithms practically wait free . Journal of the ACM. 63(4)."},"publication":"Journal of the ACM"},{"day":"27","month":"02","article_processing_charge":"No","scopus_import":"1","language":[{"iso":"eng"}],"conference":{"name":"PPoPP: Principles and Practice of Parallel Programming"},"doi":"10.1145/2851141.2851155","date_published":"2016-02-27T00:00:00Z","quality_controlled":"1","citation":{"short":"S. Haider, W. Hasenplaugh, D.-A. Alistarh, in:, ACM, 2016.","mla":"Haider, Syed, et al. Lease/Release: Architectural Support for Scaling Contended Data Structures. Vol. 12-16-March-2016, ACM, 2016, doi:10.1145/2851141.2851155.","chicago":"Haider, Syed, William Hasenplaugh, and Dan-Adrian Alistarh. “Lease/Release: Architectural Support for Scaling Contended Data Structures,” Vol. 12-16-March-2016. ACM, 2016. https://doi.org/10.1145/2851141.2851155.","ama":"Haider S, Hasenplaugh W, Alistarh D-A. Lease/Release: Architectural support for scaling contended data structures. In: Vol 12-16-March-2016. ACM; 2016. doi:10.1145/2851141.2851155","apa":"Haider, S., Hasenplaugh, W., & Alistarh, D.-A. (2016). Lease/Release: Architectural support for scaling contended data structures (Vol. 12-16-March-2016). Presented at the PPoPP: Principles and Practice of Parallel Programming, ACM. https://doi.org/10.1145/2851141.2851155","ieee":"S. Haider, W. Hasenplaugh, and D.-A. Alistarh, “Lease/Release: Architectural support for scaling contended data structures,” presented at the PPoPP: Principles and Practice of Parallel Programming, 2016, vol. 12-16-March-2016.","ista":"Haider S, Hasenplaugh W, Alistarh D-A. 2016. Lease/Release: Architectural support for scaling contended data structures. PPoPP: Principles and Practice of Parallel Programming vol. 12-16-March-2016."},"extern":"1","abstract":[{"lang":"eng","text":"High memory contention is generally agreed to be a worst-case scenario for concurrent data structures. 
There has been a significant amount of research effort spent investigating designs which minimize contention, and several programming techniques have been proposed to mitigate its effects. However, there are currently few architectural mechanisms to allow scaling contended data structures at high thread counts. In this paper, we investigate hardware support for scalable contended data structures. We propose Lease/Release, a simple addition to standard directory-based MSI cache coherence protocols, allowing participants to lease memory, at the granularity of cache lines, by delaying coherence messages for a short, bounded period of time. Our analysis shows that Lease/Release can significantly reduce the overheads of contention for both non-blocking (lock-free) and lock-based data structure implementations, while ensuring that no deadlocks are introduced. We validate Lease/Release empirically on the Graphite multiprocessor simulator, on a range of data structures, including queue, stack, and priority queue implementations, as well as on transactional applications. 
Results show that Lease/Release consistently improves both throughput and energy usage, by up to 5x, both for lock-free and lock-based data structure designs."}],"publist_id":"6871","type":"conference","date_updated":"2022-03-18T12:56:29Z","date_created":"2018-12-11T11:48:29Z","oa_version":"None","volume":"12-16-March-2016","author":[{"full_name":"Haider, Syed","first_name":"Syed","last_name":"Haider"},{"full_name":"Hasenplaugh, William","first_name":"William","last_name":"Hasenplaugh"},{"full_name":"Alistarh, Dan-Adrian","id":"4A899BFC-F248-11E8-B48F-1D18A9856A87","orcid":"0000-0003-3650-940X","first_name":"Dan-Adrian","last_name":"Alistarh"}],"status":"public","publication_status":"published","title":"Lease/Release: Architectural support for scaling contended data structures","publisher":"ACM","acknowledgement":"We would like to thank Richard Black, Miguel Castro, Dave Dice, Aleksandar Dragojevic, Maurice Herlihy, Ant Rowstron, Nir Shavit, and Vasileios Trigonakis, as well as the anonymous reviewers, for helpful suggestions during the development of this paper.","_id":"785","year":"2016","user_id":"2DF688A6-F248-11E8-B48F-1D18A9856A87"},{"extern":"1","publist_id":"6878","abstract":[{"text":"High-performance concurrent priority queues are essential for applications such as task scheduling and discrete event simulation. Unfortunately, even the best performing implementations do not scale past a number of threads in the single digits. This is because of the sequential bottleneck in accessing the elements at the head of the queue in order to perform a DeleteMin operation. In this paper, we present the SprayList, a scalable priority queue with relaxed ordering semantics. Starting from a non-blocking SkipList, the main innovation behind our design is that the DeleteMin operations avoid a sequential bottleneck by \"spraying\" themselves onto the head of the SkipList in a coordinated fashion. 
The spraying is implemented using a carefully designed random walk, so that DeleteMin returns an element among the first O(p log^3 p) in the list, with high probability, where p is the number of threads. We prove that the running time of a DeleteMin operation is O(log^3 p), with high probability, independent of the size of the list. Our experiments show that the relaxed semantics allow the data structure to scale for high thread counts, comparable to a classic unordered SkipList. Furthermore, we observe that, for reasonably parallel workloads, the scalability benefits of relaxation considerably outweigh the additional work due to out-of-order execution.","lang":"eng"}],"type":"conference","volume":"2015-January","oa_version":"None","date_updated":"2023-02-23T13:16:43Z","date_created":"2018-12-11T11:48:26Z","author":[{"full_name":"Alistarh, Dan-Adrian","last_name":"Alistarh","first_name":"Dan-Adrian","orcid":"0000-0003-3650-940X","id":"4A899BFC-F248-11E8-B48F-1D18A9856A87"},{"full_name":"Kopinsky, Justin","first_name":"Justin","last_name":"Kopinsky"},{"first_name":"Jerry","last_name":"Li","full_name":"Li, Jerry"},{"full_name":"Shavit, Nir","first_name":"Nir","last_name":"Shavit"}],"publisher":"ACM","status":"public","title":"The SprayList: A scalable relaxed priority queue","publication_status":"published","acknowledgement":"Support is gratefully acknowledged from the National Science Foundation under grants CCF-1217921, CCF-1301926, and IIS-1447786, the Department of Energy under grant ER26116/DE-SC0008923, and the Oracle\r\nand Intel corporations.","_id":"776","user_id":"2DF688A6-F248-11E8-B48F-1D18A9856A87","year":"2015","article_processing_charge":"No","day":"24","month":"01","language":[{"iso":"eng"}],"date_published":"2015-01-24T00:00:00Z","doi":"10.1145/2688500.2688523","conference":{"name":"PPoPP: Principles and Practice of Parallel Programming"},"page":"11 - 20","citation":{"chicago":"Alistarh, Dan-Adrian, Justin Kopinsky, Jerry Li, and Nir Shavit. 
“The SprayList: A Scalable Relaxed Priority Queue,” 2015–January:11–20. ACM, 2015. https://doi.org/10.1145/2688500.2688523.","mla":"Alistarh, Dan-Adrian, et al. The SprayList: A Scalable Relaxed Priority Queue. Vol. 2015–January, ACM, 2015, pp. 11–20, doi:10.1145/2688500.2688523.","short":"D.-A. Alistarh, J. Kopinsky, J. Li, N. Shavit, in:, ACM, 2015, pp. 11–20.","ista":"Alistarh D-A, Kopinsky J, Li J, Shavit N. 2015. The SprayList: A scalable relaxed priority queue. PPoPP: Principles and Practice of Parallel Programming vol. 2015–January, 11–20.","ieee":"D.-A. Alistarh, J. Kopinsky, J. Li, and N. Shavit, “The SprayList: A scalable relaxed priority queue,” presented at the PPoPP: Principles and Practice of Parallel Programming, 2015, vol. 2015–January, pp. 11–20.","apa":"Alistarh, D.-A., Kopinsky, J., Li, J., & Shavit, N. (2015). The SprayList: A scalable relaxed priority queue (Vol. 2015–January, pp. 11–20). Presented at the PPoPP: Principles and Practice of Parallel Programming, ACM. https://doi.org/10.1145/2688500.2688523","ama":"Alistarh D-A, Kopinsky J, Li J, Shavit N. The SprayList: A scalable relaxed priority queue. In: Vol 2015-January. ACM; 2015:11-20. 
doi:10.1145/2688500.2688523"}},{"author":[{"full_name":"Alistarh, Dan-Adrian","last_name":"Alistarh","first_name":"Dan-Adrian","orcid":"0000-0003-3650-940X","id":"4A899BFC-F248-11E8-B48F-1D18A9856A87"},{"full_name":"Iglesias, Jennifer","last_name":"Iglesias","first_name":"Jennifer"},{"full_name":"Vojnović, Milan","last_name":"Vojnović","first_name":"Milan"}],"oa_version":"None","volume":"2015-January","date_updated":"2023-02-23T13:17:09Z","date_created":"2018-12-11T11:48:27Z","user_id":"2DF688A6-F248-11E8-B48F-1D18A9856A87","_id":"777","year":"2015","publisher":"Neural Information Processing Systems","title":"Streaming min-max hypergraph partitioning","status":"public","publication_status":"published","publist_id":"6879","abstract":[{"lang":"eng","text":"In many applications, the data is of rich structure that can be represented by a hypergraph, where the data items are represented by vertices and the associations among items are represented by hyperedges. Equivalently, we are given an input bipartite graph with two types of vertices: items, and associations (which we refer to as topics). We consider the problem of partitioning the set of items into a given number of components such that the maximum number of topics covered by a component is minimized. This is a clustering problem with various applications, e.g. partitioning of a set of information objects such as documents, images, and videos, and load balancing in the context of modern computation platforms. In this paper, we focus on the streaming computation model for this problem, in which items arrive online one at a time and each item must be assigned irrevocably to a component at its arrival time. Motivated by scalability requirements, we focus on the class of streaming computation algorithms with memory limited to be at most linear in the number of components. We show that a greedy assignment strategy is able to recover a hidden co-clustering of items under a natural set of recovery conditions. 
We also report results of an extensive empirical evaluation, which demonstrate that this greedy strategy yields superior performance when compared with alternative approaches."}],"extern":"1","type":"conference","date_published":"2015-01-01T00:00:00Z","conference":{"name":"NIPS: Neural Information Processing Systems"},"language":[{"iso":"eng"}],"main_file_link":[{"url":"http://papers.nips.cc/paper/5897-streaming-min-max-hypergraph-partitioning"}],"citation":{"chicago":"Alistarh, Dan-Adrian, Jennifer Iglesias, and Milan Vojnović. “Streaming Min-Max Hypergraph Partitioning,” 2015–January:1900–1908. Neural Information Processing Systems, 2015.","mla":"Alistarh, Dan-Adrian, et al. Streaming Min-Max Hypergraph Partitioning. Vol. 2015–January, Neural Information Processing Systems, 2015, pp. 1900–08.","short":"D.-A. Alistarh, J. Iglesias, M. Vojnović, in:, Neural Information Processing Systems, 2015, pp. 1900–1908.","ista":"Alistarh D-A, Iglesias J, Vojnović M. 2015. Streaming min-max hypergraph partitioning. NIPS: Neural Information Processing Systems vol. 2015–January, 1900–1908.","apa":"Alistarh, D.-A., Iglesias, J., & Vojnović, M. (2015). Streaming min-max hypergraph partitioning (Vol. 2015–January, pp. 1900–1908). Presented at the NIPS: Neural Information Processing Systems, Neural Information Processing Systems.","ieee":"D.-A. Alistarh, J. Iglesias, and M. Vojnović, “Streaming min-max hypergraph partitioning,” presented at the NIPS: Neural Information Processing Systems, 2015, vol. 2015–January, pp. 1900–1908.","ama":"Alistarh D-A, Iglesias J, Vojnović M. Streaming min-max hypergraph partitioning. In: Vol 2015-January. Neural Information Processing Systems; 2015:1900-1908."},"page":"1900 - 1908","article_processing_charge":"No","month":"01","day":"01"},{"date_published":"2015-01-01T00:00:00Z","citation":{"short":"D.-A. Alistarh, J. Kopinsky, P. Kuznetsov, S. Ravi, N. Shavit, in:, Springer, 2015, pp. 185–199.","mla":"Alistarh, Dan-Adrian, et al. 
Inherent Limitations of Hybrid Transactional Memory. Vol. 9363, Springer, 2015, pp. 185–99, doi:10.1007/978-3-662-48653-5_13.","chicago":"Alistarh, Dan-Adrian, Justin Kopinsky, Petr Kuznetsov, Srivatsan Ravi, and Nir Shavit. “Inherent Limitations of Hybrid Transactional Memory,” 9363:185–99. Springer, 2015. https://doi.org/10.1007/978-3-662-48653-5_13.","ama":"Alistarh D-A, Kopinsky J, Kuznetsov P, Ravi S, Shavit N. Inherent limitations of hybrid transactional memory. In: Vol 9363. Springer; 2015:185-199. doi:10.1007/978-3-662-48653-5_13","ieee":"D.-A. Alistarh, J. Kopinsky, P. Kuznetsov, S. Ravi, and N. Shavit, “Inherent limitations of hybrid transactional memory,” presented at the DISC: Distributed Computing, 2015, vol. 9363, pp. 185–199.","apa":"Alistarh, D.-A., Kopinsky, J., Kuznetsov, P., Ravi, S., & Shavit, N. (2015). Inherent limitations of hybrid transactional memory (Vol. 9363, pp. 185–199). Presented at the DISC: Distributed Computing, Springer. https://doi.org/10.1007/978-3-662-48653-5_13","ista":"Alistarh D-A, Kopinsky J, Kuznetsov P, Ravi S, Shavit N. 2015. Inherent limitations of hybrid transactional memory. DISC: Distributed Computing, LNCS, vol. 9363, 185–199."},"page":"185 - 199","article_processing_charge":"No","day":"01","oa_version":"None","user_id":"2DF688A6-F248-11E8-B48F-1D18A9856A87","_id":"778","intvolume":" 9363","status":"public","title":"Inherent limitations of hybrid transactional memory","abstract":[{"lang":"eng","text":"Several Hybrid Transactional Memory (HyTM) schemes have recently been proposed to complement the fast, but best-effort nature of Hardware Transactional Memory (HTM) with a slow, reliable software backup. However, the costs of providing concurrency between hardware and software transactions in HyTM are still not well understood. In this paper, we propose a general model for HyTM implementations, which captures the ability of hardware transactions to buffer memory accesses. 
The model allows us to formally quantify and analyze the amount of overhead (instrumentation) caused by the potential presence of software transactions. We prove that (1) it is impossible to build a strictly serializable HyTM implementation that has both uninstrumented reads and writes, even for very weak progress guarantees, and (2) the instrumentation cost incurred by a hardware transaction in any progressive opaque HyTM is linear in the size of the transaction’s data set. We further describe two implementations which exhibit optimal instrumentation costs for two different progress conditions. In sum, this paper proposes the first formal HyTM model and captures for the first time the trade-off between the degree of hardware-software TM concurrency and the amount of instrumentation overhead."}],"type":"conference","alternative_title":["LNCS"],"doi":"10.1007/978-3-662-48653-5_13","conference":{"name":"DISC: Distributed Computing"},"language":[{"iso":"eng"}],"main_file_link":[{"open_access":"1","url":"https://arxiv.org/abs/1405.5689"}],"oa":1,"external_id":{"arxiv":["1405.5689"]},"quality_controlled":"1","month":"01","author":[{"full_name":"Alistarh, Dan-Adrian","id":"4A899BFC-F248-11E8-B48F-1D18A9856A87","orcid":"0000-0003-3650-940X","first_name":"Dan-Adrian","last_name":"Alistarh"},{"full_name":"Kopinsky, Justin","first_name":"Justin","last_name":"Kopinsky"},{"last_name":"Kuznetsov","first_name":"Petr","full_name":"Kuznetsov, Petr"},{"full_name":"Ravi, Srivatsan","first_name":"Srivatsan","last_name":"Ravi"},{"last_name":"Shavit","first_name":"Nir","full_name":"Shavit, Nir"}],"volume":9363,"date_created":"2018-12-11T11:48:27Z","date_updated":"2023-02-23T13:17:35Z","year":"2015","acknowledgement":"P. Kuznetsov-The author is supported by the Agence Nationale de la Recherche, ANR-14-CE35-0010-01, project DISCMAT. N.
Shavit-Support is gratefully acknowledged from the National Science Foundation under grants CCF-1217921, CCF-1201926, and IIS-1447786, the Department of Energy under grant ER26116/DE-SC0008923, and the Oracle and Intel corporations.","publisher":"Springer","publication_status":"published","publist_id":"6880","extern":"1"},{"article_processing_charge":"No","day":"13","month":"06","language":[{"iso":"eng"}],"date_published":"2015-06-13T00:00:00Z","doi":"10.1145/2755573.2755600","conference":{"name":"SPAA: Symposium on Parallelism in Algorithms and Architectures"},"page":"123 - 132","citation":{"chicago":"Alistarh, Dan-Adrian, Alexander Matveev, William Leiserson, and Nir Shavit. “ThreadScan: Automatic and Scalable Memory Reclamation,” 2015–June:123–32. ACM, 2015. https://doi.org/10.1145/2755573.2755600.","short":"D.-A. Alistarh, A. Matveev, W. Leiserson, N. Shavit, in:, ACM, 2015, pp. 123–132.","mla":"Alistarh, Dan-Adrian, et al. ThreadScan: Automatic and Scalable Memory Reclamation. Vol. 2015–June, ACM, 2015, pp. 123–32, doi:10.1145/2755573.2755600.","ieee":"D.-A. Alistarh, A. Matveev, W. Leiserson, and N. Shavit, “ThreadScan: Automatic and scalable memory reclamation,” presented at the SPAA: Symposium on Parallelism in Algorithms and Architectures, 2015, vol. 2015–June, pp. 123–132.","apa":"Alistarh, D.-A., Matveev, A., Leiserson, W., & Shavit, N. (2015). ThreadScan: Automatic and scalable memory reclamation (Vol. 2015–June, pp. 123–132). Presented at the SPAA: Symposium on Parallelism in Algorithms and Architectures, ACM. https://doi.org/10.1145/2755573.2755600","ista":"Alistarh D-A, Matveev A, Leiserson W, Shavit N. 2015. ThreadScan: Automatic and scalable memory reclamation. SPAA: Symposium on Parallelism in Algorithms and Architectures vol. 2015–June, 123–132.","ama":"Alistarh D-A, Matveev A, Leiserson W, Shavit N. ThreadScan: Automatic and scalable memory reclamation. In: Vol 2015-June. ACM; 2015:123-132.
doi:10.1145/2755573.2755600"},"extern":"1","publist_id":"6876","abstract":[{"text":"The concurrent memory reclamation problem is that of devising a way for a deallocating thread to verify that no other concurrent threads hold references to a memory block being deallocated. To date, in the absence of automatic garbage collection, there is no satisfactory solution to this problem; existing tracking methods like hazard pointers, reference counters, or epoch-based techniques like RCU, are either prohibitively expensive or require significant programming expertise, to the extent that implementing them efficiently can be worthy of a publication. None of the existing techniques are automatic or even semi-automated. In this paper, we take a new approach to concurrent memory reclamation: instead of manually tracking access to memory locations as done in techniques like hazard pointers, or restricting shared accesses to specific epoch boundaries as in RCU, our algorithm, called ThreadScan, leverages operating system signaling to automatically detect which memory locations are being accessed by concurrent threads. 
Initial empirical evidence shows that ThreadScan scales surprisingly well and requires negligible programming effort beyond the standard use of Malloc and Free.","lang":"eng"}],"type":"conference","oa_version":"None","volume":"2015-June","date_updated":"2023-02-23T12:35:42Z","date_created":"2018-12-11T11:48:27Z","related_material":{"record":[{"relation":"later_version","status":"public","id":"6001"}]},"author":[{"last_name":"Alistarh","first_name":"Dan-Adrian","orcid":"0000-0003-3650-940X","id":"4A899BFC-F248-11E8-B48F-1D18A9856A87","full_name":"Alistarh, Dan-Adrian"},{"first_name":"Alexander","last_name":"Matveev","full_name":"Matveev, Alexander"},{"first_name":"William","last_name":"Leiserson","full_name":"Leiserson, William"},{"full_name":"Shavit, Nir","first_name":"Nir","last_name":"Shavit"}],"publisher":"ACM","publication_status":"published","title":"ThreadScan: Automatic and scalable memory reclamation","status":"public","_id":"779","user_id":"2DF688A6-F248-11E8-B48F-1D18A9856A87","acknowledgement":"Support is gratefully acknowledged from the National Science Foundation under grants CCF-1217921, CCF-1301926, and IIS-1447786, the Department of Energy under grant ER26116/DE-SC0008923, and the Oracle corporation. In particular, we would like to thank Dave Dice, Alex Kogan, and Mark Moir from the Oracle Scalable Synchronization Research Group for very useful feedback on earlier drafts of this paper.","year":"2015"},{"day":"01","month":"01","language":[{"iso":"eng"}],"date_published":"2015-01-01T00:00:00Z","doi":"10.1007/978-3-662-47666-6_38","conference":{"name":"ICALP: International Colloquium on Automata, Languages and Programming"},"page":"479 - 491","oa":1,"citation":{"ista":"Alistarh D-A, Gelashvili R. 2015. Polylogarithmic-time leader election in population protocols. ICALP: International Colloquium on Automata, Languages and Programming vol. 9135, 479–491.","apa":"Alistarh, D.-A., & Gelashvili, R. (2015).
Polylogarithmic-time leader election in population protocols (Vol. 9135, pp. 479–491). Presented at the ICALP: International Colloquium on Automata, Languages and Programming, Springer. https://doi.org/10.1007/978-3-662-47666-6_38","ieee":"D.-A. Alistarh and R. Gelashvili, “Polylogarithmic-time leader election in population protocols,” presented at the ICALP: International Colloquium on Automata, Languages and Programming, 2015, vol. 9135, pp. 479–491.","ama":"Alistarh D-A, Gelashvili R. Polylogarithmic-time leader election in population protocols. In: Vol 9135. Springer; 2015:479-491. doi:10.1007/978-3-662-47666-6_38","chicago":"Alistarh, Dan-Adrian, and Rati Gelashvili. “Polylogarithmic-Time Leader Election in Population Protocols,” 9135:479–91. Springer, 2015. https://doi.org/10.1007/978-3-662-47666-6_38.","mla":"Alistarh, Dan-Adrian, and Rati Gelashvili. Polylogarithmic-Time Leader Election in Population Protocols. Vol. 9135, Springer, 2015, pp. 479–91, doi:10.1007/978-3-662-47666-6_38.","short":"D.-A. Alistarh, R. Gelashvili, in:, Springer, 2015, pp. 479–491."},"external_id":{"arxiv":["1502.05745"]},"main_file_link":[{"url":"https://arxiv.org/abs/1502.05745","open_access":"1"}],"extern":"1","publist_id":"6877","abstract":[{"text":"Population protocols are networks of finite-state agents, interacting randomly, and updating their states using simple rules. Despite their extreme simplicity, these systems have been shown to cooperatively perform complex computational tasks, such as simulating register machines to compute standard arithmetic functions. The election of a unique leader agent is a key requirement in such computational constructions. Yet, the fastest currently known population protocol for electing a leader only has linear convergence time, and it has recently been shown that no population protocol using a constant number of states per node may overcome this linear bound.
In this paper, we give the first population protocol for leader election with polylogarithmic convergence time, using polylogarithmic memory states per node. The protocol structure is quite simple: each node has an associated value, and is either a leader (still in contention) or a minion (following some leader). A leader keeps incrementing its value and “defeats” other leaders in one-to-one interactions, and will drop from contention and become a minion if it meets a leader with higher value. Importantly, a leader also drops out if it meets a minion with higher absolute value. While these rules are quite simple, the proof that this algorithm achieves polylogarithmic convergence time is non-trivial. In particular, the argument combines careful use of concentration inequalities with anti-concentration bounds, showing that the leaders’ values become spread apart as the execution progresses, which in turn implies that straggling leaders get quickly eliminated. We complement our analysis with empirical results, showing that our protocol converges extremely fast, even for large network sizes.","lang":"eng"}],"type":"conference","oa_version":"Preprint","volume":9135,"date_created":"2018-12-11T11:48:28Z","date_updated":"2023-02-23T13:18:11Z","author":[{"last_name":"Alistarh","first_name":"Dan-Adrian","orcid":"0000-0003-3650-940X","id":"4A899BFC-F248-11E8-B48F-1D18A9856A87","full_name":"Alistarh, Dan-Adrian"},{"last_name":"Gelashvili","first_name":"Rati","full_name":"Gelashvili, Rati"}],"publisher":"Springer","intvolume":" 9135","status":"public","publication_status":"published","title":"Polylogarithmic-time leader election in population protocols","_id":"780","acknowledgement":"Support is gratefully acknowledged from the National Science Foundation under grants CCF-1217921, CCF-1301926, and IIS-1447786, the Department of Energy under grant ER26116/DE-SC0008923, and the Oracle and Intel 
corporations.","year":"2015","user_id":"2DF688A6-F248-11E8-B48F-1D18A9856A87"},{"month":"07","day":"21","article_processing_charge":"No","conference":{"name":"PODC: Principles of Distributed Computing"},"doi":"10.1145/2767386.2767429","date_published":"2015-07-21T00:00:00Z","language":[{"iso":"eng"}],"citation":{"ama":"Alistarh D-A, Gelashvili R, Vojnović M. Fast and exact majority in population protocols. In: Vol 2015-July. ACM; 2015:47-56. doi:10.1145/2767386.2767429","ista":"Alistarh D-A, Gelashvili R, Vojnović M. 2015. Fast and exact majority in population protocols. PODC: Principles of Distributed Computing vol. 2015–July, 47–56.","ieee":"D.-A. Alistarh, R. Gelashvili, and M. Vojnović, “Fast and exact majority in population protocols,” presented at the PODC: Principles of Distributed Computing, 2015, vol. 2015–July, pp. 47–56.","apa":"Alistarh, D.-A., Gelashvili, R., & Vojnović, M. (2015). Fast and exact majority in population protocols (Vol. 2015–July, pp. 47–56). Presented at the PODC: Principles of Distributed Computing, ACM. https://doi.org/10.1145/2767386.2767429","mla":"Alistarh, Dan-Adrian, et al. Fast and Exact Majority in Population Protocols. Vol. 2015–July, ACM, 2015, pp. 47–56, doi:10.1145/2767386.2767429.","short":"D.-A. Alistarh, R. Gelashvili, M. Vojnović, in:, ACM, 2015, pp. 47–56.","chicago":"Alistarh, Dan-Adrian, Rati Gelashvili, and Milan Vojnović. “Fast and Exact Majority in Population Protocols,” 2015–July:47–56. ACM, 2015. https://doi.org/10.1145/2767386.2767429."},"page":"47 - 56","abstract":[{"lang":"eng","text":"Population protocols, roughly defined as systems consisting of large numbers of simple identical agents, interacting at random and updating their state following simple rules, are an important research topic at the intersection of distributed computing and biology.
One of the fundamental tasks that a population protocol may solve is majority: each node starts in one of two states; the goal is for all nodes to reach a correct consensus on which of the two states was initially the majority. Despite considerable research effort, known protocols for this problem are either exact but slow (taking linear parallel time to converge), or fast but approximate (with non-zero probability of error). In this paper, we show that this trade-off between precision and speed is not inherent. We present a new protocol called Average and Conquer (AVC) that solves majority exactly in expected parallel convergence time O(log n/(sε) + log n log s), where n is the number of nodes, εn is the initial node advantage of the majority state, and s = Ω(log n log log n) is the number of states the protocol employs. This shows that the majority problem can be solved exactly in time poly-logarithmic in n, provided that the memory per node is s = Ω(1/ε + log n log 1/ε). On the negative side, we establish a lower bound of Ω(1/ε) on the expected parallel convergence time for the case of four memory states per node, and a lower bound of Ω(log n) parallel time for protocols using any number of memory states per node."}],"publist_id":"6873","extern":"1","type":"conference","author":[{"id":"4A899BFC-F248-11E8-B48F-1D18A9856A87","orcid":"0000-0003-3650-940X","first_name":"Dan-Adrian","last_name":"Alistarh","full_name":"Alistarh, Dan-Adrian"},{"last_name":"Gelashvili","first_name":"Rati","full_name":"Gelashvili, Rati"},{"last_name":"Vojnović","first_name":"Milan","full_name":"Vojnović, Milan"}],"date_created":"2018-12-11T11:48:28Z","date_updated":"2023-02-23T13:18:35Z","oa_version":"None","volume":"2015-July","_id":"781","year":"2015","user_id":"2DF688A6-F248-11E8-B48F-1D18A9856A87","publication_status":"published","status":"public","title":"Fast and exact
majority in population protocols","publisher":"ACM"},{"page":"251 - 260","citation":{"mla":"Alistarh, Dan-Adrian, et al. Lock-Free Algorithms under Stochastic Schedulers. Vol. 2015–July, ACM, 2015, pp. 251–60, doi:10.1145/2767386.2767430.","short":"D.-A. Alistarh, T. Sauerwald, M. Vojnović, in:, ACM, 2015, pp. 251–260.","chicago":"Alistarh, Dan-Adrian, Thomas Sauerwald, and Milan Vojnović. “Lock-Free Algorithms under Stochastic Schedulers,” 2015–July:251–60. ACM, 2015. https://doi.org/10.1145/2767386.2767430.","ama":"Alistarh D-A, Sauerwald T, Vojnović M. Lock-Free algorithms under stochastic schedulers. In: Vol 2015-July. ACM; 2015:251-260. doi:10.1145/2767386.2767430","ista":"Alistarh D-A, Sauerwald T, Vojnović M. 2015. Lock-Free algorithms under stochastic schedulers. PODC: Principles of Distributed Computing vol. 2015–July, 251–260.","apa":"Alistarh, D.-A., Sauerwald, T., & Vojnović, M. (2015). Lock-Free algorithms under stochastic schedulers (Vol. 2015–July, pp. 251–260). Presented at the PODC: Principles of Distributed Computing, ACM. https://doi.org/10.1145/2767386.2767430","ieee":"D.-A. Alistarh, T. Sauerwald, and M. Vojnović, “Lock-Free algorithms under stochastic schedulers,” presented at the PODC: Principles of Distributed Computing, 2015, vol. 2015–July, pp. 
251–260."},"language":[{"iso":"eng"}],"conference":{"name":"PODC: Principles of Distributed Computing"},"date_published":"2015-07-21T00:00:00Z","doi":"10.1145/2767386.2767430","day":"21","month":"07","article_processing_charge":"No","publication_status":"published","status":"public","title":"Lock-Free algorithms under stochastic schedulers","publisher":"ACM","_id":"782","user_id":"2DF688A6-F248-11E8-B48F-1D18A9856A87","year":"2015","date_updated":"2023-02-23T13:18:50Z","date_created":"2018-12-11T11:48:28Z","volume":"2015-July","oa_version":"None","author":[{"id":"4A899BFC-F248-11E8-B48F-1D18A9856A87","orcid":"0000-0003-3650-940X","first_name":"Dan-Adrian","last_name":"Alistarh","full_name":"Alistarh, Dan-Adrian"},{"full_name":"Sauerwald, Thomas","last_name":"Sauerwald","first_name":"Thomas"},{"full_name":"Vojnović, Milan","first_name":"Milan","last_name":"Vojnović"}],"type":"conference","extern":"1","abstract":[{"text":"In this work, we consider the following random process, motivated by the analysis of lock-free concurrent algorithms under high memory contention. In each round, a new scheduling step is allocated to one of n threads, according to a distribution p = (p1, p2, ..., pn), where thread i is scheduled with probability pi. When some thread first reaches a set threshold of executed steps, it registers a win, completing its current operation, and resets its step count to 1. At the same time, threads whose step count was close to the threshold also get reset because of the win, but to 0 steps, being penalized for almost winning. We are interested in two questions: how often does some thread complete an operation (system latency), and how often does a specific thread complete an operation (individual latency)? We provide asymptotically tight bounds for the system and individual latency of this general concurrency pattern, for arbitrary scheduling distributions p.
Surprisingly, a simple characterization exists: in expectation, the system will complete a new operation every Θ(1/‖p‖₂) steps, while thread i will complete a new operation every Θ(‖p‖₂/pᵢ²) steps. The proof is interesting in its own right, as it requires a careful analysis of how the higher norms of the vector p influence the thread step counts and latencies in this random process. Our result offers a simple connection between the scheduling distribution and the average performance of concurrent algorithms, which has several applications.","lang":"eng"}],"publist_id":"6874"},{"language":[{"iso":"eng"}],"doi":"10.1145/2767386.2767420","date_published":"2015-07-21T00:00:00Z","conference":{"name":"PODC: Principles of Distributed Computing"},"page":"365 - 374","citation":{"short":"D.-A. Alistarh, R. Gelashvili, A. Vladu, in:, ACM, 2015, pp. 365–374.","mla":"Alistarh, Dan-Adrian, et al. How to Elect a Leader Faster than a Tournament. Vol. 2015–July, ACM, 2015, pp. 365–74, doi:10.1145/2767386.2767420.","chicago":"Alistarh, Dan-Adrian, Rati Gelashvili, and Adrian Vladu. “How to Elect a Leader Faster than a Tournament,” 2015–July:365–74. ACM, 2015. https://doi.org/10.1145/2767386.2767420.","ama":"Alistarh D-A, Gelashvili R, Vladu A. How to elect a leader faster than a tournament. In: Vol 2015-July. ACM; 2015:365-374. doi:10.1145/2767386.2767420","apa":"Alistarh, D.-A., Gelashvili, R., & Vladu, A. (2015). How to elect a leader faster than a tournament (Vol. 2015–July, pp. 365–374). Presented at the PODC: Principles of Distributed Computing, ACM. https://doi.org/10.1145/2767386.2767420","ieee":"D.-A. Alistarh, R. Gelashvili, and A. Vladu, “How to elect a leader faster than a tournament,” presented at the PODC: Principles of Distributed Computing, 2015, vol. 2015–July, pp. 365–374.","ista":"Alistarh D-A, Gelashvili R, Vladu A. 2015. How to elect a leader faster than a tournament. PODC: Principles of Distributed Computing vol.
2015–July, 365–374."},"main_file_link":[{"open_access":"1","url":"https://arxiv.org/abs/1411.1001"}],"oa":1,"article_processing_charge":"No","month":"07","day":"21","volume":"2015-July","oa_version":"None","date_created":"2018-12-11T11:48:28Z","date_updated":"2023-02-23T13:18:55Z","author":[{"id":"4A899BFC-F248-11E8-B48F-1D18A9856A87","orcid":"0000-0003-3650-940X","first_name":"Dan-Adrian","last_name":"Alistarh","full_name":"Alistarh, Dan-Adrian"},{"full_name":"Gelashvili, Rati","first_name":"Rati","last_name":"Gelashvili"},{"first_name":"Adrian","last_name":"Vladu","full_name":"Vladu, Adrian"}],"publisher":"ACM","status":"public","publication_status":"published","title":"How to elect a leader faster than a tournament","_id":"783","user_id":"2DF688A6-F248-11E8-B48F-1D18A9856A87","year":"2015","acknowledgement":"Support is gratefully acknowledged from the National Science Foundation under grants CCF-1217921, CCF-1301926,\r\nand IIS-1447786, the Department of Energy under grant\r\nER26116/DE-SC0008923, and the Oracle and Intel corporations.\r\nThe authors would like to thank Prof. Nir Shavit for advice and encouragement during this work, and the anonymous reviewers for their very useful suggestions.","extern":"1","publist_id":"6875","abstract":[{"text":"The problem of electing a leader from among n contenders is one of the fundamental questions in distributed computing. In its simplest formulation, the task is as follows: given n processors, all participants must eventually return a win or lose indication, such that a single contender may win. Despite a considerable amount of work on leader election, the following question is still open: can we elect a leader in an asynchronous fault-prone system faster than just running a Θ(log n)-time tournament, against a strong adaptive adversary? In this paper, we answer this question in the affirmative, improving on a decades-old upper bound.
We introduce two new algorithmic ideas to reduce the time complexity of electing a leader to O(log∗ n), using O(n²) point-to-point messages. A non-trivial application of our algorithm is a new upper bound for the tight renaming problem, assigning n items to the n participants in expected O(log² n) time and O(n²) messages. We complement our results with a lower bound of Ω(n²) messages for solving these two problems, closing the question of their message complexity.","lang":"eng"}],"type":"conference"},{"author":[{"full_name":"Alistarh, Dan-Adrian","first_name":"Dan-Adrian","last_name":"Alistarh","id":"4A899BFC-F248-11E8-B48F-1D18A9856A87","orcid":"0000-0003-3650-940X"},{"first_name":"Hitesh","last_name":"Ballani","full_name":"Ballani, Hitesh"},{"first_name":"Paolo","last_name":"Costa","full_name":"Costa, Paolo"},{"full_name":"Funnell, Adam","last_name":"Funnell","first_name":"Adam"},{"last_name":"Benjamin","first_name":"Joshua","full_name":"Benjamin, Joshua"},{"last_name":"Watts","first_name":"Philip","full_name":"Watts, Philip"},{"full_name":"Thomsen, Benn","last_name":"Thomsen","first_name":"Benn"}],"date_updated":"2023-02-23T13:18:57Z","date_created":"2018-12-11T11:48:29Z","oa_version":"None","_id":"784","year":"2015","user_id":"2DF688A6-F248-11E8-B48F-1D18A9856A87","title":"A high-radix, low-latency optical switch for data centers","publication_status":"published","status":"public","publisher":"ACM","abstract":[{"lang":"eng","text":"We demonstrate an optical switch design that can scale up to a thousand ports with high per-port bandwidth (25 Gbps+) and low switching latency (40 ns). Our design uses a broadcast and select architecture, based on a passive star coupler and fast tunable transceivers. In addition we employ time division multiplexing to achieve very low switching latency.
Our demo shows the feasibility of the switch data plane using a small testbed, comprising two transmitters and a receiver, connected through a star coupler."}],"publist_id":"6872","extern":"1","type":"conference","conference":{"end_date":"2015-08-21","location":"London, United Kingdom","start_date":"2015-08-17","name":"SIGCOMM: Special Interest Group on Data Communication"},"date_published":"2015-01-01T00:00:00Z","doi":"10.1145/2785956.2790035","language":[{"iso":"eng"}],"citation":{"chicago":"Alistarh, Dan-Adrian, Hitesh Ballani, Paolo Costa, Adam Funnell, Joshua Benjamin, Philip Watts, and Benn Thomsen. “A High-Radix, Low-Latency Optical Switch for Data Centers,” 367–68. ACM, 2015. https://doi.org/10.1145/2785956.2790035.","mla":"Alistarh, Dan-Adrian, et al. A High-Radix, Low-Latency Optical Switch for Data Centers. ACM, 2015, pp. 367–68, doi:10.1145/2785956.2790035.","short":"D.-A. Alistarh, H. Ballani, P. Costa, A. Funnell, J. Benjamin, P. Watts, B. Thomsen, in:, ACM, 2015, pp. 367–368.","ista":"Alistarh D-A, Ballani H, Costa P, Funnell A, Benjamin J, Watts P, Thomsen B. 2015. A high-radix, low-latency optical switch for data centers. SIGCOMM: Special Interest Group on Data Communication, 367–368.","apa":"Alistarh, D.-A., Ballani, H., Costa, P., Funnell, A., Benjamin, J., Watts, P., & Thomsen, B. (2015). A high-radix, low-latency optical switch for data centers (pp. 367–368). Presented at the SIGCOMM: Special Interest Group on Data Communication, London, United Kingdom: ACM. https://doi.org/10.1145/2785956.2790035","ieee":"D.-A. Alistarh et al., “A high-radix, low-latency optical switch for data centers,” presented at the SIGCOMM: Special Interest Group on Data Communication, London, United Kingdom, 2015, pp. 367–368.","ama":"Alistarh D-A, Ballani H, Costa P, et al. A high-radix, low-latency optical switch for data centers. In: ACM; 2015:367-368.
doi:10.1145/2785956.2790035"},"quality_controlled":"1","page":"367 - 368","month":"01","day":"01","publication_identifier":{"isbn":["978-1-4503-3542-3"]}},{"_id":"768","year":"2014","user_id":"2DF688A6-F248-11E8-B48F-1D18A9856A87","acknowledgement":"Dan Alistarh - This author was supported by the SNF Postdoctoral Fellows Program, NSF grant CCF-1217921, DoE ASCR grant ER26116/DE-SC0008923, and by grants from the Oracle and Intel corporations.\r\nJames Aspnes - Supported in part by NSF grant CCF-0916389.\r\nMichael A. Bender - This research was supported in part by NSF grants CCF 1114809, CCF 1217708, IIS 1247726, and IIS 1251137.\r\nRati Gelashvili - This work was supported in part by NSF grants CCF-1217921, CCF-1301926, DoE ASCR grant ER26116/DE-SC0008923, and by grants from the Oracle and Intel corporations.\r\nSeth Gilbert - Supported by Singapore AcRF-2 MOE2011-T2-2-042.\r\n","publication_status":"published","title":"Dynamic task allocation in asynchronous shared memory","status":"public","publisher":"SIAM","author":[{"id":"4A899BFC-F248-11E8-B48F-1D18A9856A87","orcid":"0000-0003-3650-940X","first_name":"Dan-Adrian","last_name":"Alistarh","full_name":"Alistarh, Dan-Adrian"},{"full_name":"Aspnes, James","last_name":"Aspnes","first_name":"James"},{"full_name":"Bender, Michael","first_name":"Michael","last_name":"Bender"},{"first_name":"Rati","last_name":"Gelashvili","full_name":"Gelashvili, Rati"},{"full_name":"Gilbert, Seth","last_name":"Gilbert","first_name":"Seth"}],"date_updated":"2023-02-23T13:13:52Z","date_created":"2018-12-11T11:48:24Z","oa_version":"None","type":"conference","abstract":[{"lang":"eng","text":"Task allocation is a classic distributed problem in which a set of p potentially faulty processes must cooperate to perform a set of tasks. This paper considers a new dynamic version of the problem, in which tasks are injected adversarially during an asynchronous execution. 
We give the first asynchronous shared-memory algorithm for dynamic task allocation, and we prove that our solution is optimal within logarithmic factors. The main algorithmic idea is a randomized concurrent data structure called a dynamic to-do tree, which allows processes to pick new tasks to perform at random from the set of available tasks, and to insert tasks at random empty locations in the data structure. Our analysis shows that these properties avoid duplicating work unnecessarily. On the other hand, since the adversary controls the input as well as the scheduling, it can induce executions where lots of processes contend for a few available tasks, which is inefficient. However, we prove that every algorithm has the same problem: given an arbitrary input, if OPT is the worst-case complexity of the optimal algorithm on that input, then the expected work complexity of our algorithm on the same input is O(OPT log³ m), where m is an upper bound on the number of tasks that are present in the system at any given time."}],"publist_id":"6886","extern":"1","citation":{"chicago":"Alistarh, Dan-Adrian, James Aspnes, Michael Bender, Rati Gelashvili, and Seth Gilbert. “Dynamic Task Allocation in Asynchronous Shared Memory,” 416–35. SIAM, 2014. https://doi.org/10.1137/1.9781611973402.31.","mla":"Alistarh, Dan-Adrian, et al. Dynamic Task Allocation in Asynchronous Shared Memory. SIAM, 2014, pp. 416–35, doi:10.1137/1.9781611973402.31.","short":"D.-A. Alistarh, J. Aspnes, M. Bender, R. Gelashvili, S. Gilbert, in:, SIAM, 2014, pp. 416–435.","ista":"Alistarh D-A, Aspnes J, Bender M, Gelashvili R, Gilbert S. 2014. Dynamic task allocation in asynchronous shared memory. SODA: Symposium on Discrete Algorithms, 416–435.","ieee":"D.-A. Alistarh, J. Aspnes, M. Bender, R. Gelashvili, and S. Gilbert, “Dynamic task allocation in asynchronous shared memory,” presented at the SODA: Symposium on Discrete Algorithms, 2014, pp.
416–435.","apa":"Alistarh, D.-A., Aspnes, J., Bender, M., Gelashvili, R., & Gilbert, S. (2014). Dynamic task allocation in asynchronous shared memory (pp. 416–435). Presented at the SODA: Symposium on Discrete Algorithms, SIAM. https://doi.org/10.1137/1.9781611973402.31","ama":"Alistarh D-A, Aspnes J, Bender M, Gelashvili R, Gilbert S. Dynamic task allocation in asynchronous shared memory. In: SIAM; 2014:416-435. doi:10.1137/1.9781611973402.31"},"page":"416 - 435","conference":{"name":"SODA: Symposium on Discrete Algorithms"},"date_published":"2014-01-01T00:00:00Z","doi":"10.1137/1.9781611973402.31","language":[{"iso":"eng"}],"month":"01","day":"01","article_processing_charge":"No"},{"citation":{"ista":"Alistarh D-A, Aspnes J, Censor Hillel K, Gilbert S, Guerraoui R. 2014. Tight bounds for asynchronous renaming. Journal of the ACM. 61(3).","apa":"Alistarh, D.-A., Aspnes, J., Censor Hillel, K., Gilbert, S., & Guerraoui, R. (2014). Tight bounds for asynchronous renaming. Journal of the ACM. ACM. https://doi.org/10.1145/2597630","ieee":"D.-A. Alistarh, J. Aspnes, K. Censor Hillel, S. Gilbert, and R. Guerraoui, “Tight bounds for asynchronous renaming,” Journal of the ACM, vol. 61, no. 3. ACM, 2014.","ama":"Alistarh D-A, Aspnes J, Censor Hillel K, Gilbert S, Guerraoui R. Tight bounds for asynchronous renaming. Journal of the ACM. 2014;61(3). doi:10.1145/2597630","chicago":"Alistarh, Dan-Adrian, James Aspnes, Keren Censor Hillel, Seth Gilbert, and Rachid Guerraoui. “Tight Bounds for Asynchronous Renaming.” Journal of the ACM. ACM, 2014. https://doi.org/10.1145/2597630.","mla":"Alistarh, Dan-Adrian, et al. “Tight Bounds for Asynchronous Renaming.” Journal of the ACM, vol. 61, no. 3, ACM, 2014, doi:10.1145/2597630.","short":"D.-A. Alistarh, J. Aspnes, K. Censor Hillel, S. Gilbert, R. 
Guerraoui, Journal of the ACM 61 (2014)."},"publication":"Journal of the ACM","doi":"10.1145/2597630","date_published":"2014-05-01T00:00:00Z","language":[{"iso":"eng"}],"article_processing_charge":"No","day":"01","month":"05","_id":"769","user_id":"2DF688A6-F248-11E8-B48F-1D18A9856A87","acknowledgement":"The work of J. Aspnes was supported in part by NSF grant CCF-0916389. The work of S. Gilbert was supported by Singapore AcRF-2 MOE 2011-T2-2-042.\r\nK. Censor-Hillel is a Shalon Fellow. Part of this work was performed while K. Censor-Hillel was a postdoc at MIT, supported by the Simons Postdoctoral Fellowship.","year":"2014","intvolume":" 61","publisher":"ACM","title":"Tight bounds for asynchronous renaming","status":"public","publication_status":"published","author":[{"full_name":"Alistarh, Dan-Adrian","first_name":"Dan-Adrian","last_name":"Alistarh","id":"4A899BFC-F248-11E8-B48F-1D18A9856A87","orcid":"0000-0003-3650-940X"},{"first_name":"James","last_name":"Aspnes","full_name":"Aspnes, James"},{"last_name":"Censor Hillel","first_name":"Keren","full_name":"Censor Hillel, Keren"},{"full_name":"Gilbert, Seth","last_name":"Gilbert","first_name":"Seth"},{"last_name":"Guerraoui","first_name":"Rachid","full_name":"Guerraoui, Rachid"}],"volume":61,"oa_version":"None","date_created":"2018-12-11T11:48:24Z","date_updated":"2023-02-23T13:14:09Z","type":"journal_article","issue":"3","publist_id":"6887","abstract":[{"text":"This article presents the first tight bounds on the time complexity of shared-memory renaming, a fundamental problem in distributed computing in which a set of processes need to pick distinct identifiers from a small namespace. We first prove an individual lower bound of Ω(k) process steps for deterministic renaming into any namespace of size subexponential in k, where k is the number of participants. 
The bound is tight: it draws an exponential separation between deterministic and randomized solutions, and implies new tight bounds for deterministic concurrent fetch-and-increment counters, queues, and stacks. The proof is based on a new reduction from renaming to another fundamental problem in distributed computing: mutual exclusion. We complement this individual bound with a global lower bound of Ω(k log(k/c)) on the total step complexity of renaming into a namespace of size ck, for any c ≥ 1. This result applies to randomized algorithms against a strong adversary, and helps derive new global lower bounds for randomized approximate counter implementations that are tight within logarithmic factors. On the algorithmic side, we give a protocol that transforms any sorting network into a randomized strong adaptive renaming algorithm, with expected cost equal to the depth of the sorting network. This gives a tight adaptive renaming algorithm with expected step complexity O(log k), where k is the contention in the current execution. This algorithm is the first to achieve sublinear time, and it is time-optimal as per our randomized lower bound. Finally, we use this renaming protocol to build monotone-consistent counters with logarithmic step complexity and linearizable fetch-and-increment registers with polylogarithmic cost.","lang":"eng"}],"extern":"1"},{"conference":{"name":"EuroSys: European Conference on Computer Systems"},"doi":"10.1145/2592798.2592808","date_published":"2014-01-01T00:00:00Z","language":[{"iso":"eng"}],"citation":{"mla":"Alistarh, Dan-Adrian, et al. StackTrack: An Automated Transactional Approach to Concurrent Memory Reclamation. ACM, 2014, doi:10.1145/2592798.2592808.","short":"D.-A. Alistarh, P. Eugster, M. Herlihy, A. Matveev, N. Shavit, in:, ACM, 2014.","chicago":"Alistarh, Dan-Adrian, Patrick Eugster, Maurice Herlihy, Alexander Matveev, and Nir Shavit. 
“StackTrack: An Automated Transactional Approach to Concurrent Memory Reclamation.” ACM, 2014. https://doi.org/10.1145/2592798.2592808.","ama":"Alistarh D-A, Eugster P, Herlihy M, Matveev A, Shavit N. StackTrack: An automated transactional approach to concurrent memory reclamation. In: ACM; 2014. doi:10.1145/2592798.2592808","ista":"Alistarh D-A, Eugster P, Herlihy M, Matveev A, Shavit N. 2014. StackTrack: An automated transactional approach to concurrent memory reclamation. EuroSys: European Conference on Computer Systems.","ieee":"D.-A. Alistarh, P. Eugster, M. Herlihy, A. Matveev, and N. Shavit, “StackTrack: An automated transactional approach to concurrent memory reclamation,” presented at the EuroSys: European Conference on Computer Systems, 2014.","apa":"Alistarh, D.-A., Eugster, P., Herlihy, M., Matveev, A., & Shavit, N. (2014). StackTrack: An automated transactional approach to concurrent memory reclamation. Presented at the EuroSys: European Conference on Computer Systems, ACM. 
https://doi.org/10.1145/2592798.2592808"},"month":"01","day":"01","article_processing_charge":"No","author":[{"full_name":"Alistarh, Dan-Adrian","last_name":"Alistarh","first_name":"Dan-Adrian","orcid":"0000-0003-3650-940X","id":"4A899BFC-F248-11E8-B48F-1D18A9856A87"},{"first_name":"Patrick","last_name":"Eugster","full_name":"Eugster, Patrick"},{"full_name":"Herlihy, Maurice","first_name":"Maurice","last_name":"Herlihy"},{"full_name":"Matveev, Alexander","last_name":"Matveev","first_name":"Alexander"},{"first_name":"Nir","last_name":"Shavit","full_name":"Shavit, Nir"}],"date_created":"2018-12-11T11:48:24Z","date_updated":"2023-02-23T13:14:25Z","oa_version":"None","_id":"770","year":"2014","acknowledgement":"Dan Alistarh - Part of this work was performed while the author was a Postdoctoral\r\nAssociate a MIT CSAIL, supported in part by NSF grant CCF-1217921,\r\nDoE ASCR grant ER26116/DE-SC0008923, and by grants from the Oracle\r\nand Intel corporations.\r\nPatrick Eugester - Supported in part by DARPA grant N11AP20014 and NSF grant CNS-\r\n1117065.\r\nMaurice Herlihy - Supported by NSF grant 1301924.\r\nNir Shavit - Supported in part by NSF grants CCF-1217921 and CCF-1301926, DoE\r\nASCR grant ER26116/DE-SC0008923, and by grants from the Oracle and\r\nIntel corporations.","user_id":"2DF688A6-F248-11E8-B48F-1D18A9856A87","publication_status":"published","title":"StackTrack: An automated transactional approach to concurrent memory reclamation","status":"public","publisher":"ACM","abstract":[{"lang":"eng","text":"Dynamic memory reclamation is arguably the biggest open problem in concurrent data structure design: All known solutions induce high overhead, or must be customized to the specific data structure by the programmer, or both. This paper presents StackTrack, the first concurrent memory reclamation scheme that can be applied automatically by a compiler, while maintaining efficiency. 
StackTrack eliminates most of the expensive bookkeeping required for memory reclamation by leveraging the power of hardware transactional memory (HTM) in a new way: it tracks thread variables dynamically, and in an atomic fashion. This effectively makes all memory references visible without having threads pay the overhead of writing out this information. Our empirical results show that this new approach matches or outperforms prior, non-automated, techniques."}],"publist_id":"6888","extern":"1","type":"conference"},{"month":"01","day":"01","article_processing_charge":"No","conference":{"name":"PODC: Principles of Distributed Computing"},"date_published":"2014-01-01T00:00:00Z","doi":"10.1145/2611462.2611499","language":[{"iso":"eng"}],"citation":{"short":"D.-A. Alistarh, O. Denysyuk, L. Rodrígues, N. Shavit, in:, ACM, 2014, pp. 232–241.","mla":"Alistarh, Dan-Adrian, et al. Balls-into-Leaves: Sub-Logarithmic Renaming in Synchronous Message-Passing Systems. ACM, 2014, pp. 232–41, doi:10.1145/2611462.2611499.","chicago":"Alistarh, Dan-Adrian, Oksana Denysyuk, Luís Rodrígues, and Nir Shavit. “Balls-into-Leaves: Sub-Logarithmic Renaming in Synchronous Message-Passing Systems,” 232–41. ACM, 2014. https://doi.org/10.1145/2611462.2611499.","ama":"Alistarh D-A, Denysyuk O, Rodrígues L, Shavit N. Balls-into-Leaves: Sub-logarithmic renaming in synchronous message-passing systems. In: ACM; 2014:232-241. doi:10.1145/2611462.2611499","apa":"Alistarh, D.-A., Denysyuk, O., Rodrígues, L., & Shavit, N. (2014). Balls-into-Leaves: Sub-logarithmic renaming in synchronous message-passing systems (pp. 232–241). Presented at the PODC: Principles of Distributed Computing, ACM. https://doi.org/10.1145/2611462.2611499","ieee":"D.-A. Alistarh, O. Denysyuk, L. Rodrígues, and N. Shavit, “Balls-into-Leaves: Sub-logarithmic renaming in synchronous message-passing systems,” presented at the PODC: Principles of Distributed Computing, 2014, pp. 
232–241.","ista":"Alistarh D-A, Denysyuk O, Rodrígues L, Shavit N. 2014. Balls-into-Leaves: Sub-logarithmic renaming in synchronous message-passing systems. PODC: Principles of Distributed Computing, 232–241."},"page":"232 - 241","abstract":[{"lang":"eng","text":"We consider the following natural problem: n failure-prone servers, communicating synchronously through message passing, must assign themselves one-to-one to n distinct items. Existing literature suggests two possible approaches to this problem. First, model it as an instance of tight renaming in synchronous message-passing systems; for deterministic solutions, a tight bound of Θ(log n) communication rounds is known. Second, model the scenario as an instance of randomized load-balancing, for which elegant sub-logarithmic solutions exist. However, careful examination reveals that known load-balancing schemes do not apply to our scenario, because they either do not tolerate faults or do not ensure one-to-one allocation. It is thus natural to ask if sublogarithmic solutions exist for this apparently simple but intriguing problem. In this paper, we combine the two approaches to provide a new randomized solution for tight renaming, which terminates in O(log log n) communication rounds with high probability, against a strong adaptive adversary. Our solution, called Balls-into-Leaves, combines the deterministic approach with a new randomized scheme to obtain perfectly balanced allocations. The algorithm arranges the items as leaves of a tree, and participants repeatedly perform random choices among the leaves. The algorithm exchanges information in each round to split the participants into progressively smaller groups whose random choices do not conflict. We then extend the algorithm to terminate early in O(log log f) rounds w.h.p., where f is the actual number of failures. 
These results imply an exponential separation between deterministic and randomized algorithms for the tight renaming problem in message-passing systems."}],"publist_id":"6884","extern":"1","type":"conference","author":[{"full_name":"Alistarh, Dan-Adrian","first_name":"Dan-Adrian","last_name":"Alistarh","id":"4A899BFC-F248-11E8-B48F-1D18A9856A87","orcid":"0000-0003-3650-940X"},{"full_name":"Denysyuk, Oksana","first_name":"Oksana","last_name":"Denysyuk"},{"first_name":"Luís","last_name":"Rodrígues","full_name":"Rodrígues, Luís"},{"full_name":"Shavit, Nir","first_name":"Nir","last_name":"Shavit"}],"date_updated":"2023-02-23T13:14:49Z","date_created":"2018-12-11T11:48:25Z","oa_version":"None","_id":"771","year":"2014","acknowledgement":"Dan Alistarh was partially supported by the SNF Postdoctoral Fellows Program, NSF grant CCF-1217921, DoE ASCR grant ER26116/DE-SC0008923, and by grants from the Oracle and Intel corporations.\r\nOksana Denysyuk and Luís Rodrigues were partially supported by Fundação para a Ciência e Tecnologia (FCT) via the project PEPITA (PTDC/EEI-SCR/2776/2012) and via the INESC-ID multi-annual funding through the PIDDAC Program fund grant, under project PEst-OE/EEI/LA0021/2013.\r\nNir Shavit was supported in part by NSF grants CCF-1217921 and CCF-1301926, DoE ASCR grant ER26116/DE-SC0008923, and by grants from the Oracle and Intel corporations.","user_id":"2DF688A6-F248-11E8-B48F-1D18A9856A87","publication_status":"published","title":"Balls-into-Leaves: Sub-logarithmic renaming in synchronous message-passing systems","status":"public","publisher":"ACM"},{"article_processing_charge":"No","month":"01","day":"01","language":[{"iso":"eng"}],"date_published":"2014-01-01T00:00:00Z","doi":"10.1145/2591796.2591836","conference":{"name":"STOC: Symposium on Theory of Computing"},"page":"714 - 723","external_id":{"arxiv":["1311.3200"]},"citation":{"apa":"Alistarh, D.-A., Censor Hillel, K., & Shavit, N. (2014). 
Are lock-free concurrent algorithms practically wait-free? (pp. 714–723). Presented at the STOC: Symposium on Theory of Computing, ACM. https://doi.org/10.1145/2591796.2591836","ieee":"D.-A. Alistarh, K. Censor Hillel, and N. Shavit, “Are lock-free concurrent algorithms practically wait-free?,” presented at the STOC: Symposium on Theory of Computing, 2014, pp. 714–723.","ista":"Alistarh D-A, Censor Hillel K, Shavit N. 2014. Are lock-free concurrent algorithms practically wait-free? STOC: Symposium on Theory of Computing, 714–723.","ama":"Alistarh D-A, Censor Hillel K, Shavit N. Are lock-free concurrent algorithms practically wait-free? In: ACM; 2014:714-723. doi:10.1145/2591796.2591836","chicago":"Alistarh, Dan-Adrian, Keren Censor Hillel, and Nir Shavit. “Are Lock-Free Concurrent Algorithms Practically Wait-Free?,” 714–23. ACM, 2014. https://doi.org/10.1145/2591796.2591836.","short":"D.-A. Alistarh, K. Censor Hillel, N. Shavit, in:, ACM, 2014, pp. 714–723.","mla":"Alistarh, Dan-Adrian, et al. Are Lock-Free Concurrent Algorithms Practically Wait-Free? ACM, 2014, pp. 714–23, doi:10.1145/2591796.2591836."},"oa":1,"main_file_link":[{"open_access":"1","url":"https://arxiv.org/abs/1311.3200"}],"extern":"1","publist_id":"6885","abstract":[{"lang":"eng","text":"Lock-free concurrent algorithms guarantee that some concurrent operation will always make progress in a finite number of steps. Yet programmers prefer to treat concurrent code as if it were wait-free, guaranteeing that all operations always make progress. Unfortunately, designing wait-free algorithms is generally a very complex task, and the resulting algorithms are not always efficient. While obtaining efficient wait-free algorithms has been a long-time goal for the theory community, most non-blocking commercial code is only lock-free. This paper suggests a simple solution to this problem. 
We show that, for a large class of lock-free algorithms, under scheduling conditions which approximate those found in commercial hardware architectures, lock-free algorithms behave as if they are wait-free. In other words, programmers can keep on designing simple lock-free algorithms instead of complex wait-free ones, and in practice, they will get wait-free progress. Our main contribution is a new way of analyzing a general class of lock-free algorithms under a stochastic scheduler. Our analysis relates the individual performance of processes with the global performance of the system using Markov chain lifting between a complex per-process chain and a simpler system progress chain. We show that lock-free algorithms are not only wait-free with probability 1, but that in fact a general subset of lock-free algorithms can be closely bounded in terms of the average number of steps required until an operation completes. To the best of our knowledge, this is the first attempt to analyze progress conditions, typically stated in relation to a worst case adversary, in a stochastic model capturing their expected asymptotic behavior."}],"type":"conference","oa_version":"Preprint","date_updated":"2023-02-23T13:15:13Z","date_created":"2018-12-11T11:48:25Z","author":[{"orcid":"0000-0003-3650-940X","id":"4A899BFC-F248-11E8-B48F-1D18A9856A87","last_name":"Alistarh","first_name":"Dan-Adrian","full_name":"Alistarh, Dan-Adrian"},{"full_name":"Censor Hillel, Keren","last_name":"Censor Hillel","first_name":"Keren"},{"first_name":"Nir","last_name":"Shavit","full_name":"Shavit, Nir"}],"publisher":"ACM","title":"Are lock-free concurrent algorithms practically wait-free?","status":"public","publication_status":"published","acknowledgement":"Dan Alistarh - Part of this work was performed while the author was a Postdoctoral Associate at MIT CSAIL, where he was supported by SNF\r\nPostdoctoral Fellows Program, NSF grant CCF-1217921, DoE\r\nASCR grant ER26116/DE-SC0008923, and by grants from 
the Oracle and Intel corporations.\r\nKeren Censor-Hillel - Shalon Fellow\r\nNir Shavit - This work was supported in part by NSF grants CCF-1217921 and CCF-1301926, DoE ASCR grant ER26116/DE-SC0008923, and by grants from the Oracle and Intel corporations.","_id":"772","year":"2014","user_id":"2DF688A6-F248-11E8-B48F-1D18A9856A87"},{"conference":{"end_date":"2014-10-15","location":"Austin, USA","start_date":"2014-10-12","name":"DISC: Distributed Computing"},"date_published":"2014-01-01T00:00:00Z","doi":"10.1007/978-3-662-45174-8_5","language":[{"iso":"eng"}],"citation":{"ama":"Alistarh D-A, Aspnes J, King V, Saia J. Communication-efficient randomized consensus. In: Kuhn F, ed. Vol 8784. Springer; 2014:61-75. doi:10.1007/978-3-662-45174-8_5","ista":"Alistarh D-A, Aspnes J, King V, Saia J. 2014. Communication-efficient randomized consensus. DISC: Distributed Computing, LNCS, vol. 8784, 61–75.","apa":"Alistarh, D.-A., Aspnes, J., King, V., & Saia, J. (2014). Communication-efficient randomized consensus. In F. Kuhn (Ed.) (Vol. 8784, pp. 61–75). Presented at the DISC: Distributed Computing, Austin, USA: Springer. https://doi.org/10.1007/978-3-662-45174-8_5","ieee":"D.-A. Alistarh, J. Aspnes, V. King, and J. Saia, “Communication-efficient randomized consensus,” presented at the DISC: Distributed Computing, Austin, USA, 2014, vol. 8784, pp. 61–75.","mla":"Alistarh, Dan-Adrian, et al. Communication-Efficient Randomized Consensus. Edited by Fabian Kuhn, vol. 8784, Springer, 2014, pp. 61–75, doi:10.1007/978-3-662-45174-8_5.","short":"D.-A. Alistarh, J. Aspnes, V. King, J. Saia, in:, F. Kuhn (Ed.), Springer, 2014, pp. 61–75.","chicago":"Alistarh, Dan-Adrian, James Aspnes, Valerie King, and Jared Saia. “Communication-Efficient Randomized Consensus.” edited by Fabian Kuhn, 8784:61–75. Springer, 2014. 
https://doi.org/10.1007/978-3-662-45174-8_5."},"page":"61 - 75","day":"01","month":"01","article_processing_charge":"No","author":[{"id":"4A899BFC-F248-11E8-B48F-1D18A9856A87","orcid":"0000-0003-3650-940X","first_name":"Dan-Adrian","last_name":"Alistarh","full_name":"Alistarh, Dan-Adrian"},{"first_name":"James","last_name":"Aspnes","full_name":"Aspnes, James"},{"last_name":"King","first_name":"Valerie","full_name":"King, Valerie"},{"last_name":"Saia","first_name":"Jared","full_name":"Saia, Jared"}],"date_created":"2018-12-11T11:48:25Z","date_updated":"2023-02-23T13:15:36Z","volume":8784,"oa_version":"None","_id":"773","user_id":"2DF688A6-F248-11E8-B48F-1D18A9856A87","year":"2014","title":"Communication-efficient randomized consensus","publication_status":"published","status":"public","intvolume":" 8784","publisher":"Springer","editor":[{"last_name":"Kuhn","first_name":"Fabian","full_name":"Kuhn, Fabian"}],"abstract":[{"lang":"eng","text":"We describe a new randomized consensus protocol with expected message complexity O(n² log² n) when fewer than n/2 processes may fail by crashing. This is an almost-linear improvement over the best previously known protocol, and within logarithmic factors of a known Ω(n²) message lower bound. The protocol further ensures that no process sends more than O(n log³ n) messages in expectation, which is again within logarithmic factors of optimal. We also present a generalization of the algorithm to an arbitrary number of failures t, which uses expected O(nt + t² log² t) total messages. Our protocol uses messages of size O(log n), and can therefore scale to large networks.\r\n\r\nWe consider the problem of consensus in the challenging classic model. In this model, the adversary is adaptive; it can choose which processors crash at any point during the course of the algorithm. 
Further, communication is via asynchronous message passing: there is no known upper bound on the time to send a message from one processor to another, and all messages and coin flips are seen by the adversary.\r\n\r\nOur approach is to build a message-efficient, resilient mechanism for aggregating individual processor votes, implementing the message-passing equivalent of a weak shared coin. Roughly, in our protocol, a processor first announces its votes to small groups, then propagates them to increasingly larger groups as it generates more and more votes. To bound the number of messages that an individual process might have to send or receive, the protocol progressively increases the weight of generated votes. The main technical challenge is bounding the impact of votes that are still “in flight” (generated, but not fully propagated) on the final outcome of the shared coin, especially since such votes might have different weights. We achieve this by leveraging the structure of the algorithm, and a technical argument based on martingale concentration bounds. Overall, we show that it is possible to build an efficient message-passing implementation of a shared coin, and in the process (almost-optimally) solve the classic consensus problem in the asynchronous message-passing model."}],"publist_id":"6881","extern":"1","type":"conference","alternative_title":["LNCS"]}]