{"oa":1,"date_created":"2024-05-28T13:45:20Z","page":"6726-6743","article_processing_charge":"Yes","department":[{"_id":"DaAl"}],"publication_status":"published","alternative_title":["PMLR"],"user_id":"2DF688A6-F248-11E8-B48F-1D18A9856A87","month":"07","title":"SPDY: Accurate pruning with speedup guarantees","intvolume":"       162","acknowledgement":"We gratefully acknowledge funding from the European Research Council (ERC) under the European Union’s Horizon 2020 programme (grant agreement No 805223 ScaleML),\r\nas well as computational support from AWS EC2. We thank Eldar Kurtic for code and hyper-parameters for BERT pruning, and the Neural Magic Team, notably Michael Goin and\r\nMark Kurtz, for support with their software.","quality_controlled":"1","external_id":{"isi":["000922378801029"]},"_id":"17059","file_date_updated":"2024-08-19T06:54:41Z","author":[{"id":"09a8f98d-ec99-11ea-ae11-c063a7b7fe5f","full_name":"Frantar, Elias","last_name":"Frantar","first_name":"Elias"},{"id":"4A899BFC-F248-11E8-B48F-1D18A9856A87","orcid":"0000-0003-3650-940X","first_name":"Dan-Adrian","full_name":"Alistarh, Dan-Adrian","last_name":"Alistarh"}],"scopus_import":"1","conference":{"name":"ICML: International Conference on Machine Learning","start_date":"2022-07-17","location":"Baltimore, MD, United States","end_date":"2022-07-23"},"has_accepted_license":"1","ec_funded":1,"status":"public","day":"20","publication":"39th International Conference on Machine Learning","date_updated":"2024-08-19T06:55:55Z","corr_author":"1","publisher":"ML Research Press","tmp":{"legal_code_url":"https://creativecommons.org/licenses/by/4.0/legalcode","name":"Creative Commons Attribution 4.0 International Public License (CC-BY 4.0)","image":"/images/cc_by.png","short":"CC BY (4.0)"},"file":[{"success":1,"creator":"dernst","access_level":"open_access","date_updated":"2024-08-19T06:54:41Z","file_size":615916,"relation":"main_file","date_created":"2024-08-19T06:54:41Z","file_id":"17440","file_name":"2022_PMLR_Frantar.pdf","checksum":"5179a1e4dfc0fbfab6674907299e414a","content_type":"application/pdf"}],"date_published":"2022-07-20T00:00:00Z","oa_version":"Published Version","isi":1,"project":[{"name":"Elastic Coordination for Scalable Machine Learning","call_identifier":"H2020","_id":"268A44D6-B435-11E9-9278-68D0E5697425","grant_number":"805223"}],"year":"2022","volume":162,"ddc":["000"],"type":"conference","language":[{"iso":"eng"}],"citation":{"chicago":"Frantar, Elias, and Dan-Adrian Alistarh. “SPDY: Accurate Pruning with Speedup Guarantees.” In <i>39th International Conference on Machine Learning</i>, 162:6726–43. ML Research Press, 2022.","apa":"Frantar, E., &#38; Alistarh, D.-A. (2022). SPDY: Accurate pruning with speedup guarantees. In <i>39th International Conference on Machine Learning</i> (Vol. 162, pp. 6726–6743). Baltimore, MD, United States: ML Research Press.","mla":"Frantar, Elias, and Dan-Adrian Alistarh. “SPDY: Accurate Pruning with Speedup Guarantees.” <i>39th International Conference on Machine Learning</i>, vol. 162, ML Research Press, 2022, pp. 6726–43.","short":"E. Frantar, D.-A. Alistarh, in:, 39th International Conference on Machine Learning, ML Research Press, 2022, pp. 6726–6743.","ama":"Frantar E, Alistarh D-A. SPDY: Accurate pruning with speedup guarantees. In: <i>39th International Conference on Machine Learning</i>. Vol 162. ML Research Press; 2022:6726-6743.","ista":"Frantar E, Alistarh D-A. 2022. SPDY: Accurate pruning with speedup guarantees. 39th International Conference on Machine Learning. ICML: International Conference on Machine Learning, PMLR, vol. 162, 6726–6743.","ieee":"E. Frantar and D.-A. Alistarh, “SPDY: Accurate pruning with speedup guarantees,” in <i>39th International Conference on Machine Learning</i>, Baltimore, MD, United States, 2022, vol. 162, pp. 6726–6743."},"abstract":[{"lang":"eng","text":"The recent focus on the efficiency of deep neural networks (DNNs) has led to significant work on model compression approaches, of which weight pruning is one of the most popular. At the same time, there is rapidly-growing computational support for efficiently executing the unstructured-sparse models obtained via pruning. Yet, most existing pruning methods minimize just the number of remaining weights, i.e. the size of the model, rather than optimizing for inference time. We address this gap by introducing SPDY, a new compression method which automatically determines layer-wise sparsity targets achieving a desired inference speedup on a given system, while minimizing accuracy loss. SPDY is the composition of two new techniques. The first is an efficient and general dynamic programming algorithm for solving constrained layer-wise compression problems, given a set of layer-wise error scores. The second technique is a local search procedure for automatically determining such scores in an accurate and robust manner. Experiments across popular vision and language models show that SPDY guarantees speedups while recovering higher accuracy relative to existing strategies, both for one-shot and gradual pruning scenarios, and is compatible with most existing pruning approaches. We also extend our approach to the recently-proposed task of pruning with very little data, where we achieve the best known accuracy recovery when pruning to the GPU-supported 2:4 sparsity pattern."}]}