{"date_created":"2024-02-18T23:01:03Z","month":"01","conference":{"location":"Hongkong, China","end_date":"2024-01-06","start_date":"2024-01-03","name":"CPAL: Conference on Parsimony and Learning"},"abstract":[{"lang":"eng","text":"Pruning large language models (LLMs) from the BERT family has emerged as a standard compression benchmark, and several pruning methods have been proposed for this task. The recent “Sparsity May Cry” (SMC) benchmark put into question the validity of all existing methods, exhibiting a more complex setup where many known pruning methods appear to fail. We revisit the question of accurate BERT-pruning during fine-tuning on downstream datasets, and propose a set of general guidelines for successful pruning, even on the challenging SMC benchmark. First, we perform a cost-vs-benefits analysis of pruning model components, such as the embeddings and the classification head; second, we provide a simple-yet-general way of scaling training, sparsification and learning rate schedules relative to the desired target sparsity; finally, we investigate the importance of proper parametrization for Knowledge Distillation in the context of LLMs. Our simple insights lead to state-of-the-art results, both on classic BERT-pruning benchmarks, as well as on the SMC benchmark, showing that even classic gradual magnitude pruning (GMP) can yield competitive results, with the right approach."}],"user_id":"2DF688A6-F248-11E8-B48F-1D18A9856A87","alternative_title":["PMLR"],"title":"How to prune your language model: Recovering accuracy on the \"Sparsity May Cry\" benchmark","date_updated":"2024-02-26T10:30:52Z","oa":1,"publication_identifier":{"eissn":["2640-3498"]},"publication_status":"published","author":[{"last_name":"Kurtic","full_name":"Kurtic, Eldar","id":"47beb3a5-07b5-11eb-9b87-b108ec578218","first_name":"Eldar"},{"first_name":"Torsten","last_name":"Hoefler","full_name":"Hoefler, Torsten"},{"last_name":"Alistarh","full_name":"Alistarh, Dan-Adrian","id":"4A899BFC-F248-11E8-B48F-1D18A9856A87","first_name":"Dan-Adrian","orcid":"0000-0003-3650-940X"}],"publication":"Proceedings of Machine Learning Research","department":[{"_id":"DaAl"}],"main_file_link":[{"open_access":"1","url":"https://proceedings.mlr.press/v234/kurtic24a"}],"external_id":{"arxiv":["2312.13547"]},"scopus_import":"1","language":[{"iso":"eng"}],"quality_controlled":"1","year":"2024","intvolume":"       234","page":"542-553","day":"08","publisher":"ML Research Press","citation":{"ista":"Kurtic E, Hoefler T, Alistarh D-A. 2024. How to prune your language model: Recovering accuracy on the ‘Sparsity May Cry’ benchmark. Proceedings of Machine Learning Research. CPAL: Conference on Parsimony and Learning, PMLR, vol. 234, 542–553.","ama":"Kurtic E, Hoefler T, Alistarh D-A. How to prune your language model: Recovering accuracy on the “Sparsity May Cry” benchmark. In: <i>Proceedings of Machine Learning Research</i>. Vol 234. ML Research Press; 2024:542-553.","ieee":"E. Kurtic, T. Hoefler, and D.-A. Alistarh, “How to prune your language model: Recovering accuracy on the ‘Sparsity May Cry’ benchmark,” in <i>Proceedings of Machine Learning Research</i>, Hongkong, China, 2024, vol. 234, pp. 542–553.","mla":"Kurtic, Eldar, et al. “How to Prune Your Language Model: Recovering Accuracy on the ‘Sparsity May Cry’ Benchmark.” <i>Proceedings of Machine Learning Research</i>, vol. 234, ML Research Press, 2024, pp. 542–53.","chicago":"Kurtic, Eldar, Torsten Hoefler, and Dan-Adrian Alistarh. “How to Prune Your Language Model: Recovering Accuracy on the ‘Sparsity May Cry’ Benchmark.” In <i>Proceedings of Machine Learning Research</i>, 234:542–53. ML Research Press, 2024.","short":"E. Kurtic, T. Hoefler, D.-A. Alistarh, in:, Proceedings of Machine Learning Research, ML Research Press, 2024, pp. 542–553.","apa":"Kurtic, E., Hoefler, T., &#38; Alistarh, D.-A. (2024). How to prune your language model: Recovering accuracy on the “Sparsity May Cry” benchmark. In <i>Proceedings of Machine Learning Research</i> (Vol. 234, pp. 542–553). Hongkong, China: ML Research Press."},"article_processing_charge":"No","volume":234,"type":"conference","date_published":"2024-01-08T00:00:00Z","oa_version":"Preprint","status":"public","_id":"15011"}