How to prune your language model: Recovering accuracy on the "Sparsity May Cry" benchmark
Kurtic E, Hoefler T, Alistarh D-A. 2024. How to prune your language model: Recovering accuracy on the ‘Sparsity May Cry’ benchmark. Proceedings of Machine Learning Research. CPAL: Conference on Parsimony and Learning, PMLR, vol. 234, 542–553.
Download (ext.)
https://proceedings.mlr.press/v234/kurtic24a
[Preprint]
Conference Paper
| Published
| English
Scopus indexed
Author
Corresponding author has ISTA affiliation
Department
Series Title
PMLR
Abstract
Pruning large language models (LLMs) from the BERT family has emerged as a standard compression benchmark, and several pruning methods have been proposed for this task. The recent “Sparsity May Cry” (SMC) benchmark put into question the validity of all existing methods, exhibiting a more complex setup where many known pruning methods appear to fail. We revisit the question of accurate BERT-pruning during fine-tuning on downstream datasets, and propose a set of general guidelines for successful pruning, even on the challenging SMC benchmark. First, we perform a cost-vs-benefits analysis of pruning model components, such as the embeddings and the classification head; second, we provide a simple-yet-general way of scaling training, sparsification and learning rate schedules relative to the desired target sparsity; finally, we investigate the importance of proper parametrization for Knowledge Distillation in the context of LLMs. Our simple insights lead to state-of-the-art results, both on classic BERT-pruning benchmarks, as well as on the SMC benchmark, showing that even classic gradual magnitude pruning (GMP) can yield competitive results, with the right approach.
Publishing Year
Date Published
2024-01-08
Proceedings Title
Proceedings of Machine Learning Research
Publisher
ML Research Press
Volume
234
Page
542-553
Conference
CPAL: Conference on Parsimony and Learning
Conference Location
Hongkong, China
Conference Date
2024-01-03 – 2024-01-06
eISSN
IST-REx-ID
Cite this
Kurtic E, Hoefler T, Alistarh D-A. How to prune your language model: Recovering accuracy on the “Sparsity May Cry” benchmark. In: Proceedings of Machine Learning Research. Vol 234. ML Research Press; 2024:542-553.
Kurtic, E., Hoefler, T., & Alistarh, D.-A. (2024). How to prune your language model: Recovering accuracy on the “Sparsity May Cry” benchmark. In Proceedings of Machine Learning Research (Vol. 234, pp. 542–553). Hongkong, China: ML Research Press.
Kurtic, Eldar, Torsten Hoefler, and Dan-Adrian Alistarh. “How to Prune Your Language Model: Recovering Accuracy on the ‘Sparsity May Cry’ Benchmark.” In Proceedings of Machine Learning Research, 234:542–53. ML Research Press, 2024.
E. Kurtic, T. Hoefler, and D.-A. Alistarh, “How to prune your language model: Recovering accuracy on the ‘Sparsity May Cry’ benchmark,” in Proceedings of Machine Learning Research, Hongkong, China, 2024, vol. 234, pp. 542–553.
Kurtic E, Hoefler T, Alistarh D-A. 2024. How to prune your language model: Recovering accuracy on the ‘Sparsity May Cry’ benchmark. Proceedings of Machine Learning Research. CPAL: Conference on Parsimony and Learning, PMLR, vol. 234, 542–553.
Kurtic, Eldar, et al. “How to Prune Your Language Model: Recovering Accuracy on the ‘Sparsity May Cry’ Benchmark.” Proceedings of Machine Learning Research, vol. 234, ML Research Press, 2024, pp. 542–53.
All files available under the following license(s):
Copyright Statement:
This Item is protected by copyright and/or related rights. [...]
Link(s) to Main File(s)
Access Level
Open Access
Export
Marked PublicationsOpen Data ISTA Research Explorer
Sources
arXiv 2312.13547