{"year":"2025","publication_identifier":{"eissn":["2522-803X"],"eisbn":["9783031857478"],"issn":["2522-8021"],"isbn":["9783031857461"]},"page":"83-97","user_id":"2DF688A6-F248-11E8-B48F-1D18A9856A87","doi":"10.1007/978-3-031-85747-8_6","language":[{"iso":"eng"}],"acknowledgement":"We would like to thank Eugenia Iofinova for useful comments on an earlier version of this draft, and Artur Niederfahrenhorst for useful suggestions regarding fine-tuning on the GSM8k dataset.","publication_status":"published","OA_type":"green","date_created":"2026-02-16T15:57:53Z","_id":"21257","publication":"Enhancing LLM Performance. Efficacy, Fine-Tuning, and Inference Techniques","external_id":{"arxiv":["2310.06927"]},"title":"Sparse Fine-Tuning for Inference Acceleration of Large Language Models","date_published":"2025-07-05T00:00:00Z","corr_author":"1","month":"07","author":[{"last_name":"Kurtic","id":"47beb3a5-07b5-11eb-9b87-b108ec578218","full_name":"Kurtic, Eldar","first_name":"Eldar"},{"last_name":"Kuznedelev","full_name":"Kuznedelev, Denis","first_name":"Denis"},{"last_name":"Frantar","id":"09a8f98d-ec99-11ea-ae11-c063a7b7fe5f","first_name":"Elias","full_name":"Frantar, Elias"},{"last_name":"Goinv","full_name":"Goinv, Michael","first_name":"Michael"},{"full_name":"Pandit, Shubhra","first_name":"Shubhra","last_name":"Pandit"},{"first_name":"Abhinav","full_name":"Agarwalla, Abhinav","last_name":"Agarwalla"},{"full_name":"Nguyen, Tuan","first_name":"Tuan","last_name":"Nguyen"},{"last_name":"Marques","full_name":"Marques, Alexandre","first_name":"Alexandre"},{"full_name":"Kurtz, Mark","first_name":"Mark","last_name":"Kurtz"},{"first_name":"Dan-Adrian","orcid":"0000-0003-3650-940X","full_name":"Alistarh, Dan-Adrian","last_name":"Alistarh","id":"4A899BFC-F248-11E8-B48F-1D18A9856A87"}],"oa_version":"Preprint","quality_controlled":"1","article_processing_charge":"No","citation":{"short":"E. Kurtic, D. Kuznedelev, E. Frantar, M. Goinv, S. Pandit, A. Agarwalla, T. Nguyen, A. Marques, M. 
Kurtz, D.-A. Alistarh, in: P. Passban, A. Way, M. Rezagholizadeh (Eds.), Enhancing LLM Performance. Efficacy, Fine-Tuning, and Inference Techniques, Springer Nature, 2025, pp. 83–97.","apa":"Kurtic, E., Kuznedelev, D., Frantar, E., Goin, M., Pandit, S., Agarwalla, A., … Alistarh, D.-A. (2025). Sparse Fine-Tuning for Inference Acceleration of Large Language Models. In P. Passban, A. Way, & M. Rezagholizadeh (Eds.), Enhancing LLM Performance. Efficacy, Fine-Tuning, and Inference Techniques (pp. 83–97). Springer Nature. https://doi.org/10.1007/978-3-031-85747-8_6","ama":"Kurtic E, Kuznedelev D, Frantar E, et al. Sparse Fine-Tuning for Inference Acceleration of Large Language Models. In: Passban P, Way A, Rezagholizadeh M, eds. Enhancing LLM Performance. Efficacy, Fine-Tuning, and Inference Techniques. Springer Nature; 2025:83-97. doi:10.1007/978-3-031-85747-8_6","mla":"Kurtic, Eldar, et al. “Sparse Fine-Tuning for Inference Acceleration of Large Language Models.” Enhancing LLM Performance. Efficacy, Fine-Tuning, and Inference Techniques, edited by Peyman Passban et al., Springer Nature, 2025, pp. 83–97, doi:10.1007/978-3-031-85747-8_6.","ieee":"E. Kurtic et al., “Sparse Fine-Tuning for Inference Acceleration of Large Language Models,” in Enhancing LLM Performance. Efficacy, Fine-Tuning, and Inference Techniques, P. Passban, A. Way, and M. Rezagholizadeh, Eds. Springer Nature, 2025, pp. 83–97.","ista":"Kurtic E, Kuznedelev D, Frantar E, Goin M, Pandit S, Agarwalla A, Nguyen T, Marques A, Kurtz M, Alistarh D-A. 2025. Sparse Fine-Tuning for Inference Acceleration of Large Language Models. In: Enhancing LLM Performance. Efficacy, Fine-Tuning, and Inference Techniques. Machine Translation: Technologies and Applications, 83–97.","chicago":"Kurtic, Eldar, Denis Kuznedelev, Elias Frantar, Michael Goin, Shubhra Pandit, Abhinav Agarwalla, Tuan Nguyen, Alexandre Marques, Mark Kurtz, and Dan-Adrian Alistarh. 
“Sparse Fine-Tuning for Inference Acceleration of Large Language Models.” In Enhancing LLM Performance. Efficacy, Fine-Tuning, and Inference Techniques, edited by Peyman Passban, Andy Way, and Mehdi Rezagholizadeh, 83–97. Springer Nature, 2025. https://doi.org/10.1007/978-3-031-85747-8_6."},"abstract":[{"text":"We investigate the problem of accurate sparse fine-tuning of large language models (LLMs), that is, fine-tuning pre-trained LLMs on specialized tasks, while inducing sparsity in their weights. Our work is motivated by experiments showing that standard loss-based fine-tuning methods are not able to achieve high accuracy in this setting, especially at high sparsity targets. To address this issue, we perform a detailed study of knowledge distillation losses for fine-tuning of sparse models. We determine an L2-based distillation approach that we term ‘SquareHead’, which enables accurate recovery even at higher sparsities. Investigating the question of efficient inference, we show that sparse LLMs can be executed faster by taking advantage of sparsity. Specifically, we exhibit end-to-end results showing speedups enabled by sparsity, while recovering accuracy, on the following models and tasks, respectively: T5 for language translation, Whisper for speech translation, and open GPT-type models such as the Mosaic Pre-Trained Transformer (MPT) and Llama-2 models for text generation. In particular, for popular generative tasks, we show for the first time that sparse fine-tuning can reach 75% sparsity without drops in accuracy, and provide notable end-to-end speedups for inference on CPUs. 
Moreover, we also highlight that sparsity is compatible with other compression approaches, such as quantization.","lang":"eng"}],"arxiv":1,"OA_place":"repository","day":"05","type":"book_chapter","department":[{"_id":"DaAl"},{"_id":"GradSch"}],"oa":1,"alternative_title":["Machine Translation: Technologies and Applications"],"publisher":"Springer Nature","main_file_link":[{"open_access":"1","url":"https://doi.org/10.48550/arXiv.2310.06927"}],"editor":[{"last_name":"Passban","first_name":"Peyman","full_name":"Passban, Peyman"},{"last_name":"Way","full_name":"Way, Andy","first_name":"Andy"},{"full_name":"Rezagholizadeh, Mehdi","first_name":"Mehdi","last_name":"Rezagholizadeh"}],"status":"public","date_updated":"2026-02-19T09:26:54Z"}