---
OA_place: repository
OA_type: green
_id: '21257'
abstract:
- lang: eng
  text: 'We investigate the problem of accurate sparse fine-tuning of large language
    models (LLMs), that is, fine-tuning pre-trained LLMs on specialized tasks, while
    inducing sparsity in their weights. Our work is motivated by experiments showing
    that standard loss-based fine-tuning methods are not able to achieve high accuracy
    in this setting, especially at high sparsity targets. To address this issue, we
    perform a detailed study of knowledge distillation losses for fine-tuning of sparse
    models. We determine an L2-based distillation approach that we term ‘SquareHead’,
    which enables accurate recovery even at higher sparsities. Investigating the question
    of efficient inference, we show that sparse LLMs can be executed faster by taking
    advantage of sparsity. Specifically, we exhibit end-to-end results showing speedups
    enabled by sparsity, while recovering accuracy, on the following models and tasks,
    respectively: T5 for language translation, Whisper for speech translation, and
    open GPT-type models such as the Mosaic Pre-Trained Transformer (MPT) and Llama-2
    models for text generation. In particular, for popular generative tasks, we show
    for the first time that sparse fine-tuning can reach 75% sparsity without drops
    in accuracy, and provide notable end-to-end speedups for inference on CPUs. Moreover,
    we highlight that sparsity is compatible with other compression approaches,
    such as quantization.'
acknowledgement: We would like to thank Eugenia Iofinova for useful comments on an
  earlier version of this draft, and Artur Niederfahrenhorst for useful suggestions
  regarding fine-tuning on the GSM8k dataset.
alternative_title:
- 'Machine Translation: Technologies and Applications'
article_processing_charge: No
arxiv: 1
author:
- first_name: Eldar
  full_name: Kurtic, Eldar
  id: 47beb3a5-07b5-11eb-9b87-b108ec578218
  last_name: Kurtic
- first_name: Denis
  full_name: Kuznedelev, Denis
  last_name: Kuznedelev
- first_name: Elias
  full_name: Frantar, Elias
  id: 09a8f98d-ec99-11ea-ae11-c063a7b7fe5f
  last_name: Frantar
- first_name: Michael
  full_name: Goinv, Michael
  last_name: Goinv
- first_name: Shubhra
  full_name: Pandit, Shubhra
  last_name: Pandit
- first_name: Abhinav
  full_name: Agarwalla, Abhinav
  last_name: Agarwalla
- first_name: Tuan
  full_name: Nguyen, Tuan
  last_name: Nguyen
- first_name: Alexandre
  full_name: Marques, Alexandre
  last_name: Marques
- first_name: Mark
  full_name: Kurtz, Mark
  last_name: Kurtz
- first_name: Dan-Adrian
  full_name: Alistarh, Dan-Adrian
  id: 4A899BFC-F248-11E8-B48F-1D18A9856A87
  last_name: Alistarh
  orcid: 0000-0003-3650-940X
citation:
  ama: 'Kurtic E, Kuznedelev D, Frantar E, et al. Sparse Fine-Tuning for Inference
    Acceleration of Large Language Models. In: Passban P, Way A, Rezagholizadeh M,
    eds. <i>Enhancing LLM Performance. Efficacy, Fine-Tuning, and Inference Techniques</i>.
    Springer Nature; 2025:83-97. doi:<a href="https://doi.org/10.1007/978-3-031-85747-8_6">10.1007/978-3-031-85747-8_6</a>'
  apa: Kurtic, E., Kuznedelev, D., Frantar, E., Goinv, M., Pandit, S., Agarwalla,
    A., … Alistarh, D.-A. (2025). Sparse Fine-Tuning for Inference Acceleration of
    Large Language Models. In P. Passban, A. Way, &#38; M. Rezagholizadeh (Eds.),
    <i>Enhancing LLM Performance. Efficacy, Fine-Tuning, and Inference Techniques</i>
    (pp. 83–97). Springer Nature. <a href="https://doi.org/10.1007/978-3-031-85747-8_6">https://doi.org/10.1007/978-3-031-85747-8_6</a>
  chicago: Kurtic, Eldar, Denis Kuznedelev, Elias Frantar, Michael Goinv, Shubhra
    Pandit, Abhinav Agarwalla, Tuan Nguyen, Alexandre Marques, Mark Kurtz, and Dan-Adrian
    Alistarh. “Sparse Fine-Tuning for Inference Acceleration of Large Language Models.”
    In <i>Enhancing LLM Performance. Efficacy, Fine-Tuning, and Inference Techniques</i>,
    edited by Peyman Passban, Andy Way, and Mehdi Rezagholizadeh, 83–97. Springer
    Nature, 2025. <a href="https://doi.org/10.1007/978-3-031-85747-8_6">https://doi.org/10.1007/978-3-031-85747-8_6</a>.
  ieee: E. Kurtic <i>et al.</i>, “Sparse Fine-Tuning for Inference Acceleration of
    Large Language Models,” in <i>Enhancing LLM Performance. Efficacy, Fine-Tuning,
    and Inference Techniques</i>, P. Passban, A. Way, and M. Rezagholizadeh, Eds.
    Springer Nature, 2025, pp. 83–97.
  ista: 'Kurtic E, Kuznedelev D, Frantar E, Goinv M, Pandit S, Agarwalla A, Nguyen
    T, Marques A, Kurtz M, Alistarh D-A. 2025. Sparse Fine-Tuning for Inference Acceleration
    of Large Language Models. In: Enhancing LLM Performance. Efficacy, Fine-Tuning,
    and Inference Techniques. Machine Translation: Technologies and Applications,
    83–97.'
  mla: Kurtic, Eldar, et al. “Sparse Fine-Tuning for Inference Acceleration of Large
    Language Models.” <i>Enhancing LLM Performance. Efficacy, Fine-Tuning, and Inference
    Techniques</i>, edited by Peyman Passban et al., Springer Nature, 2025, pp. 83–97,
    doi:<a href="https://doi.org/10.1007/978-3-031-85747-8_6">10.1007/978-3-031-85747-8_6</a>.
  short: E. Kurtic, D. Kuznedelev, E. Frantar, M. Goinv, S. Pandit, A. Agarwalla,
    T. Nguyen, A. Marques, M. Kurtz, D.-A. Alistarh, in:, P. Passban, A. Way, M. Rezagholizadeh
    (Eds.), Enhancing LLM Performance. Efficacy, Fine-Tuning, and Inference Techniques,
    Springer Nature, 2025, pp. 83–97.
corr_author: '1'
date_created: 2026-02-16T15:57:53Z
date_published: 2025-07-05T00:00:00Z
date_updated: 2026-02-19T09:26:54Z
day: '05'
department:
- _id: DaAl
- _id: GradSch
doi: 10.1007/978-3-031-85747-8_6
editor:
- first_name: Peyman
  full_name: Passban, Peyman
  last_name: Passban
- first_name: Andy
  full_name: Way, Andy
  last_name: Way
- first_name: Mehdi
  full_name: Rezagholizadeh, Mehdi
  last_name: Rezagholizadeh
external_id:
  arxiv:
  - '2310.06927'
language:
- iso: eng
main_file_link:
- open_access: '1'
  url: https://doi.org/10.48550/arXiv.2310.06927
month: '07'
oa: 1
oa_version: Preprint
page: 83-97
publication: Enhancing LLM Performance. Efficacy, Fine-Tuning, and Inference Techniques
publication_identifier:
  eisbn:
  - '9783031857478'
  eissn:
  - 2522-803X
  isbn:
  - '9783031857461'
  issn:
  - 2522-8021
publication_status: published
publisher: Springer Nature
quality_controlled: '1'
status: public
title: Sparse Fine-Tuning for Inference Acceleration of Large Language Models
type: book_chapter
user_id: 2DF688A6-F248-11E8-B48F-1D18A9856A87
year: '2025'
...
