---
OA_place: repository
OA_type: green
_id: '19518'
abstract:
- lang: eng
  text: "The rising footprint of machine learning has led to a focus on imposing model\r\nsparsity
    as a means of reducing computational and memory costs. For deep neural\r\nnetworks
    (DNNs), the state-of-the-art accuracy-vs-sparsity is achieved by heuristics\r\ninspired
    by the classical Optimal Brain Surgeon (OBS) framework [LeCun et al.,\r\n1989,
    Hassibi and Stork, 1992, Hassibi et al., 1993], which leverages loss curvature\r\ninformation
    to make better pruning decisions. Yet, these results still lack a solid\r\ntheoretical
    understanding, and it is unclear whether they can be improved by\r\nleveraging
    connections to the wealth of work on sparse recovery algorithms. In this\r\npaper,
    we draw new connections between these two areas and present new sparse\r\nrecovery
    algorithms inspired by the OBS framework that comes with theoretical\r\nguarantees
    under reasonable assumptions and have strong practical performance.\r\nSpecifically,
    our work starts from the observation that we can leverage curvature\r\ninformation
    in OBS-like fashion upon the projection step of classic iterative sparse\r\nrecovery
    algorithms such as IHT. We show for the first time that this leads both\r\nto
    improved convergence bounds under standard assumptions. Furthermore, we\r\npresent
    extensions of this approach to the practical task of obtaining accurate sparse\r\nDNNs,
    and validate it experimentally at scale for Transformer-based models on\r\nvision
    and language tasks."
acknowledged_ssus:
- _id: CampIT
acknowledgement: The authors thank the anonymous NeurIPS reviewers for their useful
  comments and feedback, the IT department from the Institute of Science and Technology
  Austria for the hardware support, and Weights and Biases for the infrastructure
  to track all our experiments. Mher Safaryan has received funding from the European
  Union’s Horizon 2020 research and innovation program under the Maria Skłodowska-Curie
  grant agreement No 101034413.
alternative_title:
- Advances in Neural Information Processing Systems
article_processing_charge: No
arxiv: 1
author:
- first_name: Diyuan
  full_name: Wu, Diyuan
  id: 1a5914c2-896a-11ed-bdf8-fb80621a0635
  last_name: Wu
- first_name: Ionut-Vlad
  full_name: Modoranu, Ionut-Vlad
  id: 449f7a18-f128-11eb-9611-9b430c0c6333
  last_name: Modoranu
- first_name: Mher
  full_name: Safaryan, Mher
  id: dd546b39-0804-11ed-9c55-ef075c39778d
  last_name: Safaryan
- first_name: Denis
  full_name: Kuznedelev, Denis
  last_name: Kuznedelev
- first_name: Dan-Adrian
  full_name: Alistarh, Dan-Adrian
  id: 4A899BFC-F248-11E8-B48F-1D18A9856A87
  last_name: Alistarh
  orcid: 0000-0003-3650-940X
citation:
  ama: 'Wu D, Modoranu I-V, Safaryan M, Kuznedelev D, Alistarh D-A. The iterative
    optimal brain surgeon: Faster sparse recovery by leveraging second-order information.
    In: <i>38th Conference on Neural Information Processing Systems</i>. Vol 37. Neural
    Information Processing Systems Foundation; 2024.'
  apa: 'Wu, D., Modoranu, I.-V., Safaryan, M., Kuznedelev, D., &#38; Alistarh, D.-A.
    (2024). The iterative optimal brain surgeon: Faster sparse recovery by leveraging
    second-order information. In <i>38th Conference on Neural Information Processing
    Systems</i> (Vol. 37). Vancouver, Canada: Neural Information Processing Systems
    Foundation.'
  chicago: 'Wu, Diyuan, Ionut-Vlad Modoranu, Mher Safaryan, Denis Kuznedelev, and
    Dan-Adrian Alistarh. “The Iterative Optimal Brain Surgeon: Faster Sparse Recovery
    by Leveraging Second-Order Information.” In <i>38th Conference on Neural Information
    Processing Systems</i>, Vol. 37. Neural Information Processing Systems Foundation,
    2024.'
  ieee: 'D. Wu, I.-V. Modoranu, M. Safaryan, D. Kuznedelev, and D.-A. Alistarh, “The
    iterative optimal brain surgeon: Faster sparse recovery by leveraging second-order
    information,” in <i>38th Conference on Neural Information Processing Systems</i>,
    Vancouver, Canada, 2024, vol. 37.'
  ista: 'Wu D, Modoranu I-V, Safaryan M, Kuznedelev D, Alistarh D-A. 2024. The iterative
    optimal brain surgeon: Faster sparse recovery by leveraging second-order information.
    38th Conference on Neural Information Processing Systems. NeurIPS: Neural Information
    Processing Systems, Advances in Neural Information Processing Systems, vol. 37.'
  mla: 'Wu, Diyuan, et al. “The Iterative Optimal Brain Surgeon: Faster Sparse Recovery
    by Leveraging Second-Order Information.” <i>38th Conference on Neural Information
    Processing Systems</i>, vol. 37, Neural Information Processing Systems Foundation,
    2024.'
  short: D. Wu, I.-V. Modoranu, M. Safaryan, D. Kuznedelev, D.-A. Alistarh, in:, 38th
    Conference on Neural Information Processing Systems, Neural Information Processing
    Systems Foundation, 2024.
conference:
  end_date: 2024-12-15
  location: Vancouver, Canada
  name: 'NeurIPS: Neural Information Processing Systems'
  start_date: 2024-12-09
corr_author: '1'
date_created: 2025-04-06T22:01:32Z
date_published: 2024-12-20T00:00:00Z
date_updated: 2025-05-14T11:37:10Z
day: '20'
department:
- _id: DaAl
- _id: MaMo
ec_funded: 1
external_id:
  arxiv:
  - '2408.17163'
intvolume: '        37'
language:
- iso: eng
main_file_link:
- open_access: '1'
  url: https://doi.org/10.48550/arXiv.2408.17163
month: '12'
oa: 1
oa_version: Preprint
project:
- _id: fc2ed2f7-9c52-11eb-aca3-c01059dda49c
  call_identifier: H2020
  grant_number: '101034413'
  name: 'IST-BRIDGE: International postdoctoral program'
publication: 38th Conference on Neural Information Processing Systems
publication_identifier:
  issn:
  - 1049-5258
publication_status: published
publisher: Neural Information Processing Systems Foundation
quality_controlled: '1'
scopus_import: '1'
status: public
title: 'The iterative optimal brain surgeon: Faster sparse recovery by leveraging
  second-order information'
type: conference
user_id: 2DF688A6-F248-11E8-B48F-1D18A9856A87
volume: 37
year: '2024'
...