<?xml version="1.0" encoding="UTF-8"?>

<modsCollection xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://www.loc.gov/mods/v3" xsi:schemaLocation="http://www.loc.gov/mods/v3 http://www.loc.gov/standards/mods/v3/mods-3-3.xsd">
<mods version="3.3">

<genre>conference paper</genre>

<titleInfo><title>The iterative optimal brain surgeon: Faster sparse recovery by leveraging second-order information</title></titleInfo>

  
  
<titleInfo type="alternative">
  
  <title>Advances in Neural Information Processing Systems</title>
</titleInfo>

<note type="publicationStatus">published</note>


<note type="qualityControlled">yes</note>

<name type="personal">
  <namePart type="given">Diyuan</namePart>
  <namePart type="family">Wu</namePart>
  <role><roleTerm type="text">author</roleTerm> </role><identifier type="local">1a5914c2-896a-11ed-bdf8-fb80621a0635</identifier></name>
<name type="personal">
  <namePart type="given">Ionut-Vlad</namePart>
  <namePart type="family">Modoranu</namePart>
  <role><roleTerm type="text">author</roleTerm> </role><identifier type="local">449f7a18-f128-11eb-9611-9b430c0c6333</identifier></name>
<name type="personal">
  <namePart type="given">Mher</namePart>
  <namePart type="family">Safaryan</namePart>
  <role><roleTerm type="text">author</roleTerm> </role><identifier type="local">dd546b39-0804-11ed-9c55-ef075c39778d</identifier></name>
<name type="personal">
  <namePart type="given">Denis</namePart>
  <namePart type="family">Kuznedelev</namePart>
  <role><roleTerm type="text">author</roleTerm> </role></name>
<name type="personal">
  <namePart type="given">Dan-Adrian</namePart>
  <namePart type="family">Alistarh</namePart>
  <role><roleTerm type="text">author</roleTerm> </role><identifier type="local">4A899BFC-F248-11E8-B48F-1D18A9856A87</identifier><description xsi:type="identifierDefinition" type="orcid">0000-0003-3650-940X</description></name>







<name type="corporate">
  <namePart></namePart>
  <identifier type="local">DaAl</identifier>
  <role>
    <roleTerm type="text">department</roleTerm>
  </role>
</name>

<name type="corporate">
  <namePart></namePart>
  <identifier type="local">MaMo</identifier>
  <role>
    <roleTerm type="text">department</roleTerm>
  </role>
</name>



<name type="conference">
  <namePart>NeurIPS: Neural Information Processing Systems</namePart>
</name>



<name type="corporate">
  <namePart>IST-BRIDGE: International postdoctoral program</namePart>
  <role><roleTerm type="text">project</roleTerm></role>
</name>



<abstract lang="eng">The rising footprint of machine learning has led to a focus on imposing model
sparsity as a means of reducing computational and memory costs. For deep neural
networks (DNNs), the state-of-the-art accuracy-vs-sparsity trade-off is achieved by heuristics
inspired by the classical Optimal Brain Surgeon (OBS) framework [LeCun et al.,
1989, Hassibi and Stork, 1992, Hassibi et al., 1993], which leverages loss curvature
information to make better pruning decisions. Yet, these results still lack a solid
theoretical understanding, and it is unclear whether they can be improved by
leveraging connections to the wealth of work on sparse recovery algorithms. In this
paper, we draw new connections between these two areas and present new sparse
recovery algorithms inspired by the OBS framework that come with theoretical
guarantees under reasonable assumptions and have strong practical performance.
Specifically, our work starts from the observation that we can leverage curvature
information in OBS-like fashion upon the projection step of classic iterative sparse
recovery algorithms such as IHT. We show for the first time that this leads
to improved convergence bounds under standard assumptions. Furthermore, we
present extensions of this approach to the practical task of obtaining accurate sparse
DNNs, and validate it experimentally at scale for Transformer-based models on
vision and language tasks.</abstract>

<originInfo><publisher>Neural Information Processing Systems Foundation</publisher><dateIssued encoding="w3cdtf">2024</dateIssued><place><placeTerm type="text">Vancouver, Canada</placeTerm></place>
</originInfo>
<language><languageTerm authority="iso639-2b" type="code">eng</languageTerm>
</language>



<relatedItem type="host"><titleInfo><title>38th Conference on Neural Information Processing Systems</title></titleInfo>
  <identifier type="issn">1049-5258</identifier>
  <identifier type="arXiv">2408.17163</identifier>
<part><detail type="volume"><number>37</number></detail>
</part>
</relatedItem>


<extension>
<bibliographicCitation>
<chicago>Wu, Diyuan, Ionut-Vlad Modoranu, Mher Safaryan, Denis Kuznedelev, and Dan-Adrian Alistarh. “The Iterative Optimal Brain Surgeon: Faster Sparse Recovery by Leveraging Second-Order Information.” In &lt;i&gt;38th Conference on Neural Information Processing Systems&lt;/i&gt;, Vol. 37. Neural Information Processing Systems Foundation, 2024.</chicago>
<ieee>D. Wu, I.-V. Modoranu, M. Safaryan, D. Kuznedelev, and D.-A. Alistarh, “The iterative optimal brain surgeon: Faster sparse recovery by leveraging second-order information,” in &lt;i&gt;38th Conference on Neural Information Processing Systems&lt;/i&gt;, Vancouver, Canada, 2024, vol. 37.</ieee>
<apa>Wu, D., Modoranu, I.-V., Safaryan, M., Kuznedelev, D., &amp; Alistarh, D.-A. (2024). The iterative optimal brain surgeon: Faster sparse recovery by leveraging second-order information. In &lt;i&gt;38th Conference on Neural Information Processing Systems&lt;/i&gt; (Vol. 37). Vancouver, Canada: Neural Information Processing Systems Foundation.</apa>
<ista>Wu D, Modoranu I-V, Safaryan M, Kuznedelev D, Alistarh D-A. 2024. The iterative optimal brain surgeon: Faster sparse recovery by leveraging second-order information. 38th Conference on Neural Information Processing Systems. NeurIPS: Neural Information Processing Systems, Advances in Neural Information Processing Systems, vol. 37.</ista>
<mla>Wu, Diyuan, et al. “The Iterative Optimal Brain Surgeon: Faster Sparse Recovery by Leveraging Second-Order Information.” &lt;i&gt;38th Conference on Neural Information Processing Systems&lt;/i&gt;, vol. 37, Neural Information Processing Systems Foundation, 2024.</mla>
<ama>Wu D, Modoranu I-V, Safaryan M, Kuznedelev D, Alistarh D-A. The iterative optimal brain surgeon: Faster sparse recovery by leveraging second-order information. In: &lt;i&gt;38th Conference on Neural Information Processing Systems&lt;/i&gt;. Vol 37. Neural Information Processing Systems Foundation; 2024.</ama>
<short>D. Wu, I.-V. Modoranu, M. Safaryan, D. Kuznedelev, D.-A. Alistarh, in:, 38th Conference on Neural Information Processing Systems, Neural Information Processing Systems Foundation, 2024.</short>
</bibliographicCitation>
</extension>
<recordInfo><recordIdentifier>19518</recordIdentifier><recordCreationDate encoding="w3cdtf">2025-04-06T22:01:32Z</recordCreationDate><recordChangeDate encoding="w3cdtf">2025-05-14T11:37:10Z</recordChangeDate>
</recordInfo>
</mods>
</modsCollection>
