{"OA_type":"green","year":"2024","date_created":"2025-04-06T22:01:32Z","user_id":"2DF688A6-F248-11E8-B48F-1D18A9856A87","month":"12","article_processing_charge":"No","acknowledged_ssus":[{"_id":"CampIT"}],"external_id":{"arxiv":["2408.17163"]},"scopus_import":"1","intvolume":"37","publisher":"Neural Information Processing Systems Foundation","title":"The iterative optimal brain surgeon: Faster sparse recovery by leveraging second-order information","publication_identifier":{"issn":["1049-5258"]},"citation":{"ista":"Wu D, Modoranu I-V, Safaryan M, Kuznedelev D, Alistarh D-A. 2024. The iterative optimal brain surgeon: Faster sparse recovery by leveraging second-order information. 38th Conference on Neural Information Processing Systems. NeurIPS: Neural Information Processing Systems, Advances in Neural Information Processing Systems, vol. 37.","ieee":"D. Wu, I.-V. Modoranu, M. Safaryan, D. Kuznedelev, and D.-A. Alistarh, “The iterative optimal brain surgeon: Faster sparse recovery by leveraging second-order information,” in 38th Conference on Neural Information Processing Systems, Vancouver, Canada, 2024, vol. 37.","ama":"Wu D, Modoranu I-V, Safaryan M, Kuznedelev D, Alistarh D-A. The iterative optimal brain surgeon: Faster sparse recovery by leveraging second-order information. In: 38th Conference on Neural Information Processing Systems. Vol 37. Neural Information Processing Systems Foundation; 2024.","chicago":"Wu, Diyuan, Ionut-Vlad Modoranu, Mher Safaryan, Denis Kuznedelev, and Dan-Adrian Alistarh. “The Iterative Optimal Brain Surgeon: Faster Sparse Recovery by Leveraging Second-Order Information.” In 38th Conference on Neural Information Processing Systems, Vol. 37. Neural Information Processing Systems Foundation, 2024.","short":"D. Wu, I.-V. Modoranu, M. Safaryan, D. Kuznedelev, D.-A. Alistarh, in: 38th Conference on Neural Information Processing Systems, Neural Information Processing Systems Foundation, 2024.","mla":"Wu, Diyuan, et al. “The Iterative Optimal Brain Surgeon: Faster Sparse Recovery by Leveraging Second-Order Information.” 38th Conference on Neural Information Processing Systems, vol. 37, Neural Information Processing Systems Foundation, 2024.","apa":"Wu, D., Modoranu, I.-V., Safaryan, M., Kuznedelev, D., & Alistarh, D.-A. (2024). The iterative optimal brain surgeon: Faster sparse recovery by leveraging second-order information. In 38th Conference on Neural Information Processing Systems (Vol. 37). Vancouver, Canada: Neural Information Processing Systems Foundation."},"day":"20","publication":"38th Conference on Neural Information Processing Systems","corr_author":"1","publication_status":"published","date_published":"2024-12-20T00:00:00Z","_id":"19518","main_file_link":[{"open_access":"1","url":"https://doi.org/10.48550/arXiv.2408.17163"}],"department":[{"_id":"DaAl"},{"_id":"MaMo"}],"project":[{"name":"IST-BRIDGE: International postdoctoral program","_id":"fc2ed2f7-9c52-11eb-aca3-c01059dda49c","grant_number":"101034413","call_identifier":"H2020"}],"quality_controlled":"1","oa":1,"abstract":[{"lang":"eng","text":"The rising footprint of machine learning has led to a focus on imposing model sparsity as a means of reducing computational and memory costs. For deep neural networks (DNNs), the state-of-the-art accuracy-vs-sparsity trade-off is achieved by heuristics inspired by the classical Optimal Brain Surgeon (OBS) framework [LeCun et al., 1989, Hassibi and Stork, 1992, Hassibi et al., 1993], which leverages loss curvature information to make better pruning decisions. Yet, these results still lack a solid theoretical understanding, and it is unclear whether they can be improved by leveraging connections to the wealth of work on sparse recovery algorithms. In this paper, we draw new connections between these two areas and present new sparse recovery algorithms inspired by the OBS framework that come with theoretical guarantees under reasonable assumptions and have strong practical performance. Specifically, our work starts from the observation that we can leverage curvature information in an OBS-like fashion within the projection step of classic iterative sparse recovery algorithms such as IHT. We show for the first time that this leads to improved convergence bounds under standard assumptions. Furthermore, we present extensions of this approach to the practical task of obtaining accurate sparse DNNs, and validate it experimentally at scale for Transformer-based models on vision and language tasks."}],"acknowledgement":"The authors thank the anonymous NeurIPS reviewers for their useful comments and feedback, the IT department of the Institute of Science and Technology Austria for the hardware support, and Weights and Biases for the infrastructure to track all our experiments. Mher Safaryan has received funding from the European Union’s Horizon 2020 research and innovation program under the Marie Skłodowska-Curie grant agreement No 101034413.","conference":{"start_date":"2024-12-09","location":"Vancouver, Canada","name":"NeurIPS: Neural Information Processing Systems","end_date":"2024-12-15"},"type":"conference","alternative_title":["Advances in Neural Information Processing Systems"],"oa_version":"Preprint","ec_funded":1,"author":[{"id":"1a5914c2-896a-11ed-bdf8-fb80621a0635","full_name":"Wu, Diyuan","first_name":"Diyuan","last_name":"Wu"},{"full_name":"Modoranu, Ionut-Vlad","id":"449f7a18-f128-11eb-9611-9b430c0c6333","first_name":"Ionut-Vlad","last_name":"Modoranu"},{"last_name":"Safaryan","first_name":"Mher","full_name":"Safaryan, Mher","id":"dd546b39-0804-11ed-9c55-ef075c39778d"},{"first_name":"Denis","last_name":"Kuznedelev","full_name":"Kuznedelev, Denis"},{"full_name":"Alistarh, Dan-Adrian","id":"4A899BFC-F248-11E8-B48F-1D18A9856A87","orcid":"0000-0003-3650-940X","first_name":"Dan-Adrian","last_name":"Alistarh"}],"arxiv":1,"language":[{"iso":"eng"}],"status":"public","date_updated":"2025-05-14T11:37:10Z","volume":37,"OA_place":"repository"}