{"intvolume":" 202","scopus_import":"1","user_id":"2DF688A6-F248-11E8-B48F-1D18A9856A87","conference":{"end_date":"2023-07-29","name":"ICML: International Conference on Machine Learning","location":"Honolulu, Hawaii, HI, United States","start_date":"2023-07-23"},"date_updated":"2023-10-31T09:59:42Z","status":"public","publication_status":"published","page":"10323-10337","type":"conference","volume":202,"acknowledgement":"The authors gratefully acknowledge funding from the European Research Council (ERC) under the European Union’s Horizon 2020 programme (grant agreement No. 805223 ScaleML), as well as experimental support from Eldar Kurtic, and from the IST Austria IT department, in particular Stefano Elefante, Andrei Hornoiu, and Alois Schloegl.","quality_controlled":"1","title":"SparseGPT: Massive language models can be accurately pruned in one-shot","date_published":"2023-07-30T00:00:00Z","article_processing_charge":"No","main_file_link":[{"url":"https://doi.org/10.48550/arXiv.2301.00774","open_access":"1"}],"alternative_title":["PMLR"],"_id":"14458","acknowledged_ssus":[{"_id":"ScienComp"}],"external_id":{"arxiv":["2301.00774"]},"publisher":"ML Research Press","oa":1,"date_created":"2023-10-29T23:01:16Z","month":"07","project":[{"grant_number":"805223","call_identifier":"H2020","name":"Elastic Coordination for Scalable Machine Learning","_id":"268A44D6-B435-11E9-9278-68D0E5697425"}],"department":[{"_id":"DaAl"}],"author":[{"id":"09a8f98d-ec99-11ea-ae11-c063a7b7fe5f","first_name":"Elias","full_name":"Frantar, Elias","last_name":"Frantar"},{"id":"4A899BFC-F248-11E8-B48F-1D18A9856A87","last_name":"Alistarh","full_name":"Alistarh, Dan-Adrian","first_name":"Dan-Adrian","orcid":"0000-0003-3650-940X"}],"year":"2023","language":[{"iso":"eng"}],"day":"30","publication_identifier":{"eissn":["2640-3498"]},"ec_funded":1,"abstract":[{"lang":"eng","text":"We show for the first time that large-scale generative pretrained transformer (GPT) family models can be pruned to at least 50% sparsity in one-shot, without any retraining, at minimal loss of accuracy. This is achieved via a new pruning method called SparseGPT, specifically designed to work efficiently and accurately on massive GPT-family models. We can execute SparseGPT on the largest available open-source models, OPT-175B and BLOOM-176B, in under 4.5 hours, and can reach 60% unstructured sparsity with negligible increase in perplexity: remarkably, more than 100 billion weights from these models can be ignored at inference time. SparseGPT generalizes to semi-structured (2:4 and 4:8) patterns, and is compatible with weight quantization approaches. The code is available at: https://github.com/IST-DASLab/sparsegpt."}],"oa_version":"Preprint","publication":"Proceedings of the 40th International Conference on Machine Learning","citation":{"ista":"Frantar E, Alistarh D-A. 2023. SparseGPT: Massive language models can be accurately pruned in one-shot. Proceedings of the 40th International Conference on Machine Learning. ICML: International Conference on Machine Learning, PMLR, vol. 202, 10323–10337.","ama":"Frantar E, Alistarh D-A. SparseGPT: Massive language models can be accurately pruned in one-shot. In: Proceedings of the 40th International Conference on Machine Learning. Vol 202. ML Research Press; 2023:10323-10337.","chicago":"Frantar, Elias, and Dan-Adrian Alistarh. “SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot.” In Proceedings of the 40th International Conference on Machine Learning, 202:10323–37. ML Research Press, 2023.","short":"E. Frantar, D.-A. Alistarh, in:, Proceedings of the 40th International Conference on Machine Learning, ML Research Press, 2023, pp. 10323–10337.","apa":"Frantar, E., & Alistarh, D.-A. (2023). SparseGPT: Massive language models can be accurately pruned in one-shot. In Proceedings of the 40th International Conference on Machine Learning (Vol. 202, pp. 10323–10337). Honolulu, Hawaii, HI, United States: ML Research Press.","ieee":"E. Frantar and D.-A. Alistarh, “SparseGPT: Massive language models can be accurately pruned in one-shot,” in Proceedings of the 40th International Conference on Machine Learning, Honolulu, Hawaii, HI, United States, 2023, vol. 202, pp. 10323–10337.","mla":"Frantar, Elias, and Dan-Adrian Alistarh. “SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot.” Proceedings of the 40th International Conference on Machine Learning, vol. 202, ML Research Press, 2023, pp. 10323–37."}}