{"publisher":"ML Research Press","scopus_import":"1","file_date_updated":"2025-12-16T12:32:40Z","title":"EvoPress: Accurate dynamic model compression via evolutionary search","year":"2025","month":"05","arxiv":1,"oa":1,"OA_type":"gold","department":[{"_id":"DaAl"}],"status":"public","corr_author":"1","date_updated":"2025-12-16T12:34:32Z","author":[{"full_name":"Sieberling, Oliver","first_name":"Oliver","last_name":"Sieberling"},{"last_name":"Kuznedelev","first_name":"Denis","full_name":"Kuznedelev, Denis"},{"full_name":"Kurtic, Eldar","first_name":"Eldar","last_name":"Kurtic","id":"47beb3a5-07b5-11eb-9b87-b108ec578218"},{"first_name":"Dan-Adrian","full_name":"Alistarh, Dan-Adrian","orcid":"0000-0003-3650-940X","last_name":"Alistarh","id":"4A899BFC-F248-11E8-B48F-1D18A9856A87"}],"publication":"42nd International Conference on Machine Learning","ddc":["000"],"day":"01","_id":"20820","OA_place":"publisher","quality_controlled":"1","oa_version":"Published Version","publication_status":"published","page":"55556-55590","type":"conference","citation":{"ama":"Sieberling O, Kuznedelev D, Kurtic E, Alistarh D-A. EvoPress: Accurate dynamic model compression via evolutionary search. In: 42nd International Conference on Machine Learning. Vol 267. ML Research Press; 2025:55556-55590.","ieee":"O. Sieberling, D. Kuznedelev, E. Kurtic, and D.-A. Alistarh, “EvoPress: Accurate dynamic model compression via evolutionary search,” in 42nd International Conference on Machine Learning, Vancouver, Canada, 2025, vol. 267, pp. 55556–55590.","mla":"Sieberling, Oliver, et al. “EvoPress: Accurate Dynamic Model Compression via Evolutionary Search.” 42nd International Conference on Machine Learning, vol. 267, ML Research Press, 2025, pp. 55556–90.","chicago":"Sieberling, Oliver, Denis Kuznedelev, Eldar Kurtic, and Dan-Adrian Alistarh. “EvoPress: Accurate Dynamic Model Compression via Evolutionary Search.” In 42nd International Conference on Machine Learning, 267:55556–90. ML Research Press, 2025.","ista":"Sieberling O, Kuznedelev D, Kurtic E, Alistarh D-A. 2025. EvoPress: Accurate dynamic model compression via evolutionary search. 42nd International Conference on Machine Learning. ICML: International Conference on Machine Learning, PMLR, vol. 267, 55556–55590.","apa":"Sieberling, O., Kuznedelev, D., Kurtic, E., & Alistarh, D.-A. (2025). EvoPress: Accurate dynamic model compression via evolutionary search. In 42nd International Conference on Machine Learning (Vol. 267, pp. 55556–55590). Vancouver, Canada: ML Research Press.","short":"O. Sieberling, D. Kuznedelev, E. Kurtic, D.-A. Alistarh, in:, 42nd International Conference on Machine Learning, ML Research Press, 2025, pp. 55556–55590."},"file":[{"creator":"dernst","date_updated":"2025-12-16T12:32:40Z","date_created":"2025-12-16T12:32:40Z","content_type":"application/pdf","relation":"main_file","access_level":"open_access","file_name":"2025_ICML_Sieberling.pdf","success":1,"file_id":"20828","file_size":908379,"checksum":"1d744fbaeb199b08e8b6f48bc0dd047e"}],"article_processing_charge":"No","intvolume":" 267","volume":267,"language":[{"iso":"eng"}],"user_id":"2DF688A6-F248-11E8-B48F-1D18A9856A87","date_created":"2025-12-14T23:02:05Z","tmp":{"name":"Creative Commons Attribution 4.0 International Public License (CC-BY 4.0)","legal_code_url":"https://creativecommons.org/licenses/by/4.0/legalcode","image":"/images/cc_by.png","short":"CC BY (4.0)"},"external_id":{"arxiv":["2410.14649"]},"publication_identifier":{"eissn":["2640-3498"]},"conference":{"location":"Vancouver, Canada","start_date":"2025-07-13","end_date":"2025-07-19","name":"ICML: International Conference on Machine Learning"},"date_published":"2025-05-01T00:00:00Z","has_accepted_license":"1","alternative_title":["PMLR"],"abstract":[{"lang":"eng","text":"The high computational costs of large language models (LLMs) have led to a flurry of research on LLM compression, via methods such as quantization, sparsification, or structured pruning. A new frontier in this area is given by dynamic, non-uniform compression methods, which adjust the compression levels (e.g., sparsity) per-block or even per-layer in order to minimize accuracy loss, while guaranteeing a global compression threshold. Yet, current methods rely on estimating the \"importance\" of a given layer, implicitly assuming that layers contribute independently to the overall compression error. We begin from the motivating observation that this independence assumption does not generally hold for LLM compression: pruning a model further may even significantly recover performance. To address this, we propose EvoPress, a novel evolutionary framework for dynamic LLM compression. By formulating dynamic compression as a general optimization problem, EvoPress identifies optimal compression profiles in a highly efficient manner, and generalizes across diverse models and compression techniques. Via EvoPress, we achieve state-of-the-art performance for dynamic compression of Llama, Mistral, and Phi models, setting new benchmarks for structural pruning (block/layer dropping), unstructured sparsity, and quantization with dynamic bitwidths."}]}