EvoPress: Accurate dynamic model compression via evolutionary search
Sieberling O, Kuznedelev D, Kurtic E, Alistarh D-A. 2025. EvoPress: Accurate dynamic model compression via evolutionary search. 42nd International Conference on Machine Learning. ICML: International Conference on Machine Learning, PMLR, vol. 267, 55556–55590.
Conference Paper | Published | English
Scopus indexed
Author
Corresponding author has ISTA affiliation
Series Title
PMLR
Abstract
The high computational costs of large language models (LLMs) have led to a flurry of research on LLM compression, via methods such as quantization, sparsification, or structured pruning. A new frontier in this area is given by dynamic, non-uniform compression methods, which adjust the compression levels (e.g., sparsity) per-block or even per-layer in order to minimize accuracy loss, while guaranteeing a global compression threshold. Yet, current methods rely on estimating the "importance" of a given layer, implicitly assuming that layers contribute independently to the overall compression error. We begin from the motivating observation that this independence assumption does not generally hold for LLM compression: pruning a model further may even significantly recover performance. To address this, we propose EvoPress, a novel evolutionary framework for dynamic LLM compression. By formulating dynamic compression as a general optimization problem, EvoPress identifies optimal compression profiles in a highly efficient manner, and generalizes across diverse models and compression techniques. Via EvoPress, we achieve state-of-the-art performance for dynamic compression of Llama, Mistral, and Phi models, setting new benchmarks for structural pruning (block/layer dropping), unstructured sparsity, and quantization with dynamic bitwidths.
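The search problem described in the abstract can be illustrated with a small, self-contained sketch. This is not the authors' implementation: it is a generic elitist evolutionary loop over per-layer compression levels, where each mutation raises one layer's level and lowers another's so that the global budget (the sum of levels) stays fixed. The names `mutate`, `evolve`, and `toy_loss`, the sensitivity values, and the pairwise interaction term are all hypothetical stand-ins; in practice the loss would be measured on the compressed model using calibration data.

```python
import random


def mutate(profile, levels, rng):
    """Return a copy of `profile` with one layer raised by one level and a
    different layer lowered by one level, so the sum of levels is unchanged."""
    child = list(profile)
    ups = [i for i, v in enumerate(child) if v < levels - 1]
    downs = [i for i, v in enumerate(child) if v > 0]
    if not ups or not downs:
        return child
    i = rng.choice(ups)
    candidates = [j for j in downs if j != i]
    if not candidates:
        return child
    j = rng.choice(candidates)
    child[i] += 1
    child[j] -= 1
    return child


def toy_loss(profile):
    """Hypothetical stand-in for model error. The second term couples adjacent
    layers, so layers do NOT contribute independently to the loss."""
    sensitivity = [5.0, 1.0, 0.5, 3.0, 0.2, 2.0]  # assumed values
    loss = sum(s * v * v for s, v in zip(sensitivity, profile))
    loss += sum(0.5 * a * b for a, b in zip(profile, profile[1:]))
    return loss


def evolve(loss_fn, start, levels, generations=200, offspring=8, seed=0):
    """Elitist search: keep one parent, sample `offspring` mutants per
    generation, and accept the best mutant only if it is no worse."""
    rng = random.Random(seed)
    parent, parent_loss = list(start), loss_fn(start)
    for _ in range(generations):
        children = [mutate(parent, levels, rng) for _ in range(offspring)]
        best = min(children, key=loss_fn)
        best_loss = loss_fn(best)
        if best_loss <= parent_loss:
            parent, parent_loss = best, best_loss
    return parent, parent_loss


if __name__ == "__main__":
    uniform = [2] * 6  # uniform compression: every layer at level 2 of 0..4
    profile, loss = evolve(toy_loss, uniform, levels=5)
    print("uniform loss:", toy_loss(uniform))
    print("searched profile:", profile, "loss:", loss)
```

Because every mutation preserves the sum of levels, each candidate meets the same global compression target as the uniform starting profile, and the search only redistributes compression across layers.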
Date Published
2025-05-01
Proceedings Title
42nd International Conference on Machine Learning
Publisher
ML Research Press
Volume
267
Page
55556-55590
Conference
ICML: International Conference on Machine Learning
Conference Location
Vancouver, Canada
Conference Date
2025-07-13 – 2025-07-19
Cite this
Sieberling O, Kuznedelev D, Kurtic E, Alistarh D-A. EvoPress: Accurate dynamic model compression via evolutionary search. In: 42nd International Conference on Machine Learning. Vol 267. ML Research Press; 2025:55556-55590.
Sieberling, O., Kuznedelev, D., Kurtic, E., & Alistarh, D.-A. (2025). EvoPress: Accurate dynamic model compression via evolutionary search. In 42nd International Conference on Machine Learning (Vol. 267, pp. 55556–55590). Vancouver, Canada: ML Research Press.
Sieberling, Oliver, Denis Kuznedelev, Eldar Kurtic, and Dan-Adrian Alistarh. “EvoPress: Accurate Dynamic Model Compression via Evolutionary Search.” In 42nd International Conference on Machine Learning, 267:55556–90. ML Research Press, 2025.
O. Sieberling, D. Kuznedelev, E. Kurtic, and D.-A. Alistarh, “EvoPress: Accurate dynamic model compression via evolutionary search,” in 42nd International Conference on Machine Learning, Vancouver, Canada, 2025, vol. 267, pp. 55556–55590.
Sieberling O, Kuznedelev D, Kurtic E, Alistarh D-A. 2025. EvoPress: Accurate dynamic model compression via evolutionary search. 42nd International Conference on Machine Learning. ICML: International Conference on Machine Learning, PMLR, vol. 267, 55556–55590.
Sieberling, Oliver, et al. “EvoPress: Accurate Dynamic Model Compression via Evolutionary Search.” 42nd International Conference on Machine Learning, vol. 267, ML Research Press, 2025, pp. 55556–90.
All files available under the following license(s):
Creative Commons Attribution 4.0 International Public License (CC BY 4.0)
Main File(s)
File Name
2025_ICML_Sieberling.pdf
908.38 KB
Access Level
Open Access
Date Uploaded
2025-12-16
MD5 Checksum
1d744fbaeb199b08e8b6f48bc0dd047e
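A downloaded copy of the main file can be checked against the MD5 checksum listed above. The following is a minimal sketch using Python's standard `hashlib` module; the helper names `md5_of_file` and `matches_record` are hypothetical, and the local path is assumed to be wherever the PDF was saved.

```python
import hashlib

# Checksum as listed in this record for 2025_ICML_Sieberling.pdf.
EXPECTED_MD5 = "1d744fbaeb199b08e8b6f48bc0dd047e"


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_record(path, expected=EXPECTED_MD5):
    """Return True if the file's MD5 digest equals the expected checksum."""
    return md5_of_file(path) == expected.lower()
```

For example, `matches_record("2025_ICML_Sieberling.pdf")` would return `True` for an intact download saved under that name in the current directory. MD5 is adequate for detecting accidental corruption but should not be relied on as protection against deliberate tampering.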
Sources
arXiv 2410.14649
