“Give me BF16 or give me death”? Accuracy-performance trade-offs in LLM quantization
Kurtic E, Marques A, Pandit S, Kurtz M, Alistarh D-A. 2025. “Give me BF16 or give me death”? Accuracy-performance trade-offs in LLM quantization. Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics. ACL: Meeting of the Association for Computational Linguistics, 26872–26886.
Conference Paper | Published | English
Scopus indexed
Corresponding author has ISTA affiliation
Abstract
Quantization is a powerful tool for accelerating large language model (LLM) inference, but the accuracy-performance trade-offs across different formats remain unclear. In this paper, we conduct the most comprehensive empirical study to date, evaluating FP8, INT8, and INT4 quantization across academic benchmarks and real-world tasks on the entire Llama-3.1 model family. Through over 500,000 evaluations, our investigation yields several key findings: (1) FP8 (W8A8-FP) is effectively lossless across all model scales, (2) well-tuned INT8 (W8A8-INT) achieves surprisingly low (1-3%) accuracy degradation, and (3) INT4 weight-only (W4A16-INT) is more competitive than expected, rivaling 8-bit quantization. Further, we investigate the optimal quantization format for different deployments by analyzing inference performance through the popular vLLM framework. Our analysis provides clear deployment recommendations: W4A16 is the most cost-efficient for synchronous setups, while W8A8 dominates in asynchronous continuous batching. For mixed workloads, the optimal choice depends on the specific use case. Our findings offer practical, data-driven guidelines for deploying quantized LLMs at scale, ensuring the best balance between speed, efficiency, and accuracy.
Publishing Year
2025
Date Published
2025-08-01
Proceedings Title
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics
Publisher
Association for Computational Linguistics
Page
26872-26886
Conference
ACL: Meeting of the Association for Computational Linguistics
Conference Location
Vienna, Austria
Conference Date
2025-07-27 – 2025-08-01
Cite this
Kurtic E, Marques A, Pandit S, Kurtz M, Alistarh D-A. “Give me BF16 or give me death”? Accuracy-performance trade-offs in LLM quantization. In: Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics; 2025:26872-26886.
Kurtic, E., Marques, A., Pandit, S., Kurtz, M., & Alistarh, D.-A. (2025). “Give me BF16 or give me death”? Accuracy-performance trade-offs in LLM quantization. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (pp. 26872–26886). Vienna, Austria: Association for Computational Linguistics.
Kurtic, Eldar, Alexandre Marques, Shubhra Pandit, Mark Kurtz, and Dan-Adrian Alistarh. “‘Give Me BF16 or Give Me Death’? Accuracy-Performance Trade-Offs in LLM Quantization.” In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics, 26872–86. Association for Computational Linguistics, 2025.
E. Kurtic, A. Marques, S. Pandit, M. Kurtz, and D.-A. Alistarh, “‘Give me BF16 or give me death’? Accuracy-performance trade-offs in LLM quantization,” in Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics, Vienna, Austria, 2025, pp. 26872–26886.
Kurtic E, Marques A, Pandit S, Kurtz M, Alistarh D-A. 2025. “Give me BF16 or give me death”? Accuracy-performance trade-offs in LLM quantization. Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics. ACL: Meeting of the Association for Computational Linguistics, 26872–26886.
Kurtic, Eldar, et al. “‘Give Me BF16 or Give Me Death’? Accuracy-Performance Trade-Offs in LLM Quantization.” Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics, Association for Computational Linguistics, 2025, pp. 26872–86.
All files available under the following license(s):
Creative Commons Attribution 4.0 International Public License (CC-BY 4.0)
Main File(s)
File Name
2025_ACL_Kurtic.pdf
417.45 KB
Access Level
Open Access
Date Uploaded
2025-11-26
MD5 Checksum
4c066ee20f9ab17619c95652c0eb75f1
Sources
arXiv 2411.02355
