Elias Frantar
17 Publications
2025 | Published | Conference Paper | IST-REx-ID: 19877

Frantar, E., Castro, R. L., Chen, J., Hoefler, T., & Alistarh, D.-A. (2025). MARLIN: Mixed-precision auto-regressive parallel inference on Large Language Models. In Proceedings of the 30th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming (pp. 239–251). Las Vegas, NV, United States: Association for Computing Machinery. https://doi.org/10.1145/3710848.3710871
[Published Version]
2024 | Published | Conference Paper | IST-REx-ID: 18113

Egiazarian, V., Panferov, A., Kuznedelev, D., Frantar, E., Babenko, A., & Alistarh, D.-A. (2024). Extreme compression of large language models via additive quantization. In Proceedings of the 41st International Conference on Machine Learning (Vol. 235, pp. 12284–12303). Vienna, Austria: ML Research Press.
[Preprint]
2024 | Published | Conference Paper | IST-REx-ID: 18975

Modoranu, I.-V., Kalinov, A., Kurtic, E., Frantar, E., & Alistarh, D.-A. (2024). Error feedback can accurately compress preconditioners. In Proceedings of the 41st International Conference on Machine Learning (Vol. 235, pp. 35910–35933). Vienna, Austria: ML Research Press.
[Preprint]
2024 | Published | Conference Paper | IST-REx-ID: 18977

Dettmers, T., Svirschevski, R. A., Egiazarian, V., Kuznedelev, D., Frantar, E., Ashkboos, S., … Alistarh, D.-A. (2024). SpQR: A sparse-quantized representation for near-lossless LLM weight compression. In 12th International Conference on Learning Representations. Vienna, Austria: OpenReview.
[Preprint]
2024 | Published | Thesis | IST-REx-ID: 17485

Frantar, E. (2024). Compressing large neural networks: Algorithms, systems and scaling laws. Institute of Science and Technology Austria. https://doi.org/10.15479/at:ista:17485
[Published Version]
2024 | Published | Conference Paper | IST-REx-ID: 18061

Frantar, E., & Alistarh, D.-A. (2024). QMoE: Sub-1-bit compression of trillion parameter models. In P. Gibbons, G. Pekhimenko, & C. De Sa (Eds.), Proceedings of Machine Learning and Systems (Vol. 6). Santa Clara, CA, United States.
[Published Version]
2024 | Published | Conference Paper | IST-REx-ID: 18062

Frantar, E., Ruiz, C. R., Houlsby, N., Alistarh, D.-A., & Evci, U. (2024). Scaling laws for sparsely-connected foundation models. In 12th International Conference on Learning Representations. Vienna, Austria: OpenReview.
[Published Version]
2024 | Published | Conference Paper | IST-REx-ID: 18121

Moakhar, A. S., Iofinova, E. B., Frantar, E., & Alistarh, D.-A. (2024). SPADE: Sparsity-guided debugging for deep neural networks. In Proceedings of the 41st International Conference on Machine Learning (Vol. 235, pp. 45955–45987). Vienna, Austria: ML Research Press.
[Preprint]
2024 | Published | Conference Paper | IST-REx-ID: 17456

Markov, I., Alimohammadi, K., Frantar, E., & Alistarh, D.-A. (2024). L-GreCo: Layerwise-adaptive gradient compression for efficient data-parallel deep learning. In P. Gibbons, G. Pekhimenko, & C. De Sa (Eds.), Proceedings of Machine Learning and Systems (Vol. 6). Santa Clara, CA, United States.
[Published Version]
2024 | Research Data Reference | IST-REx-ID: 19884

Frantar, E., Castro, R. L., Chen, J., Hoefler, T., & Alistarh, D.-A. (2024). MARLIN: Mixed-precision auto-regressive parallel inference on Large Language Models. Zenodo. https://doi.org/10.5281/zenodo.14213091
[Published Version]
2023 | Published | Conference Paper | IST-REx-ID: 17378

Frantar, E., Ashkboos, S., Hoefler, T., & Alistarh, D.-A. (2023). OPTQ: Accurate post-training quantization for generative pre-trained transformers. In 11th International Conference on Learning Representations. Kigali, Rwanda: International Conference on Learning Representations.
[Published Version]
2023 | Published | Conference Paper | IST-REx-ID: 14458

Frantar, E., & Alistarh, D.-A. (2023). SparseGPT: Massive language models can be accurately pruned in one-shot. In Proceedings of the 40th International Conference on Machine Learning (Vol. 202, pp. 10323–10337). Honolulu, HI, United States: ML Research Press.
[Preprint]
2022 | Published | Conference Paper | IST-REx-ID: 17088

Kurtic, E., Campos, D., Nguyen, T., Frantar, E., Kurtz, M., Fineran, B., … Alistarh, D.-A. (2022). The optimal BERT surgeon: Scalable and accurate second-order pruning for large language models. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing (pp. 4163–4181). Abu Dhabi, United Arab Emirates: Association for Computational Linguistics. https://doi.org/10.18653/v1/2022.emnlp-main.279
[Published Version]
2022 | Published | Conference Paper | IST-REx-ID: 17087

Frantar, E., Singh, S. P., & Alistarh, D.-A. (2022). Optimal brain compression: A framework for accurate post-training quantization and pruning. In 36th Conference on Neural Information Processing Systems (Vol. 35). New Orleans, LA, United States: Neural Information Processing Systems Foundation.
[Submitted Version]
2022 | Published | Conference Paper | IST-REx-ID: 17059

Frantar, E., & Alistarh, D.-A. (2022). SPDY: Accurate pruning with speedup guarantees. In Proceedings of the 39th International Conference on Machine Learning (Vol. 162, pp. 6726–6743). Baltimore, MD, United States: ML Research Press.
[Published Version]
2021 | Published | Conference Paper | IST-REx-ID: 11463

Frantar, E., Kurtic, E., & Alistarh, D.-A. (2021). M-FAC: Efficient matrix-free approximations of second-order information. In 35th Conference on Neural Information Processing Systems (Vol. 34, pp. 14873–14886). Virtual, Online: Neural Information Processing Systems Foundation.
[Published Version]
2020 | Published | Conference Paper | IST-REx-ID: 8724

Konstantinov, N. H., Frantar, E., Alistarh, D.-A., & Lampert, C. (2020). On the sample complexity of adversarial multi-source PAC learning. In Proceedings of the 37th International Conference on Machine Learning (Vol. 119, pp. 5416–5425). Online: ML Research Press.
[Published Version]