Elias Frantar
17 Publications
2025 | Published | Conference Paper | IST-REx-ID: 19877 |

Frantar, Elias, Roberto L. Castro, Jiale Chen, Torsten Hoefler, and Dan-Adrian Alistarh. “MARLIN: Mixed-Precision Auto-Regressive Parallel Inference on Large Language Models.” In Proceedings of the 30th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming, 239–51. Association for Computing Machinery, 2025. https://doi.org/10.1145/3710848.3710871.
[Published Version]
View
| Files available
| DOI
| arXiv
2024 | Published | Conference Paper | IST-REx-ID: 18113 |

Egiazarian, Vage, Andrei Panferov, Denis Kuznedelev, Elias Frantar, Artem Babenko, and Dan-Adrian Alistarh. “Extreme Compression of Large Language Models via Additive Quantization.” In Proceedings of the 41st International Conference on Machine Learning, 235:12284–303. ML Research Press, 2024.
[Preprint]
View
| Download Preprint (ext.)
| arXiv
2024 | Published | Conference Paper | IST-REx-ID: 18975 |

Modoranu, Ionut-Vlad, Aleksei Kalinov, Eldar Kurtic, Elias Frantar, and Dan-Adrian Alistarh. “Error Feedback Can Accurately Compress Preconditioners.” In 41st International Conference on Machine Learning, 235:35910–33. ML Research Press, 2024.
[Preprint]
View
| Download Preprint (ext.)
| arXiv
2024 | Published | Conference Paper | IST-REx-ID: 18977 |

Dettmers, Tim, Ruslan A. Svirschevski, Vage Egiazarian, Denis Kuznedelev, Elias Frantar, Saleh Ashkboos, Alexander Borzunov, Torsten Hoefler, and Dan-Adrian Alistarh. “SpQR: A Sparse-Quantized Representation for Near-Lossless LLM Weight Compression.” In 12th International Conference on Learning Representations. OpenReview, 2024.
[Preprint]
View
| Download Preprint (ext.)
| arXiv
2024 | Published | Thesis | IST-REx-ID: 17485 |

Frantar, Elias. “Compressing Large Neural Networks: Algorithms, Systems and Scaling Laws.” Institute of Science and Technology Austria, 2024. https://doi.org/10.15479/at:ista:17485.
[Published Version]
View
| Files available
| DOI
2024 | Published | Conference Paper | IST-REx-ID: 18061 |

Frantar, Elias, and Dan-Adrian Alistarh. “QMoE: Sub-1-Bit Compression of Trillion Parameter Models.” In Proceedings of Machine Learning and Systems, edited by P. Gibbons, G. Pekhimenko, and C. De Sa, Vol. 6, 2024.
[Published Version]
View
| Files available
| Download Published Version (ext.)
2024 | Published | Conference Paper | IST-REx-ID: 18062 |

Frantar, Elias, Carlos Riquelme Ruiz, Neil Houlsby, Dan-Adrian Alistarh, and Utku Evci. “Scaling Laws for Sparsely-Connected Foundation Models.” In The Twelfth International Conference on Learning Representations, 2024.
[Published Version]
View
| Files available
| Download Published Version (ext.)
| arXiv
2024 | Published | Conference Paper | IST-REx-ID: 18121 |

Moakhar, Arshia Soltani, Eugenia B. Iofinova, Elias Frantar, and Dan-Adrian Alistarh. “SPADE: Sparsity-Guided Debugging for Deep Neural Networks.” In Proceedings of the 41st International Conference on Machine Learning, 235:45955–87. ML Research Press, 2024.
[Preprint]
View
| Files available
| Download Preprint (ext.)
| arXiv
2024 | Published | Conference Paper | IST-REx-ID: 17456 |

Markov, Ilia, Kaveh Alimohammadi, Elias Frantar, and Dan-Adrian Alistarh. “L-GreCo: Layerwise-Adaptive Gradient Compression for Efficient Data-Parallel Deep Learning.” In Proceedings of Machine Learning and Systems, edited by P. Gibbons, G. Pekhimenko, and C. De Sa, Vol. 6. Association for Computing Machinery, 2024.
[Published Version]
View
| Files available
| Download Published Version (ext.)
| arXiv
2024 | Research Data Reference | IST-REx-ID: 19884 |

Frantar, Elias, Roberto Castro, Jiale Chen, Torsten Hoefler, and Dan-Adrian Alistarh. “MARLIN: Mixed-Precision Auto-Regressive Parallel Inference on Large Language Models.” Zenodo, 2024. https://doi.org/10.5281/ZENODO.14213091.
[Published Version]
View
| Files available
| DOI
| Download Published Version (ext.)
2023 | Published | Conference Paper | IST-REx-ID: 17378 |

Frantar, Elias, Saleh Ashkboos, Torsten Hoefler, and Dan-Adrian Alistarh. “OPTQ: Accurate Post-Training Quantization for Generative Pre-Trained Transformers.” In 11th International Conference on Learning Representations. International Conference on Learning Representations, 2023.
[Published Version]
View
| Files available
2023 | Published | Conference Paper | IST-REx-ID: 14458 |

Frantar, Elias, and Dan-Adrian Alistarh. “SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot.” In Proceedings of the 40th International Conference on Machine Learning, 202:10323–37. ML Research Press, 2023.
[Preprint]
View
| Files available
| Download Preprint (ext.)
| arXiv
2022 | Published | Conference Paper | IST-REx-ID: 17088 |

Kurtic, Eldar, Daniel Campos, Tuan Nguyen, Elias Frantar, Mark Kurtz, Benjamin Fineran, Michael Goin, and Dan-Adrian Alistarh. “The Optimal BERT Surgeon: Scalable and Accurate Second-Order Pruning for Large Language Models.” In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, 4163–81. Association for Computational Linguistics, 2022. https://doi.org/10.18653/v1/2022.emnlp-main.279.
[Published Version]
View
| Files available
| DOI
| arXiv
2022 | Published | Conference Paper | IST-REx-ID: 17087 |

Frantar, Elias, Sidak Pal Singh, and Dan-Adrian Alistarh. “Optimal Brain Compression: A Framework for Accurate Post-Training Quantization and Pruning.” In 36th Conference on Neural Information Processing Systems, Vol. 35. ML Research Press, 2022.
[Submitted Version]
View
| Files available
| arXiv
2022 | Published | Conference Paper | IST-REx-ID: 17059 |

Frantar, Elias, and Dan-Adrian Alistarh. “SPDY: Accurate Pruning with Speedup Guarantees.” In 39th International Conference on Machine Learning, 162:6726–43. ML Research Press, 2022.
[Published Version]
View
| Files available
| WoS
2021 | Published | Conference Paper | IST-REx-ID: 11463 |

Frantar, Elias, Eldar Kurtic, and Dan-Adrian Alistarh. “M-FAC: Efficient Matrix-Free Approximations of Second-Order Information.” In 35th Conference on Neural Information Processing Systems, 34:14873–86. Neural Information Processing Systems Foundation, 2021.
[Published Version]
View
| Download Published Version (ext.)
| arXiv
2020 | Published | Conference Paper | IST-REx-ID: 8724 |

Konstantinov, Nikola H., Elias Frantar, Dan-Adrian Alistarh, and Christoph Lampert. “On the Sample Complexity of Adversarial Multi-Source PAC Learning.” In Proceedings of the 37th International Conference on Machine Learning, 119:5416–25. ML Research Press, 2020.
[Published Version]
View
| Files available
| arXiv