Elias Frantar
17 Publications
2025 | Published | Conference Paper | IST-REx-ID: 19877 |

Frantar, Elias, et al. “MARLIN: Mixed-Precision Auto-Regressive Parallel Inference on Large Language Models.” Proceedings of the 30th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming, Association for Computing Machinery, 2025, pp. 239–51, doi:10.1145/3710848.3710871.
[Published Version]
2024 | Published | Conference Paper | IST-REx-ID: 18113 |

Egiazarian, Vage, et al. “Extreme Compression of Large Language Models via Additive Quantization.” Proceedings of the 41st International Conference on Machine Learning, vol. 235, ML Research Press, 2024, pp. 12284–303.
[Preprint]
2024 | Published | Conference Paper | IST-REx-ID: 18975 |

Modoranu, Ionut-Vlad, et al. “Error Feedback Can Accurately Compress Preconditioners.” 41st International Conference on Machine Learning, vol. 235, ML Research Press, 2024, pp. 35910–33.
[Preprint]
2024 | Published | Conference Paper | IST-REx-ID: 18977 |

Dettmers, Tim, et al. “SpQR: A Sparse-Quantized Representation for near-Lossless LLM Weight Compression.” 12th International Conference on Learning Representations, OpenReview, 2024.
[Preprint]
2024 | Published | Thesis | IST-REx-ID: 17485 |

Frantar, Elias. Compressing Large Neural Networks: Algorithms, Systems and Scaling Laws. Institute of Science and Technology Austria, 2024, doi:10.15479/at:ista:17485.
[Published Version]
2024 | Published | Conference Paper | IST-REx-ID: 18061 |

Frantar, Elias, and Dan-Adrian Alistarh. “QMoE: Sub-1-Bit Compression of Trillion Parameter Models.” Proceedings of Machine Learning and Systems, edited by P. Gibbons et al., vol. 6, 2024.
[Published Version]
2024 | Published | Conference Paper | IST-REx-ID: 18062 |

Frantar, Elias, et al. “Scaling Laws for Sparsely-Connected Foundation Models.” The Twelfth International Conference on Learning Representations, 2024.
[Published Version]
2024 | Published | Conference Paper | IST-REx-ID: 18121 |

Moakhar, Arshia Soltani, et al. “SPADE: Sparsity-Guided Debugging for Deep Neural Networks.” Proceedings of the 41st International Conference on Machine Learning, vol. 235, ML Research Press, 2024, pp. 45955–87.
[Preprint]
2024 | Published | Conference Paper | IST-REx-ID: 17456 |

Markov, Ilia, et al. “L-GreCo: Layerwise-Adaptive Gradient Compression for Efficient Data-Parallel Deep Learning.” Proceedings of Machine Learning and Systems, edited by P. Gibbons et al., vol. 6, Association for Computing Machinery, 2024.
[Published Version]
2024 | Research Data Reference | IST-REx-ID: 19884 |

Frantar, Elias, et al. MARLIN: Mixed-Precision Auto-Regressive Parallel Inference on Large Language Models. Zenodo, 2024, doi:10.5281/ZENODO.14213091.
[Published Version]
2023 | Published | Conference Paper | IST-REx-ID: 17378 |

Frantar, Elias, et al. “OPTQ: Accurate Post-Training Quantization for Generative Pre-Trained Transformers.” 11th International Conference on Learning Representations, International Conference on Learning Representations, 2023.
[Published Version]
2023 | Published | Conference Paper | IST-REx-ID: 14458 |

Frantar, Elias, and Dan-Adrian Alistarh. “SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot.” Proceedings of the 40th International Conference on Machine Learning, vol. 202, ML Research Press, 2023, pp. 10323–37.
[Preprint]
2022 | Published | Conference Paper | IST-REx-ID: 17088 |

Kurtic, Eldar, et al. “The Optimal BERT Surgeon: Scalable and Accurate Second-Order Pruning for Large Language Models.” Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, Association for Computational Linguistics, 2022, pp. 4163–81, doi:10.18653/v1/2022.emnlp-main.279.
[Published Version]
2022 | Published | Conference Paper | IST-REx-ID: 17087 |

Frantar, Elias, et al. “Optimal Brain Compression: A Framework for Accurate Post-Training Quantization and Pruning.” 36th Conference on Neural Information Processing Systems, vol. 35, ML Research Press, 2022.
[Submitted Version]
2022 | Published | Conference Paper | IST-REx-ID: 17059 |

Frantar, Elias, and Dan-Adrian Alistarh. “SPDY: Accurate Pruning with Speedup Guarantees.” 39th International Conference on Machine Learning, vol. 162, ML Research Press, 2022, pp. 6726–43.
[Published Version]
2021 | Published | Conference Paper | IST-REx-ID: 11463 |

Frantar, Elias, et al. “M-FAC: Efficient Matrix-Free Approximations of Second-Order Information.” 35th Conference on Neural Information Processing Systems, vol. 34, Neural Information Processing Systems Foundation, 2021, pp. 14873–86.
[Published Version]
2020 | Published | Conference Paper | IST-REx-ID: 8724 |

Konstantinov, Nikola H., et al. “On the Sample Complexity of Adversarial Multi-Source PAC Learning.” Proceedings of the 37th International Conference on Machine Learning, vol. 119, ML Research Press, 2020, pp. 5416–25.
[Published Version]