17 Publications

[17]
2025 | Published | Conference Paper | IST-REx-ID: 19877 | OA
Frantar, Elias, Roberto L. Castro, Jiale Chen, Torsten Hoefler, and Dan-Adrian Alistarh. “MARLIN: Mixed-Precision Auto-Regressive Parallel Inference on Large Language Models.” In Proceedings of the 30th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming, 239–51. Association for Computing Machinery, 2025. https://doi.org/10.1145/3710848.3710871.
[Published Version] View | Files available | DOI | arXiv
 
[16]
2024 | Published | Conference Paper | IST-REx-ID: 18113 | OA
Egiazarian, Vage, Andrei Panferov, Denis Kuznedelev, Elias Frantar, Artem Babenko, and Dan-Adrian Alistarh. “Extreme Compression of Large Language Models via Additive Quantization.” In Proceedings of the 41st International Conference on Machine Learning, 235:12284–303. ML Research Press, 2024.
[Preprint] View | Download Preprint (ext.) | arXiv
 
[15]
2024 | Published | Conference Paper | IST-REx-ID: 18975 | OA
Modoranu, Ionut-Vlad, Aleksei Kalinov, Eldar Kurtic, Elias Frantar, and Dan-Adrian Alistarh. “Error Feedback Can Accurately Compress Preconditioners.” In 41st International Conference on Machine Learning, 235:35910–33. ML Research Press, 2024.
[Preprint] View | Download Preprint (ext.) | arXiv
 
[14]
2024 | Published | Conference Paper | IST-REx-ID: 18977 | OA
Dettmers, Tim, Ruslan A. Svirschevski, Vage Egiazarian, Denis Kuznedelev, Elias Frantar, Saleh Ashkboos, Alexander Borzunov, Torsten Hoefler, and Dan-Adrian Alistarh. “SpQR: A Sparse-Quantized Representation for Near-Lossless LLM Weight Compression.” In 12th International Conference on Learning Representations. OpenReview, 2024.
[Preprint] View | Download Preprint (ext.) | arXiv
 
[13]
2024 | Published | Thesis | IST-REx-ID: 17485 | OA
Frantar, Elias. “Compressing Large Neural Networks: Algorithms, Systems and Scaling Laws.” Institute of Science and Technology Austria, 2024. https://doi.org/10.15479/at:ista:17485.
[Published Version] View | Files available | DOI
 
[12]
2024 | Published | Conference Paper | IST-REx-ID: 18061 | OA
Frantar, Elias, and Dan-Adrian Alistarh. “QMoE: Sub-1-Bit Compression of Trillion Parameter Models.” In Proceedings of Machine Learning and Systems, edited by P. Gibbons, G. Pekhimenko, and C. De Sa, Vol. 6, 2024.
[Published Version] View | Files available | Download Published Version (ext.)
 
[11]
2024 | Published | Conference Paper | IST-REx-ID: 18062 | OA
Frantar, Elias, Carlos Riquelme Ruiz, Neil Houlsby, Dan-Adrian Alistarh, and Utku Evci. “Scaling Laws for Sparsely-Connected Foundation Models.” In The Twelfth International Conference on Learning Representations, 2024.
[Published Version] View | Files available | Download Published Version (ext.) | arXiv
 
[10]
2024 | Published | Conference Paper | IST-REx-ID: 18121 | OA
Moakhar, Arshia Soltani, Eugenia B. Iofinova, Elias Frantar, and Dan-Adrian Alistarh. “SPADE: Sparsity-Guided Debugging for Deep Neural Networks.” In Proceedings of the 41st International Conference on Machine Learning, 235:45955–87. ML Research Press, 2024.
[Preprint] View | Files available | Download Preprint (ext.) | arXiv
 
[9]
2024 | Published | Conference Paper | IST-REx-ID: 17456 | OA
Markov, Ilia, Kaveh Alimohammadi, Elias Frantar, and Dan-Adrian Alistarh. “L-GreCo: Layerwise-Adaptive Gradient Compression for Efficient Data-Parallel Deep Learning.” In Proceedings of Machine Learning and Systems, edited by P. Gibbons, G. Pekhimenko, and C. De Sa, Vol. 6. Association for Computing Machinery, 2024.
[Published Version] View | Files available | Download Published Version (ext.) | arXiv
 
[8]
2024 | Research Data Reference | IST-REx-ID: 19884 | OA
Frantar, Elias, Roberto Castro, Jiale Chen, Torsten Hoefler, and Dan-Adrian Alistarh. “MARLIN: Mixed-Precision Auto-Regressive Parallel Inference on Large Language Models.” Zenodo, 2024. https://doi.org/10.5281/ZENODO.14213091.
[Published Version] View | Files available | DOI | Download Published Version (ext.)
 
[7]
2023 | Published | Conference Paper | IST-REx-ID: 17378 | OA
Frantar, Elias, Saleh Ashkboos, Torsten Hoefler, and Dan-Adrian Alistarh. “OPTQ: Accurate Post-Training Quantization for Generative Pre-Trained Transformers.” In 11th International Conference on Learning Representations. International Conference on Learning Representations, 2023.
[Published Version] View | Files available
 
[6]
2023 | Published | Conference Paper | IST-REx-ID: 14458 | OA
Frantar, Elias, and Dan-Adrian Alistarh. “SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot.” In Proceedings of the 40th International Conference on Machine Learning, 202:10323–37. ML Research Press, 2023.
[Preprint] View | Files available | Download Preprint (ext.) | arXiv
 
[5]
2022 | Published | Conference Paper | IST-REx-ID: 17088 | OA
Kurtic, Eldar, Daniel Campos, Tuan Nguyen, Elias Frantar, Mark Kurtz, Benjamin Fineran, Michael Goin, and Dan-Adrian Alistarh. “The Optimal BERT Surgeon: Scalable and Accurate Second-Order Pruning for Large Language Models.” In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, 4163–81. Association for Computational Linguistics, 2022. https://doi.org/10.18653/v1/2022.emnlp-main.279.
[Published Version] View | Files available | DOI | arXiv
 
[4]
2022 | Published | Conference Paper | IST-REx-ID: 17087 | OA
Frantar, Elias, Sidak Pal Singh, and Dan-Adrian Alistarh. “Optimal Brain Compression: A Framework for Accurate Post-Training Quantization and Pruning.” In 36th Conference on Neural Information Processing Systems, Vol. 35. ML Research Press, 2022.
[Submitted Version] View | Files available | arXiv
 
[3]
2022 | Published | Conference Paper | IST-REx-ID: 17059 | OA
Frantar, Elias, and Dan-Adrian Alistarh. “SPDY: Accurate Pruning with Speedup Guarantees.” In 39th International Conference on Machine Learning, 162:6726–43. ML Research Press, 2022.
[Published Version] View | Files available | WoS
 
[2]
2021 | Published | Conference Paper | IST-REx-ID: 11463 | OA
Frantar, Elias, Eldar Kurtic, and Dan-Adrian Alistarh. “M-FAC: Efficient Matrix-Free Approximations of Second-Order Information.” In 35th Conference on Neural Information Processing Systems, 34:14873–86. Neural Information Processing Systems Foundation, 2021.
[Published Version] View | Download Published Version (ext.) | arXiv
 
[1]
2020 | Published | Conference Paper | IST-REx-ID: 8724 | OA
Konstantinov, Nikola H., Elias Frantar, Dan-Adrian Alistarh, and Christoph Lampert. “On the Sample Complexity of Adversarial Multi-Source PAC Learning.” In Proceedings of the 37th International Conference on Machine Learning, 119:5416–25. ML Research Press, 2020.
[Published Version] View | Files available | arXiv
 
