17 Publications

[17]
2025 | Published | Conference Paper | IST-REx-ID: 19877 | OA
Frantar, E., Castro, R. L., Chen, J., Hoefler, T., & Alistarh, D.-A. (2025). MARLIN: Mixed-precision auto-regressive parallel inference on large language models. In Proceedings of the 30th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming (pp. 239–251). Las Vegas, NV, United States: Association for Computing Machinery. https://doi.org/10.1145/3710848.3710871
[Published Version] View | Files available | DOI | arXiv
 
[16]
2024 | Published | Conference Paper | IST-REx-ID: 18113 | OA
Egiazarian, V., Panferov, A., Kuznedelev, D., Frantar, E., Babenko, A., & Alistarh, D.-A. (2024). Extreme compression of large language models via additive quantization. In Proceedings of the 41st International Conference on Machine Learning (Vol. 235, pp. 12284–12303). Vienna, Austria: ML Research Press.
[Preprint] View | Download Preprint (ext.) | arXiv
 
[15]
2024 | Published | Conference Paper | IST-REx-ID: 18975 | OA
Modoranu, I.-V., Kalinov, A., Kurtic, E., Frantar, E., & Alistarh, D.-A. (2024). Error feedback can accurately compress preconditioners. In Proceedings of the 41st International Conference on Machine Learning (Vol. 235, pp. 35910–35933). Vienna, Austria: ML Research Press.
[Preprint] View | Download Preprint (ext.) | arXiv
 
[14]
2024 | Published | Conference Paper | IST-REx-ID: 18977 | OA
Dettmers, T., Svirschevski, R. A., Egiazarian, V., Kuznedelev, D., Frantar, E., Ashkboos, S., … Alistarh, D.-A. (2024). SpQR: A sparse-quantized representation for near-lossless LLM weight compression. In 12th International Conference on Learning Representations. Vienna, Austria: OpenReview.
[Preprint] View | Download Preprint (ext.) | arXiv
 
[13]
2024 | Published | Thesis | IST-REx-ID: 17485 | OA
Frantar, E. (2024). Compressing large neural networks: Algorithms, systems and scaling laws. Institute of Science and Technology Austria. https://doi.org/10.15479/at:ista:17485
[Published Version] View | Files available | DOI
 
[12]
2024 | Published | Conference Paper | IST-REx-ID: 18061 | OA
Frantar, E., & Alistarh, D.-A. (2024). QMoE: Sub-1-bit compression of trillion parameter models. In P. Gibbons, G. Pekhimenko, & C. De Sa (Eds.), Proceedings of Machine Learning and Systems (Vol. 6). Santa Clara, CA, United States.
[Published Version] View | Files available | Download Published Version (ext.)
 
[11]
2024 | Published | Conference Paper | IST-REx-ID: 18062 | OA
Frantar, E., Ruiz, C. R., Houlsby, N., Alistarh, D.-A., & Evci, U. (2024). Scaling laws for sparsely-connected foundation models. In 12th International Conference on Learning Representations. Vienna, Austria: OpenReview.
[Published Version] View | Files available | Download Published Version (ext.) | arXiv
 
[10]
2024 | Published | Conference Paper | IST-REx-ID: 18121 | OA
Moakhar, A. S., Iofinova, E. B., Frantar, E., & Alistarh, D.-A. (2024). SPADE: Sparsity-guided debugging for deep neural networks. In Proceedings of the 41st International Conference on Machine Learning (Vol. 235, pp. 45955–45987). Vienna, Austria: ML Research Press.
[Preprint] View | Files available | Download Preprint (ext.) | arXiv
 
[9]
2024 | Published | Conference Paper | IST-REx-ID: 17456 | OA
Markov, I., Alimohammadi, K., Frantar, E., & Alistarh, D.-A. (2024). L-GreCo: Layerwise-adaptive gradient compression for efficient data-parallel deep learning. In P. Gibbons, G. Pekhimenko, & C. De Sa (Eds.), Proceedings of Machine Learning and Systems (Vol. 6). Santa Clara, CA, United States.
[Published Version] View | Files available | Download Published Version (ext.) | arXiv
 
[8]
2024 | Research Data Reference | IST-REx-ID: 19884 | OA
Frantar, E., Castro, R. L., Chen, J., Hoefler, T., & Alistarh, D.-A. (2024). MARLIN: Mixed-precision auto-regressive parallel inference on large language models [Data set]. Zenodo. https://doi.org/10.5281/ZENODO.14213091
[Published Version] View | Files available | DOI | Download Published Version (ext.)
 
[7]
2023 | Published | Conference Paper | IST-REx-ID: 17378 | OA
Frantar, E., Ashkboos, S., Hoefler, T., & Alistarh, D.-A. (2023). OPTQ: Accurate post-training quantization for generative pre-trained transformers. In 11th International Conference on Learning Representations. Kigali, Rwanda: International Conference on Learning Representations.
[Published Version] View | Files available
 
[6]
2023 | Published | Conference Paper | IST-REx-ID: 14458 | OA
Frantar, E., & Alistarh, D.-A. (2023). SparseGPT: Massive language models can be accurately pruned in one-shot. In Proceedings of the 40th International Conference on Machine Learning (Vol. 202, pp. 10323–10337). Honolulu, HI, United States: ML Research Press.
[Preprint] View | Files available | Download Preprint (ext.) | arXiv
 
[5]
2022 | Published | Conference Paper | IST-REx-ID: 17088 | OA
Kurtic, E., Campos, D., Nguyen, T., Frantar, E., Kurtz, M., Fineran, B., … Alistarh, D.-A. (2022). The optimal BERT surgeon: Scalable and accurate second-order pruning for large language models. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing (pp. 4163–4181). Abu Dhabi, United Arab Emirates: Association for Computational Linguistics. https://doi.org/10.18653/v1/2022.emnlp-main.279
[Published Version] View | Files available | DOI | arXiv
 
[4]
2022 | Published | Conference Paper | IST-REx-ID: 17087 | OA
Frantar, E., Singh, S. P., & Alistarh, D.-A. (2022). Optimal brain compression: A framework for accurate post-training quantization and pruning. In 36th Conference on Neural Information Processing Systems (Vol. 35). New Orleans, LA, United States: Neural Information Processing Systems Foundation.
[Submitted Version] View | Files available | arXiv
 
[3]
2022 | Published | Conference Paper | IST-REx-ID: 17059 | OA
Frantar, E., & Alistarh, D.-A. (2022). SPDY: Accurate pruning with speedup guarantees. In 39th International Conference on Machine Learning (Vol. 162, pp. 6726–6743). Baltimore, MD, United States: ML Research Press.
[Published Version] View | Files available | WoS
 
[2]
2021 | Published | Conference Paper | IST-REx-ID: 11463 | OA
Frantar, E., Kurtic, E., & Alistarh, D.-A. (2021). M-FAC: Efficient matrix-free approximations of second-order information. In 35th Conference on Neural Information Processing Systems (Vol. 34, pp. 14873–14886). Virtual, Online: Neural Information Processing Systems Foundation.
[Published Version] View | Download Published Version (ext.) | arXiv
 
[1]
2020 | Published | Conference Paper | IST-REx-ID: 8724 | OA
Konstantinov, N. H., Frantar, E., Alistarh, D.-A., & Lampert, C. (2020). On the sample complexity of adversarial multi-source PAC learning. In Proceedings of the 37th International Conference on Machine Learning (Vol. 119, pp. 5416–5425). Online: ML Research Press.
[Published Version] View | Files available | arXiv
 
