17 Publications

[17]
2025 | Published | Conference Paper | IST-REx-ID: 19877 | OA
Frantar E, Castro RL, Chen J, Hoefler T, Alistarh D-A. MARLIN: Mixed-precision auto-regressive parallel inference on large language models. In: Proceedings of the 30th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming. Association for Computing Machinery; 2025:239-251. doi:10.1145/3710848.3710871
[Published Version] View | Files available | DOI | arXiv
 
[16]
2024 | Published | Conference Paper | IST-REx-ID: 18113 | OA
Egiazarian V, Panferov A, Kuznedelev D, Frantar E, Babenko A, Alistarh D-A. Extreme compression of large language models via additive quantization. In: Proceedings of the 41st International Conference on Machine Learning. Vol 235. ML Research Press; 2024:12284-12303.
[Preprint] View | Download Preprint (ext.) | arXiv
 
[15]
2024 | Published | Conference Paper | IST-REx-ID: 18975 | OA
Modoranu I-V, Kalinov A, Kurtic E, Frantar E, Alistarh D-A. Error feedback can accurately compress preconditioners. In: Proceedings of the 41st International Conference on Machine Learning. Vol 235. ML Research Press; 2024:35910-35933.
[Preprint] View | Download Preprint (ext.) | arXiv
 
[14]
2024 | Published | Conference Paper | IST-REx-ID: 18977 | OA
Dettmers T, Svirschevski RA, Egiazarian V, et al. SpQR: A sparse-quantized representation for near-lossless LLM weight compression. In: 12th International Conference on Learning Representations. OpenReview; 2024.
[Preprint] View | Download Preprint (ext.) | arXiv
 
[13]
2024 | Published | Thesis | IST-REx-ID: 17485 | OA
Frantar E. Compressing large neural networks: Algorithms, systems and scaling laws. 2024. doi:10.15479/at:ista:17485
[Published Version] View | Files available | DOI
 
[12]
2024 | Published | Conference Paper | IST-REx-ID: 18061 | OA
Frantar E, Alistarh D-A. QMoE: Sub-1-bit compression of trillion parameter models. In: Gibbons P, Pekhimenko G, De Sa C, eds. Proceedings of Machine Learning and Systems. Vol 6.; 2024.
[Published Version] View | Files available | Download Published Version (ext.)
 
[11]
2024 | Published | Conference Paper | IST-REx-ID: 18062 | OA
Frantar E, Ruiz CR, Houlsby N, Alistarh D-A, Evci U. Scaling laws for sparsely-connected foundation models. In: 12th International Conference on Learning Representations; 2024.
[Published Version] View | Files available | Download Published Version (ext.) | arXiv
 
[10]
2024 | Published | Conference Paper | IST-REx-ID: 18121 | OA
Moakhar AS, Iofinova EB, Frantar E, Alistarh D-A. SPADE: Sparsity-guided debugging for deep neural networks. In: Proceedings of the 41st International Conference on Machine Learning. Vol 235. ML Research Press; 2024:45955-45987.
[Preprint] View | Files available | Download Preprint (ext.) | arXiv
 
[9]
2024 | Published | Conference Paper | IST-REx-ID: 17456 | OA
Markov I, Alimohammadi K, Frantar E, Alistarh D-A. L-GreCo: Layerwise-adaptive gradient compression for efficient data-parallel deep learning. In: Gibbons P, Pekhimenko G, De Sa C, eds. Proceedings of Machine Learning and Systems. Vol 6. Association for Computing Machinery; 2024.
[Published Version] View | Files available | Download Published Version (ext.) | arXiv
 
[8]
2024 | Research Data Reference | IST-REx-ID: 19884 | OA
Frantar E, Castro RL, Chen J, Hoefler T, Alistarh D-A. MARLIN: Mixed-precision auto-regressive parallel inference on large language models. 2024. doi:10.5281/zenodo.14213091
[Published Version] View | Files available | DOI | Download Published Version (ext.)
 
[7]
2023 | Published | Conference Paper | IST-REx-ID: 17378 | OA
Frantar E, Ashkboos S, Hoefler T, Alistarh D-A. OPTQ: Accurate post-training quantization for generative pre-trained transformers. In: 11th International Conference on Learning Representations. International Conference on Learning Representations; 2023.
[Published Version] View | Files available
 
[6]
2023 | Published | Conference Paper | IST-REx-ID: 14458 | OA
Frantar E, Alistarh D-A. SparseGPT: Massive language models can be accurately pruned in one-shot. In: Proceedings of the 40th International Conference on Machine Learning. Vol 202. ML Research Press; 2023:10323-10337.
[Preprint] View | Files available | Download Preprint (ext.) | arXiv
 
[5]
2022 | Published | Conference Paper | IST-REx-ID: 17088 | OA
Kurtic E, Campos D, Nguyen T, et al. The optimal BERT surgeon: Scalable and accurate second-order pruning for large language models. In: Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics; 2022:4163-4181. doi:10.18653/v1/2022.emnlp-main.279
[Published Version] View | Files available | DOI | arXiv
 
[4]
2022 | Published | Conference Paper | IST-REx-ID: 17087 | OA
Frantar E, Singh SP, Alistarh D-A. Optimal brain compression: A framework for accurate post-training quantization and pruning. In: 36th Conference on Neural Information Processing Systems. Vol 35. Neural Information Processing Systems Foundation; 2022.
[Submitted Version] View | Files available | arXiv
 
[3]
2022 | Published | Conference Paper | IST-REx-ID: 17059 | OA
Frantar E, Alistarh D-A. SPDY: Accurate pruning with speedup guarantees. In: 39th International Conference on Machine Learning. Vol 162. ML Research Press; 2022:6726-6743.
[Published Version] View | Files available | WoS
 
[2]
2021 | Published | Conference Paper | IST-REx-ID: 11463 | OA
Frantar E, Kurtic E, Alistarh D-A. M-FAC: Efficient matrix-free approximations of second-order information. In: 35th Conference on Neural Information Processing Systems. Vol 34. Neural Information Processing Systems Foundation; 2021:14873-14886.
[Published Version] View | Download Published Version (ext.) | arXiv
 
[1]
2020 | Published | Conference Paper | IST-REx-ID: 8724 | OA
Konstantinov NH, Frantar E, Alistarh D-A, Lampert C. On the sample complexity of adversarial multi-source PAC learning. In: Proceedings of the 37th International Conference on Machine Learning. Vol 119. ML Research Press; 2020:5416-5425.
[Published Version] View | Files available | arXiv
 
