17 Publications


[17]
2025 | Published | Conference Paper | IST-REx-ID: 19877 | OA
Frantar E, Castro RL, Chen J, Hoefler T, Alistarh D-A. 2025. MARLIN: Mixed-precision auto-regressive parallel inference on Large Language Models. Proceedings of the 30th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming. PPoPP: Symposium on Principles and Practice of Parallel Programming, 239–251.
 
[16]
2024 | Published | Conference Paper | IST-REx-ID: 18113 | OA
Egiazarian V, Panferov A, Kuznedelev D, Frantar E, Babenko A, Alistarh D-A. 2024. Extreme compression of large language models via additive quantization. Proceedings of the 41st International Conference on Machine Learning. ICML: International Conference on Machine Learning, PMLR, vol. 235, 12284–12303.
 
[15]
2024 | Published | Conference Paper | IST-REx-ID: 18975 | OA
Modoranu I-V, Kalinov A, Kurtic E, Frantar E, Alistarh D-A. 2024. Error feedback can accurately compress preconditioners. 41st International Conference on Machine Learning. ICML: International Conference on Machine Learning, PMLR, vol. 235, 35910–35933.
 
[14]
2024 | Published | Conference Paper | IST-REx-ID: 18977 | OA
Dettmers T, Svirschevski RA, Egiazarian V, Kuznedelev D, Frantar E, Ashkboos S, Borzunov A, Hoefler T, Alistarh D-A. 2024. SpQR: A sparse-quantized representation for near-lossless LLM weight compression. 12th International Conference on Learning Representations. ICLR: International Conference on Learning Representations.
 
[13]
2024 | Published | Thesis | IST-REx-ID: 17485 | OA
Frantar E. 2024. Compressing large neural networks: Algorithms, systems and scaling laws. Institute of Science and Technology Austria.
 
[12]
2024 | Published | Conference Paper | IST-REx-ID: 18061 | OA
Frantar E, Alistarh D-A. 2024. QMoE: Sub-1-bit compression of trillion parameter models. Proceedings of Machine Learning and Systems. MLSys: Machine Learning and Systems, vol. 6.
 
[11]
2024 | Published | Conference Paper | IST-REx-ID: 18062 | OA
Frantar E, Ruiz CR, Houlsby N, Alistarh D-A, Evci U. 2024. Scaling laws for sparsely-connected foundation models. The Twelfth International Conference on Learning Representations. ICLR: International Conference on Learning Representations.
 
[10]
2024 | Published | Conference Paper | IST-REx-ID: 18121 | OA
Moakhar AS, Iofinova EB, Frantar E, Alistarh D-A. 2024. SPADE: Sparsity-guided debugging for deep neural networks. Proceedings of the 41st International Conference on Machine Learning. ICML: International Conference on Machine Learning, PMLR, vol. 235, 45955–45987.
 
[9]
2024 | Published | Conference Paper | IST-REx-ID: 17456 | OA
Markov I, Alimohammadi K, Frantar E, Alistarh D-A. 2024. L-GreCo: Layerwise-adaptive gradient compression for efficient data-parallel deep learning. Proceedings of Machine Learning and Systems. MLSys: Machine Learning and Systems, vol. 6.
 
[8]
2024 | Research Data Reference | IST-REx-ID: 19884 | OA
Frantar E, Castro R, Chen J, Hoefler T, Alistarh D-A. 2024. MARLIN: Mixed-precision auto-regressive parallel inference on Large Language Models, Zenodo, 10.5281/ZENODO.14213091.
 
[7]
2023 | Published | Conference Paper | IST-REx-ID: 17378 | OA
Frantar E, Ashkboos S, Hoefler T, Alistarh D-A. 2023. OPTQ: Accurate post-training quantization for generative pre-trained transformers. 11th International Conference on Learning Representations. ICLR: International Conference on Learning Representations.
 
[6]
2023 | Published | Conference Paper | IST-REx-ID: 14458 | OA
Frantar E, Alistarh D-A. 2023. SparseGPT: Massive language models can be accurately pruned in one-shot. Proceedings of the 40th International Conference on Machine Learning. ICML: International Conference on Machine Learning, PMLR, vol. 202, 10323–10337.
 
[5]
2022 | Published | Conference Paper | IST-REx-ID: 17088 | OA
Kurtic E, Campos D, Nguyen T, Frantar E, Kurtz M, Fineran B, Goin M, Alistarh D-A. 2022. The optimal BERT surgeon: Scalable and accurate second-order pruning for large language models. Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing. EMNLP: Conference on Empirical Methods in Natural Language Processing, 4163–4181.
 
[4]
2022 | Published | Conference Paper | IST-REx-ID: 17087 | OA
Frantar E, Singh SP, Alistarh D-A. 2022. Optimal brain compression: A framework for accurate post-training quantization and pruning. 36th Conference on Neural Information Processing Systems. NeurIPS: Neural Information Processing Systems, NeurIPS, vol. 35.
 
[3]
2022 | Published | Conference Paper | IST-REx-ID: 17059 | OA
Frantar E, Alistarh D-A. 2022. SPDY: Accurate pruning with speedup guarantees. 39th International Conference on Machine Learning. ICML: International Conference on Machine Learning, PMLR, vol. 162, 6726–6743.
 
[2]
2021 | Published | Conference Paper | IST-REx-ID: 11463 | OA
Frantar E, Kurtic E, Alistarh D-A. 2021. M-FAC: Efficient matrix-free approximations of second-order information. 35th Conference on Neural Information Processing Systems. NeurIPS: Neural Information Processing Systems, Advances in Neural Information Processing Systems, vol. 34, 14873–14886.
 
[1]
2020 | Published | Conference Paper | IST-REx-ID: 8724 | OA
Konstantinov NH, Frantar E, Alistarh D-A, Lampert C. 2020. On the sample complexity of adversarial multi-source PAC learning. Proceedings of the 37th International Conference on Machine Learning. ICML: International Conference on Machine Learning, vol. 119, 5416–5425.
 


