ISTA Research Explorer

Elias Frantar

18 Publications

[18]

2025 | Published | Conference Paper | IST-REx-ID: 19877 |

Frantar E, Castro RL, Chen J, Hoefler T, Alistarh D-A. 2025. MARLIN: Mixed-precision auto-regressive parallel inference on Large Language Models. Proceedings of the 30th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming. PPoPP: Symposium on Principles and Practice of Parallel Programming, 239–251.

[Published Version] View | Files available | DOI | WoS | arXiv

[17]

2025 | Published | Book Chapter | IST-REx-ID: 21257 |

Kurtic E, Kuznedelev D, Frantar E, Goin M, Pandit S, Agarwalla A, Nguyen T, Marques A, Kurtz M, Alistarh D-A. 2025. Sparse Fine-Tuning for Inference Acceleration of Large Language Models. In: Enhancing LLM Performance. Efficacy, Fine-Tuning, and Inference Techniques. Machine Translation: Technologies and Applications, 83–97.

[Preprint] View | DOI | Download Preprint (ext.) | arXiv

[16]

2024 | Published | Conference Paper | IST-REx-ID: 18061 |

Frantar E, Alistarh D-A. 2024. QMoE: Sub-1-bit compression of trillion parameter models. Proceedings of Machine Learning and Systems. MLSys: Machine Learning and Systems vol. 6.

[Published Version] View | Files available | Download Published Version (ext.)

[15]

2024 | Published | Conference Paper | IST-REx-ID: 18062 |

Frantar E, Ruiz CR, Houlsby N, Alistarh D-A, Evci U. 2024. Scaling laws for sparsely-connected foundation models. The Twelfth International Conference on Learning Representations. ICLR: International Conference on Learning Representations.

[Published Version] View | Files available | Download Published Version (ext.) | arXiv

[14]

2024 | Published | Conference Paper | IST-REx-ID: 18113 |

Egiazarian V, Panferov A, Kuznedelev D, Frantar E, Babenko A, Alistarh D-A. 2024. Extreme compression of large language models via additive quantization. Proceedings of the 41st International Conference on Machine Learning. ICML: International Conference on Machine Learning, PMLR, vol. 235, 12284–12303.

[Preprint] View | Download Preprint (ext.) | arXiv

[13]

2024 | Published | Conference Paper | IST-REx-ID: 18121 |

Moakhar AS, Iofinova EB, Frantar E, Alistarh D-A. 2024. SPADE: Sparsity-guided debugging for deep neural networks. Proceedings of the 41st International Conference on Machine Learning. ICML: International Conference on Machine Learning, PMLR, vol. 235, 45955–45987.

[Preprint] View | Files available | Download Preprint (ext.) | arXiv

[12]

2024 | Research Data Reference | IST-REx-ID: 19884 |

Frantar E, Castro R, Chen J, Hoefler T, Alistarh D-A. 2024. MARLIN: Mixed-precision auto-regressive parallel inference on Large Language Models, Zenodo, 10.5281/ZENODO.14213091.

[Published Version] View | Files available | DOI | Download Published Version (ext.)

[11]

2024 | Published | Conference Paper | IST-REx-ID: 18975 |

Modoranu I-V, Kalinov A, Kurtic E, Frantar E, Alistarh D-A. 2024. Error feedback can accurately compress preconditioners. 41st International Conference on Machine Learning. ICML: International Conference on Machine Learning, PMLR, vol. 235, 35910–35933.

[Preprint] View | Download Preprint (ext.) | arXiv

[10]

2024 | Published | Conference Paper | IST-REx-ID: 18977 |

Dettmers T, Svirschevski RA, Egiazarian V, Kuznedelev D, Frantar E, Ashkboos S, Borzunov A, Hoefler T, Alistarh D-A. 2024. SpQR: A sparse-quantized representation for near-lossless LLM weight compression. 12th International Conference on Learning Representations. ICLR: International Conference on Learning Representations.

[Preprint] View | Download Preprint (ext.) | arXiv

[9]

2024 | Published | Conference Paper | IST-REx-ID: 17456 |

Markov I, Alimohammadi K, Frantar E, Alistarh D-A. 2024. L-GreCo: Layerwise-adaptive gradient compression for efficient data-parallel deep learning. Proceedings of Machine Learning and Systems. MLSys: Machine Learning and Systems vol. 6.

[Published Version] View | Files available | Download Published Version (ext.) | arXiv

[8]

2024 | Published | Thesis | PhD | IST-REx-ID: 17485 |

Frantar E. 2024. Compressing large neural networks: Algorithms, systems and scaling laws. Institute of Science and Technology Austria.

[Published Version] View | Files available | DOI

[7]

2023 | Published | Conference Paper | IST-REx-ID: 14458 |

Frantar E, Alistarh D-A. 2023. SparseGPT: Massive language models can be accurately pruned in one-shot. Proceedings of the 40th International Conference on Machine Learning. ICML: International Conference on Machine Learning, PMLR, vol. 202, 10323–10337.

[Preprint] View | Files available | Download Preprint (ext.) | arXiv

[6]

2023 | Published | Conference Paper | IST-REx-ID: 17378 |

Frantar E, Ashkboos S, Hoefler T, Alistarh D-A. 2023. OPTQ: Accurate post-training quantization for generative pre-trained transformers. 11th International Conference on Learning Representations. ICLR: International Conference on Learning Representations.

[Published Version] View | Files available

[5]

2022 | Published | Conference Paper | IST-REx-ID: 17059 |

Frantar E, Alistarh D-A. 2022. SPDY: Accurate pruning with speedup guarantees. 39th International Conference on Machine Learning. ICML: International Conference on Machine Learning, PMLR, vol. 162, 6726–6743.

[Published Version] View | Files available | WoS

[4]

2022 | Published | Conference Paper | IST-REx-ID: 17087 |

Frantar E, Singh SP, Alistarh D-A. 2022. Optimal brain compression: A framework for accurate post-training quantization and pruning. 36th Conference on Neural Information Processing Systems. NeurIPS: Neural Information Processing Systems, NeurIPS, vol. 35.

[Submitted Version] View | Files available | arXiv

[3]

2022 | Published | Conference Paper | IST-REx-ID: 17088 |

Kurtic E, Campos D, Nguyen T, Frantar E, Kurtz M, Fineran B, Goin M, Alistarh D-A. 2022. The optimal BERT surgeon: Scalable and accurate second-order pruning for large language models. Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing. EMNLP: Conference on Empirical Methods in Natural Language Processing, 4163–4181.

[Published Version] View | Files available | DOI | arXiv

[2]

2021 | Published | Conference Paper | IST-REx-ID: 11463 |

Frantar E, Kurtic E, Alistarh D-A. 2021. M-FAC: Efficient matrix-free approximations of second-order information. 35th Conference on Neural Information Processing Systems. NeurIPS: Neural Information Processing Systems, Advances in Neural Information Processing Systems, vol. 34, 14873–14886.

[Published Version] View | Download Published Version (ext.) | arXiv

[1]

2020 | Published | Conference Paper | IST-REx-ID: 8724 |

Konstantinov NH, Frantar E, Alistarh D-A, Lampert C. 2020. On the sample complexity of adversarial multi-source PAC learning. Proceedings of the 37th International Conference on Machine Learning. ICML: International Conference on Machine Learning vol. 119, 5416–5425.

[Published Version] View | Files available | arXiv
