Mher Safaryan
6 Publications
2025 | Published | Conference Paper | IST-REx-ID: 20034 |
Robert T, Safaryan M, Modoranu I-V, Alistarh D-A. LDAdam: Adaptive optimization from low-dimensional gradient statistics. In: 13th International Conference on Learning Representations. ICLR; 2025:101877-101913.
[Published Version]
2024 | Published | Conference Paper | IST-REx-ID: 18976 |
Islamov R, Safaryan M, Alistarh D-A. AsGrad: A sharp unified analysis of asynchronous-SGD algorithms. In: Proceedings of The 27th International Conference on Artificial Intelligence and Statistics. Vol 238. ML Research Press; 2024:649-657.
[Preprint]
2024 | Published | Conference Paper | IST-REx-ID: 19518 |
Wu D, Modoranu I-V, Safaryan M, Kuznedelev D, Alistarh D-A. The iterative optimal brain surgeon: Faster sparse recovery by leveraging second-order information. In: 38th Conference on Neural Information Processing Systems. Vol 37. Neural Information Processing Systems Foundation; 2024.
[Preprint]
2024 | Published | Conference Paper | IST-REx-ID: 19510 |
Modoranu I-V, Safaryan M, Malinovsky G, et al. MicroAdam: Accurate adaptive optimization with low space overhead and provable convergence. In: 38th Conference on Neural Information Processing Systems. Vol 37. Neural Information Processing Systems Foundation; 2024.
[Preprint]
2023 | Published | Journal Article | IST-REx-ID: 14815 |
Beznosikov A, Horvath S, Richtarik P, Safaryan M. On biased compression for distributed learning. Journal of Machine Learning Research. 2023;24:1-50.
[Published Version]
2023 | Published | Conference Paper | IST-REx-ID: 15363 |
Safaryan M, Krumes A, Alistarh D-A. Knowledge distillation performs partial variance reduction. In: 37th Conference on Neural Information Processing Systems. Vol 36. 2023.
[Published Version]