Mher Safaryan
6 Publications
2025 | Published | Conference Paper | IST-REx-ID: 20034
Robert, T., Safaryan, M., Modoranu, I.-V., & Alistarh, D.-A. (2025). LDAdam: Adaptive optimization from low-dimensional gradient statistics. In 13th International Conference on Learning Representations (pp. 101877–101913). Singapore, Singapore: ICLR.
[Published Version]
2024 | Published | Conference Paper | IST-REx-ID: 18976
Islamov, R., Safaryan, M., & Alistarh, D.-A. (2024). AsGrad: A sharp unified analysis of asynchronous-SGD algorithms. In Proceedings of The 27th International Conference on Artificial Intelligence and Statistics (Vol. 238, pp. 649–657). Valencia, Spain: ML Research Press.
[Preprint]
2024 | Published | Conference Paper | IST-REx-ID: 19510
Modoranu, I.-V., Safaryan, M., Malinovsky, G., Kurtic, E., Robert, T., Richtárik, P., & Alistarh, D.-A. (2024). MicroAdam: Accurate adaptive optimization with low space overhead and provable convergence. In 38th Conference on Neural Information Processing Systems (Vol. 37). Neural Information Processing Systems Foundation.
[Preprint]
2024 | Published | Conference Paper | IST-REx-ID: 19518
Wu, D., Modoranu, I.-V., Safaryan, M., Kuznedelev, D., & Alistarh, D.-A. (2024). The iterative optimal brain surgeon: Faster sparse recovery by leveraging second-order information. In 38th Conference on Neural Information Processing Systems (Vol. 37). Vancouver, Canada: Neural Information Processing Systems Foundation.
[Preprint]
2023 | Published | Journal Article | IST-REx-ID: 14815
Beznosikov, A., Horváth, S., Richtárik, P., & Safaryan, M. (2023). On biased compression for distributed learning. Journal of Machine Learning Research.
[Published Version]
2023 | Published | Conference Paper | IST-REx-ID: 15363
Safaryan, M., Krumes, A., & Alistarh, D.-A. (2023). Knowledge distillation performs partial variance reduction. In 37th Conference on Neural Information Processing Systems (Vol. 36). New Orleans, LA, United States.
[Published Version]