{"department":[{"_id":"DaAl"}],"date_published":"2024-12-20T00:00:00Z","project":[{"_id":"fc2ed2f7-9c52-11eb-aca3-c01059dda49c","name":"IST-BRIDGE: International postdoctoral program","grant_number":"101034413","call_identifier":"H2020"}],"quality_controlled":"1","volume":37,"related_material":{"link":[{"relation":"software","url":"https://github.com/IST-DASLab/MicroAdam"}]},"date_created":"2025-04-06T22:01:32Z","month":"12","year":"2024","user_id":"2DF688A6-F248-11E8-B48F-1D18A9856A87","type":"conference","status":"public","publication_identifier":{"issn":["1049-5258"]},"abstract":[{"text":"We propose a new variant of the Adam optimizer [Kingma and Ba, 2014] called MICROADAM that specifically minimizes memory overheads, while maintaining theoretical convergence guarantees. We achieve this by compressing the gradient information before it is fed into the optimizer state, thereby reducing its memory footprint significantly. We control the resulting compression error via a novel instance of the classical error feedback mechanism from distributed optimization [Seide et al., 2014, Alistarh et al., 2018, Karimireddy et al., 2019] in which the error correction information is itself compressed to allow for practical memory gains. We prove that the resulting approach maintains theoretical convergence guarantees competitive to those of AMSGrad, while providing good practical performance. Specifically, we show that MICROADAM can be implemented efficiently on GPUs: on both million-scale (BERT) and billion-scale (LLaMA) models, MICROADAM provides practical convergence competitive to that of the uncompressed Adam baseline, with lower memory usage and similar running time. Our code is available at https://github.com/IST-DASLab/MicroAdam.","lang":"eng"}],"oa_version":"Preprint","title":"MICROADAM: Accurate adaptive optimization with low space overhead and provable convergence","external_id":{"arxiv":["2405.15593"]},"publication":"38th Conference on Neural Information Processing Systems","_id":"19510","main_file_link":[{"open_access":"1","url":"https://doi.org/10.48550/arXiv.2405.15593"}],"scopus_import":"1","author":[{"id":"449f7a18-f128-11eb-9611-9b430c0c6333","full_name":"Modoranu, Ionut-Vlad","first_name":"Ionut-Vlad","last_name":"Modoranu"},{"id":"dd546b39-0804-11ed-9c55-ef075c39778d","full_name":"Safaryan, Mher","first_name":"Mher","last_name":"Safaryan"},{"first_name":"Grigory","full_name":"Malinovsky, Grigory","last_name":"Malinovsky"},{"last_name":"Kurtic","first_name":"Eldar","full_name":"Kurtic, Eldar","id":"47beb3a5-07b5-11eb-9b87-b108ec578218"},{"first_name":"Thomas","full_name":"Robert, Thomas","id":"de632733-1457-11f0-ae22-b5914b8c1c41","last_name":"Robert"},{"last_name":"Richtárik","full_name":"Richtárik, Peter","first_name":"Peter"},{"last_name":"Alistarh","id":"4A899BFC-F248-11E8-B48F-1D18A9856A87","full_name":"Alistarh, Dan-Adrian","orcid":"0000-0003-3650-940X","first_name":"Dan-Adrian"}],"oa":1,"acknowledged_ssus":[{"_id":"CampIT"}],"OA_place":"repository","language":[{"iso":"eng"}],"alternative_title":["Advances in Neural Information Processing Systems"],"corr_author":"1","publisher":"Neural Information Processing Systems Foundation","arxiv":1,"date_updated":"2025-04-14T07:54:58Z","intvolume":" 37","article_processing_charge":"No","day":"20","acknowledgement":"The authors thank Razvan Pascanu, Mahdi Nikdan and Soroush Tabesh for their valuable feedback, the IT department from Institute of Science and Technology Austria for the hardware support and Weights and Biases for the infrastructure to track all our experiments. Mher Safaryan has received funding from the European Union’s Horizon 2020 research and innovation program under the Marie Sklodowska-Curie grant agreement No 101034413.","OA_type":"green","publication_status":"published","ec_funded":1}