{"publication_identifier":{"isbn":["9781713829546"]},"publication":"Advances in Neural Information Processing Systems","project":[{"grant_number":"805223","call_identifier":"H2020","_id":"268A44D6-B435-11E9-9278-68D0E5697425","name":"Elastic Coordination for Scalable Machine Learning"}],"user_id":"2DF688A6-F248-11E8-B48F-1D18A9856A87","title":"Adaptive gradient quantization for data-parallel SGD","publication_status":"published","main_file_link":[{"open_access":"1","url":"https://doi.org/10.48550/arXiv.2010.12460"}],"conference":{"name":"NeurIPS: Neural Information Processing Systems","end_date":"2020-12-12","location":"Vancouver, Canada","start_date":"2020-12-06"},"month":"12","article_processing_charge":"No","type":"conference","abstract":[{"text":"Many communication-efficient variants of SGD use gradient quantization schemes. These schemes are often heuristic and fixed over the course of training. We empirically observe that the statistics of gradients of deep models change during the training. Motivated by this observation, we introduce two adaptive quantization schemes, ALQ and AMQ. In both schemes, processors update their compression schemes in parallel by efficiently computing sufficient statistics of a parametric distribution. We improve the validation accuracy by almost 2% on CIFAR-10 and 1% on ImageNet in challenging low-cost communication setups. 
Our adaptive methods are also significantly more robust to the choice of hyperparameters.","lang":"eng"}],"_id":"15086","department":[{"_id":"DaAl"}],"volume":33,"year":"2020","oa_version":"Preprint","language":[{"iso":"eng"}],"alternative_title":["NeurIPS"],"author":[{"last_name":"Faghri","full_name":"Faghri, Fartash","first_name":"Fartash"},{"full_name":"Tabrizian, Iman","first_name":"Iman","last_name":"Tabrizian"},{"last_name":"Markov","full_name":"Markov, Ilia","first_name":"Ilia","id":"D0CF4148-C985-11E9-8066-0BDEE5697425"},{"full_name":"Alistarh, Dan-Adrian","first_name":"Dan-Adrian","id":"4A899BFC-F248-11E8-B48F-1D18A9856A87","orcid":"0000-0003-3650-940X","last_name":"Alistarh"},{"full_name":"Roy, Daniel","first_name":"Daniel","last_name":"Roy"},{"full_name":"Ramezani-Kebrya, Ali","first_name":"Ali","last_name":"Ramezani-Kebrya"}],"acknowledgement":"The authors would like to thank Blair Bilodeau, David Fleet, Mufan Li, and Jeffrey Negrea for helpful discussions. FF was supported by an OGS Scholarship. DA and IM were supported by the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (grant agreement No 805223 ScaleML). DMR was supported by an NSERC Discovery Grant. ARK was supported by an NSERC Postdoctoral Fellowship. Resources used in preparing this research were provided, in part, by the Province of Ontario, the Government of Canada through CIFAR, and companies sponsoring the Vector Institute.","intvolume":" 33","date_created":"2024-03-06T08:35:58Z","external_id":{"arxiv":["2010.12460"]},"status":"public","ec_funded":1,"oa":1,"citation":{"mla":"Faghri, Fartash, et al. “Adaptive Gradient Quantization for Data-Parallel SGD.” Advances in Neural Information Processing Systems, vol. 33, Neural Information Processing Systems Foundation, 2020.","ista":"Faghri F, Tabrizian I, Markov I, Alistarh D-A, Roy D, Ramezani-Kebrya A. 2020. Adaptive gradient quantization for data-parallel SGD. 
Advances in Neural Information Processing Systems. NeurIPS: Neural Information Processing Systems, NeurIPS, vol. 33.","chicago":"Faghri, Fartash, Iman Tabrizian, Ilia Markov, Dan-Adrian Alistarh, Daniel Roy, and Ali Ramezani-Kebrya. “Adaptive Gradient Quantization for Data-Parallel SGD.” In Advances in Neural Information Processing Systems, Vol. 33. Neural Information Processing Systems Foundation, 2020.","ama":"Faghri F, Tabrizian I, Markov I, Alistarh D-A, Roy D, Ramezani-Kebrya A. Adaptive gradient quantization for data-parallel SGD. In: Advances in Neural Information Processing Systems. Vol 33. Neural Information Processing Systems Foundation; 2020.","short":"F. Faghri, I. Tabrizian, I. Markov, D.-A. Alistarh, D. Roy, A. Ramezani-Kebrya, in:, Advances in Neural Information Processing Systems, Neural Information Processing Systems Foundation, 2020.","ieee":"F. Faghri, I. Tabrizian, I. Markov, D.-A. Alistarh, D. Roy, and A. Ramezani-Kebrya, “Adaptive gradient quantization for data-parallel SGD,” in Advances in Neural Information Processing Systems, Vancouver, Canada, 2020, vol. 33.","apa":"Faghri, F., Tabrizian, I., Markov, I., Alistarh, D.-A., Roy, D., & Ramezani-Kebrya, A. (2020). Adaptive gradient quantization for data-parallel SGD. In Advances in Neural Information Processing Systems (Vol. 33). Vancouver, Canada: Neural Information Processing Systems Foundation."},"date_published":"2020-12-10T00:00:00Z","publisher":"Neural Information Processing Systems Foundation","date_updated":"2024-04-29T07:14:13Z","quality_controlled":"1","day":"10"}