{"day":"13","author":[{"full_name":"Shevchenko, Alexander","first_name":"Alexander","last_name":"Shevchenko"},{"last_name":"Mondelli","orcid":"0000-0002-3242-7020","id":"27EB676C-8706-11E9-9510-7717E6697425","first_name":"Marco","full_name":"Mondelli, Marco"}],"article_processing_charge":"No","file_date_updated":"2021-03-02T15:38:14Z","_id":"9198","oa":1,"file":[{"date_updated":"2021-03-02T15:38:14Z","file_name":"2020_PMLR_Shevchenko.pdf","access_level":"open_access","creator":"dernst","relation":"main_file","file_size":5336380,"success":1,"file_id":"9217","content_type":"application/pdf","checksum":"f042c8d4316bd87c6361aa76f1fbdbbe","date_created":"2021-03-02T15:38:14Z"}],"volume":119,"project":[{"name":"Prix Lopez-Loretta 2019 - Marco Mondelli","_id":"059876FA-7A3F-11EA-A408-12923DDC885E"}],"publication":"Proceedings of the 37th International Conference on Machine Learning","page":"8773-8784","citation":{"short":"A. Shevchenko, M. Mondelli, in:, Proceedings of the 37th International Conference on Machine Learning, ML Research Press, 2020, pp. 8773–8784.","chicago":"Shevchenko, Alexander, and Marco Mondelli. “Landscape Connectivity and Dropout Stability of SGD Solutions for Over-Parameterized Neural Networks.” In <i>Proceedings of the 37th International Conference on Machine Learning</i>, 119:8773–84. ML Research Press, 2020.","apa":"Shevchenko, A., &#38; Mondelli, M. (2020). Landscape connectivity and dropout stability of SGD solutions for over-parameterized neural networks. In <i>Proceedings of the 37th International Conference on Machine Learning</i> (Vol. 119, pp. 8773–8784). ML Research Press.","ama":"Shevchenko A, Mondelli M. Landscape connectivity and dropout stability of SGD solutions for over-parameterized neural networks. In: <i>Proceedings of the 37th International Conference on Machine Learning</i>. Vol 119. ML Research Press; 2020:8773-8784.","ista":"Shevchenko A, Mondelli M. 2020. 
Landscape connectivity and dropout stability of SGD solutions for over-parameterized neural networks. Proceedings of the 37th International Conference on Machine Learning. vol. 119, 8773–8784.","mla":"Shevchenko, Alexander, and Marco Mondelli. “Landscape Connectivity and Dropout Stability of SGD Solutions for Over-Parameterized Neural Networks.” <i>Proceedings of the 37th International Conference on Machine Learning</i>, vol. 119, ML Research Press, 2020, pp. 8773–84.","ieee":"A. Shevchenko and M. Mondelli, “Landscape connectivity and dropout stability of SGD solutions for over-parameterized neural networks,” in <i>Proceedings of the 37th International Conference on Machine Learning</i>, 2020, vol. 119, pp. 8773–8784."},"status":"public","department":[{"_id":"MaMo"}],"type":"conference","ddc":["000"],"has_accepted_license":"1","intvolume":"       119","title":"Landscape connectivity and dropout stability of SGD solutions for over-parameterized neural networks","publisher":"ML Research Press","publication_status":"published","date_updated":"2023-10-17T12:43:19Z","quality_controlled":"1","abstract":[{"lang":"eng","text":"The optimization of multilayer neural networks typically leads to a solution\r\nwith zero training error, yet the landscape can exhibit spurious local minima\r\nand the minima can be disconnected. In this paper, we shed light on this\r\nphenomenon: we show that the combination of stochastic gradient descent (SGD)\r\nand over-parameterization makes the landscape of multilayer neural networks\r\napproximately connected and thus more favorable to optimization. More\r\nspecifically, we prove that SGD solutions are connected via a piecewise linear\r\npath, and the increase in loss along this path vanishes as the number of\r\nneurons grows large. This result is a consequence of the fact that the\r\nparameters found by SGD are increasingly dropout stable as the network becomes\r\nwider. 
We show that, if we remove part of the neurons (and suitably rescale the\r\nremaining ones), the change in loss is independent of the total number of\r\nneurons, and it depends only on how many neurons are left. Our results exhibit\r\na mild dependence on the input dimension: they are dimension-free for two-layer\r\nnetworks and depend linearly on the dimension for multilayer networks. We\r\nvalidate our theoretical findings with numerical experiments for different\r\narchitectures and classification tasks."}],"date_published":"2020-07-13T00:00:00Z","language":[{"iso":"eng"}],"user_id":"2DF688A6-F248-11E8-B48F-1D18A9856A87","date_created":"2021-02-25T09:36:22Z","external_id":{"arxiv":["1912.10095"]},"acknowledgement":"M. Mondelli was partially supported by the 2019 Lopez-Loreta Prize. The authors thank Phan-Minh Nguyen for helpful discussions and the IST Distributed Algorithms and Systems Lab for providing computational resources.","year":"2020","oa_version":"Published Version","month":"07"}