[{"date_updated":"2025-04-25T10:32:06Z","abstract":[{"lang":"eng","text":"In the modern age of machine learning, artificial neural networks have become an integral part\r\nof many practical systems. One of the key ingredients of the success of the deep learning\r\napproach is recent computational advances which allowed the training of models with billions\r\nof parameters on large-scale data. Such over-parameterized and data-hungry regimes pose a\r\nchallenge for the theoretical analysis of modern models since “classical” statistical wisdom\r\nis no longer applicable. In this view, it is paramount to extend or develop new machinery\r\nthat will allow tackling the neural network analysis under new challenging asymptotic regimes,\r\nwhich is the focus of this thesis.\r\nLarge neural network systems are usually optimized via “local” search algorithms, such\r\nas stochastic gradient descent (SGD). However, given the high-dimensional nature of the\r\nparameter space, it is a priori not clear why such a crude “local” approach works so remarkably\r\nwell in practice. We take a step towards demystifying this phenomenon by showing that\r\nthe landscape of the SGD training dynamics exhibits a few beneficial properties for the\r\noptimization. First, we show that along the SGD trajectory an over-parameterized network\r\nis dropout stable. The emergence of dropout stability allows to conclude that the minima\r\nfound by SGD are connected via a continuous path of small loss. This in turn means that\r\nthe high-dimensional landscape of the neural network optimization problem is provably not so\r\nunfavourable to gradient-based training, due to mode connectivity. Next, we show that SGD\r\nfor an over-parameterized network tends to find solutions that are functionally more “simple”.\r\nThis in turn means that the SGD minima are more robust, since a less complicated solution\r\nwill less likely overfit the data. 
More formally, for a prototypical example of a wide two-layer\r\nReLU network on a 1d regression task we show that the SGD algorithm is implicitly selective in\r\nits choice of an interpolating solution. Namely, at convergence the neural network implements\r\na piece-wise linear function with the number of linear regions depending only on the amount\r\nof training data. This is in contrast to a “smooth”-like behaviour which one would expect\r\ngiven such a severe over-parameterization of the model.\r\nDiverging from the generic supervised setting of classification and regression problems, we\r\nanalyze an auto-encoder model that is commonly used for representation learning and data\r\ncompression. Despite the wide applicability of the auto-encoding paradigm, the theoretical\r\nunderstanding of their behaviour is limited even in the simplistic shallow case. The related\r\nwork is restricted to extreme asymptotic regimes in which the auto-encoder is either severely\r\nover-parameterized or under-parameterized. In contrast, we provide a tight characterization\r\nfor the 1-bit compression of Gaussian signals in the challenging proportional regime, i.e., the\r\ninput dimension and the size of the compressed representation obey the same asymptotics.\r\nWe also show that gradient-based methods are able to find a globally optimal solution and\r\nthat the predictions made for Gaussian data extrapolate beyond - to the case of compression\r\nof natural images. Next, we relax the Gaussian assumption and study more structured input\r\nsources. We show that the shallow model is sometimes agnostic to the structure of the data\r\nwhich results in a Gaussian-like behaviour. 
We prove that making the decoding component\r\nslightly less shallow is already enough to escape the “curse” of Gaussian performance.\r\n"}],"project":[{"_id":"059876FA-7A3F-11EA-A408-12923DDC885E","name":"Prix Lopez-Loretta 2019 - Marco Mondelli"},{"_id":"9B9290DE-BA93-11EA-9121-9846C619BF3A","name":"Vienna Graduate School on Computational Optimization","grant_number":"W1260-N35"}],"date_created":"2024-08-28T15:14:25Z","title":"High-dimensional limits in artificial neural networks","author":[{"last_name":"Shevchenko","first_name":"Aleksandr","full_name":"Shevchenko, Aleksandr","id":"F2B06EC2-C99E-11E9-89F0-752EE6697425"}],"date_published":"2024-08-29T00:00:00Z","corr_author":"1","user_id":"8b945eb4-e2f2-11eb-945a-df72226e66a9","publication_status":"published","file_date_updated":"2024-10-05T22:30:05Z","acknowledged_ssus":[{"_id":"ScienComp"}],"publisher":"Institute of Science and Technology Austria","supervisor":[{"full_name":"Mondelli, Marco","id":"27EB676C-8706-11E9-9510-7717E6697425","first_name":"Marco","last_name":"Mondelli","orcid":"0000-0002-3242-7020"},{"id":"4A899BFC-F248-11E8-B48F-1D18A9856A87","full_name":"Alistarh, Dan-Adrian","orcid":"0000-0003-3650-940X","first_name":"Dan-Adrian","last_name":"Alistarh"}],"citation":{"short":"A. Shevchenko, High-Dimensional Limits in Artificial Neural Networks, Institute of Science and Technology Austria, 2024.","mla":"Shevchenko, Alexander. <i>High-Dimensional Limits in Artificial Neural Networks</i>. Institute of Science and Technology Austria, 2024, doi:<a href=\"https://doi.org/10.15479/at:ista:17465\">10.15479/at:ista:17465</a>.","ieee":"A. Shevchenko, “High-dimensional limits in artificial neural networks,” Institute of Science and Technology Austria, 2024.","ista":"Shevchenko A. 2024. High-dimensional limits in artificial neural networks. Institute of Science and Technology Austria.","chicago":"Shevchenko, Alexander. 
“High-Dimensional Limits in Artificial Neural Networks.” Institute of Science and Technology Austria, 2024. <a href=\"https://doi.org/10.15479/at:ista:17465\">https://doi.org/10.15479/at:ista:17465</a>.","apa":"Shevchenko, A. (2024). <i>High-dimensional limits in artificial neural networks</i>. Institute of Science and Technology Austria. <a href=\"https://doi.org/10.15479/at:ista:17465\">https://doi.org/10.15479/at:ista:17465</a>","ama":"Shevchenko A. High-dimensional limits in artificial neural networks. 2024. doi:<a href=\"https://doi.org/10.15479/at:ista:17465\">10.15479/at:ista:17465</a>"},"ddc":["519"],"day":"29","_id":"17465","article_processing_charge":"No","oa_version":"Published Version","alternative_title":["ISTA Thesis"],"department":[{"_id":"GradSch"},{"_id":"DaAl"},{"_id":"MaMo"}],"month":"08","doi":"10.15479/at:ista:17465","page":"232","has_accepted_license":"1","related_material":{"record":[{"relation":"part_of_dissertation","status":"public","id":"11420"},{"status":"public","id":"17469","relation":"part_of_dissertation"},{"relation":"part_of_dissertation","id":"14459","status":"public"},{"id":"9198","status":"public","relation":"part_of_dissertation"}]},"year":"2024","oa":1,"language":[{"iso":"eng"}],"file":[{"date_created":"2024-09-02T09:23:32Z","relation":"main_file","content_type":"application/pdf","checksum":"da6dd3166078934577f6af93d27000e2","file_size":4468610,"file_name":"thesis_a2b.pdf","date_updated":"2024-10-05T22:30:05Z","embargo":"2024-10-04","file_id":"17482","access_level":"open_access","creator":"ashevche"},{"file_name":"Thesis Alex - 
ISTA.zip","date_updated":"2024-10-05T22:30:05Z","file_size":15930999,"content_type":"application/zip","checksum":"76a39ef252239560923cdda4ce0a31a4","relation":"source_file","date_created":"2024-09-02T09:23:46Z","embargo_to":"open_access","creator":"ashevche","access_level":"closed","file_id":"17483"}],"publication_identifier":{"issn":["2663-337X"]},"status":"public","OA_place":"repository","type":"dissertation","degree_awarded":"PhD"},{"oa":1,"year":"2024","language":[{"iso":"eng"}],"related_material":{"record":[{"relation":"dissertation_contains","status":"public","id":"17465"}]},"conference":{"name":"ICML: International Conference on Machine Learning","end_date":"2024-07-27","location":"Vienna, Austria","start_date":"2024-07-21"},"page":"24964-25015","month":"07","arxiv":1,"acknowledgement":"Kevin Kogler, Alexander Shevchenko and Marco Mondelli are supported by the 2019 Lopez-Loreta Prize. Hamed\r\nHassani acknowledges the support by the NSF CIF award (1910056) and the NSF Institute for CORE Emerging Methods in Data Science (EnCORE).","type":"conference","volume":235,"status":"public","publication_status":"published","user_id":"2DF688A6-F248-11E8-B48F-1D18A9856A87","publication":"Proceedings of the 41st International Conference on Machine Learning","date_published":"2024-07-01T00:00:00Z","corr_author":"1","project":[{"_id":"059876FA-7A3F-11EA-A408-12923DDC885E","name":"Prix Lopez-Loretta 2019 - Marco Mondelli"}],"title":"Compression of structured data with autoencoders: Provable benefit of nonlinearities and depth","author":[{"first_name":"Kevin","last_name":"Kögler","id":"94ec913c-dc85-11ea-9058-e5051ab2428b","full_name":"Kögler, Kevin"},{"full_name":"Shevchenko, Aleksandr","id":"F2B06EC2-C99E-11E9-89F0-752EE6697425","last_name":"Shevchenko","first_name":"Aleksandr"},{"full_name":"Hassani, Hamed","last_name":"Hassani","first_name":"Hamed"},{"full_name":"Mondelli, 
Marco","id":"27EB676C-8706-11E9-9510-7717E6697425","last_name":"Mondelli","first_name":"Marco","orcid":"0000-0002-3242-7020"}],"external_id":{"arxiv":["2402.05013"]},"date_created":"2024-08-29T11:47:57Z","main_file_link":[{"url":"https://proceedings.mlr.press/v235/kogler24a.html","open_access":"1"}],"date_updated":"2026-06-15T22:30:06Z","abstract":[{"text":"Autoencoders are a prominent model in many empirical branches of machine learning and lossy data compression. However, basic theoretical questions remain unanswered even in a shallow two-layer setting. In particular, to what degree does a shallow autoencoder capture the structure of the underlying data distribution? For the prototypical case of the 1-bit compression of sparse Gaussian data, we prove that gradient descent converges to a solution that completely disregards the sparse structure of the input. Namely, the performance of the algorithm is the same as if it was compressing a Gaussian source - with no sparsity. For general data distributions, we give evidence of a phase transition phenomenon in the shape of the gradient descent minimizer, as a function of the data sparsity: below the critical sparsity level, the minimizer is a rotation taken uniformly at random (just like in the compression of non-sparse data); above the critical sparsity, the minimizer is the identity (up to a permutation). Finally, by exploiting a connection with approximate message passing algorithms, we show how to improve upon Gaussian performance for the compression of sparse data: adding a denoising function to a shallow architecture already reduces the loss provably, and a suitable multi-layer decoder leads to a further improvement. 
We validate our findings on image datasets, such as CIFAR-10 and MNIST.","lang":"eng"}],"scopus_import":"1","alternative_title":["PMLR"],"department":[{"_id":"DaAl"},{"_id":"MaMo"}],"intvolume":"       235","day":"01","article_processing_charge":"No","oa_version":"Published Version","_id":"17469","citation":{"chicago":"Kögler, Kevin, Alexander Shevchenko, Hamed Hassani, and Marco Mondelli. “Compression of Structured Data with Autoencoders: Provable Benefit of Nonlinearities and Depth.” In <i>Proceedings of the 41st International Conference on Machine Learning</i>, 235:24964–15. ML Research Press, 2024.","apa":"Kögler, K., Shevchenko, A., Hassani, H., &#38; Mondelli, M. (2024). Compression of structured data with autoencoders: Provable benefit of nonlinearities and depth. In <i>Proceedings of the 41st International Conference on Machine Learning</i> (Vol. 235, pp. 24964–25015). Vienna, Austria: ML Research Press.","ama":"Kögler K, Shevchenko A, Hassani H, Mondelli M. Compression of structured data with autoencoders: Provable benefit of nonlinearities and depth. In: <i>Proceedings of the 41st International Conference on Machine Learning</i>. Vol 235. ML Research Press; 2024:24964-25015.","ieee":"K. Kögler, A. Shevchenko, H. Hassani, and M. Mondelli, “Compression of structured data with autoencoders: Provable benefit of nonlinearities and depth,” in <i>Proceedings of the 41st International Conference on Machine Learning</i>, Vienna, Austria, 2024, vol. 235, pp. 24964–25015.","short":"K. Kögler, A. Shevchenko, H. Hassani, M. Mondelli, in:, Proceedings of the 41st International Conference on Machine Learning, ML Research Press, 2024, pp. 24964–25015.","mla":"Kögler, Kevin, et al. “Compression of Structured Data with Autoencoders: Provable Benefit of Nonlinearities and Depth.” <i>Proceedings of the 41st International Conference on Machine Learning</i>, vol. 235, ML Research Press, 2024, pp. 24964–5015.","ista":"Kögler K, Shevchenko A, Hassani H, Mondelli M. 2024. 
Compression of structured data with autoencoders: Provable benefit of nonlinearities and depth. Proceedings of the 41st International Conference on Machine Learning. ICML: International Conference on Machine Learning, PMLR, vol. 235, 24964–25015."},"quality_controlled":"1","publisher":"ML Research Press"},{"external_id":{"arxiv":["2212.13468"]},"date_created":"2023-10-29T23:01:17Z","author":[{"last_name":"Shevchenko","first_name":"Aleksandr","id":"F2B06EC2-C99E-11E9-89F0-752EE6697425","full_name":"Shevchenko, Aleksandr"},{"id":"94ec913c-dc85-11ea-9058-e5051ab2428b","full_name":"Kögler, Kevin","last_name":"Kögler","first_name":"Kevin"},{"full_name":"Hassani, Hamed","first_name":"Hamed","last_name":"Hassani"},{"first_name":"Marco","last_name":"Mondelli","orcid":"0000-0002-3242-7020","id":"27EB676C-8706-11E9-9510-7717E6697425","full_name":"Mondelli, Marco"}],"title":"Fundamental limits of two-layer autoencoders, and achieving them with gradient methods","project":[{"_id":"059876FA-7A3F-11EA-A408-12923DDC885E","name":"Prix Lopez-Loretta 2019 - Marco Mondelli"}],"abstract":[{"lang":"eng","text":"Autoencoders are a popular model in many branches of machine learning and lossy data compression. However, their fundamental limits, the performance of gradient methods and the features learnt during optimization remain poorly understood, even in the two-layer setting. In fact, earlier work has considered either linear autoencoders or specific training regimes (leading to vanishing or diverging compression rates). Our paper addresses this gap by focusing on non-linear two-layer autoencoders trained in the challenging proportional regime in which the input dimension scales linearly with the size of the representation. Our results characterize the minimizers of the population risk, and show that such minimizers are achieved by gradient methods; their structure is also unveiled, thus leading to a concise description of the features obtained via training. 
For the special case of a sign activation function, our analysis establishes the fundamental limits for the lossy compression of Gaussian sources via (shallow) autoencoders. Finally, while the results are proved for Gaussian data, numerical simulations on standard datasets display the universality of the theoretical predictions."}],"date_updated":"2026-06-15T22:30:06Z","main_file_link":[{"url":"https://doi.org/10.48550/arXiv.2212.13468","open_access":"1"}],"publication":"Proceedings of the 40th International Conference on Machine Learning","user_id":"2DF688A6-F248-11E8-B48F-1D18A9856A87","publication_status":"published","corr_author":"1","date_published":"2023-07-30T00:00:00Z","citation":{"apa":"Shevchenko, A., Kögler, K., Hassani, H., &#38; Mondelli, M. (2023). Fundamental limits of two-layer autoencoders, and achieving them with gradient methods. In <i>Proceedings of the 40th International Conference on Machine Learning</i> (Vol. 202, pp. 31151–31209). Honolulu, Hawaii, HI, United States: ML Research Press.","ama":"Shevchenko A, Kögler K, Hassani H, Mondelli M. Fundamental limits of two-layer autoencoders, and achieving them with gradient methods. In: <i>Proceedings of the 40th International Conference on Machine Learning</i>. Vol 202. ML Research Press; 2023:31151-31209.","chicago":"Shevchenko, Alexander, Kevin Kögler, Hamed Hassani, and Marco Mondelli. “Fundamental Limits of Two-Layer Autoencoders, and Achieving Them with Gradient Methods.” In <i>Proceedings of the 40th International Conference on Machine Learning</i>, 202:31151–209. ML Research Press, 2023.","ista":"Shevchenko A, Kögler K, Hassani H, Mondelli M. 2023. Fundamental limits of two-layer autoencoders, and achieving them with gradient methods. Proceedings of the 40th International Conference on Machine Learning. ICML: International Conference on Machine Learning, PMLR, vol. 202, 31151–31209.","ieee":"A. Shevchenko, K. Kögler, H. Hassani, and M. 
Mondelli, “Fundamental limits of two-layer autoencoders, and achieving them with gradient methods,” in <i>Proceedings of the 40th International Conference on Machine Learning</i>, Honolulu, Hawaii, HI, United States, 2023, vol. 202, pp. 31151–31209.","mla":"Shevchenko, Alexander, et al. “Fundamental Limits of Two-Layer Autoencoders, and Achieving Them with Gradient Methods.” <i>Proceedings of the 40th International Conference on Machine Learning</i>, vol. 202, ML Research Press, 2023, pp. 31151–209.","short":"A. Shevchenko, K. Kögler, H. Hassani, M. Mondelli, in:, Proceedings of the 40th International Conference on Machine Learning, ML Research Press, 2023, pp. 31151–31209."},"quality_controlled":"1","publisher":"ML Research Press","intvolume":"       202","alternative_title":["PMLR"],"department":[{"_id":"MaMo"},{"_id":"DaAl"}],"scopus_import":"1","_id":"14459","article_processing_charge":"No","oa_version":"Preprint","day":"30","month":"07","arxiv":1,"language":[{"iso":"eng"}],"year":"2023","oa":1,"page":"31151-31209","related_material":{"record":[{"relation":"dissertation_contains","id":"17465","status":"public"}]},"conference":{"location":"Honolulu, Hawaii, HI, United States","end_date":"2023-07-29","name":"ICML: International Conference on Machine Learning","start_date":"2023-07-23"},"volume":202,"status":"public","publication_identifier":{"eissn":["2640-3498"]},"acknowledgement":"Aleksandr Shevchenko, Kevin Kogler and Marco Mondelli are supported by the 2019 Lopez-Loreta Prize. 
Hamed Hassani acknowledges the support by the NSF CIF award (1910056) and the NSF Institute for CORE Emerging Methods in Data Science (EnCORE).","type":"conference"},{"project":[{"_id":"059876FA-7A3F-11EA-A408-12923DDC885E","name":"Prix Lopez-Loretta 2019 - Marco Mondelli"}],"title":"Mean-field analysis of piecewise linear solutions for wide ReLU networks","author":[{"id":"F2B06EC2-C99E-11E9-89F0-752EE6697425","full_name":"Shevchenko, Aleksandr","first_name":"Aleksandr","last_name":"Shevchenko"},{"first_name":"Vyacheslav","last_name":"Kungurtsev","full_name":"Kungurtsev, Vyacheslav"},{"id":"27EB676C-8706-11E9-9510-7717E6697425","full_name":"Mondelli, Marco","orcid":"0000-0002-3242-7020","last_name":"Mondelli","first_name":"Marco"}],"external_id":{"arxiv":["2111.02278"]},"date_created":"2022-05-29T22:01:54Z","date_updated":"2026-06-15T22:30:06Z","abstract":[{"text":"Understanding the properties of neural networks trained via stochastic gradient descent (SGD) is at the heart of the theory of deep learning. In this work, we take a mean-field view, and consider a two-layer ReLU network trained via noisy-SGD for a univariate regularized regression problem. Our main result is that SGD with vanishingly small noise injected in the gradients is biased towards a simple solution: at convergence, the ReLU network implements a piecewise linear map of the inputs, and the number of “knot” points -- i.e., points where the tangent of the ReLU network estimator changes -- between two consecutive training inputs is at most three. In particular, as the number of neurons of the network grows, the SGD dynamics is captured by the solution of a gradient flow and, at convergence, the distribution of the weights approaches the unique minimizer of a related free energy, which has a Gibbs form. 
Our key technical contribution consists in the analysis of the estimator resulting from this minimizer: we show that its second derivative vanishes everywhere, except at some specific locations which represent the “knot” points. We also provide empirical evidence that knots at locations distinct from the data points might occur, as predicted by our theory.","lang":"eng"}],"publication_status":"published","user_id":"8b945eb4-e2f2-11eb-945a-df72226e66a9","publication":"Journal of Machine Learning Research","file_date_updated":"2022-05-30T08:22:55Z","date_published":"2022-04-01T00:00:00Z","corr_author":"1","quality_controlled":"1","citation":{"chicago":"Shevchenko, Alexander, Vyacheslav Kungurtsev, and Marco Mondelli. “Mean-Field Analysis of Piecewise Linear Solutions for Wide ReLU Networks.” <i>Journal of Machine Learning Research</i>. Journal of Machine Learning Research, 2022.","apa":"Shevchenko, A., Kungurtsev, V., &#38; Mondelli, M. (2022). Mean-field analysis of piecewise linear solutions for wide ReLU networks. <i>Journal of Machine Learning Research</i>. Journal of Machine Learning Research.","ama":"Shevchenko A, Kungurtsev V, Mondelli M. Mean-field analysis of piecewise linear solutions for wide ReLU networks. <i>Journal of Machine Learning Research</i>. 2022;23(130):1-55.","short":"A. Shevchenko, V. Kungurtsev, M. Mondelli, Journal of Machine Learning Research 23 (2022) 1–55.","mla":"Shevchenko, Alexander, et al. “Mean-Field Analysis of Piecewise Linear Solutions for Wide ReLU Networks.” <i>Journal of Machine Learning Research</i>, vol. 23, no. 130, Journal of Machine Learning Research, 2022, pp. 1–55.","ieee":"A. Shevchenko, V. Kungurtsev, and M. Mondelli, “Mean-field analysis of piecewise linear solutions for wide ReLU networks,” <i>Journal of Machine Learning Research</i>, vol. 23, no. 130. Journal of Machine Learning Research, pp. 1–55, 2022.","ista":"Shevchenko A, Kungurtsev V, Mondelli M. 2022. 
Mean-field analysis of piecewise linear solutions for wide ReLU networks. Journal of Machine Learning Research. 23(130), 1–55."},"ddc":["000"],"publisher":"Journal of Machine Learning Research","article_type":"original","scopus_import":"1","department":[{"_id":"MaMo"},{"_id":"DaAl"}],"intvolume":"        23","day":"01","oa_version":"Published Version","article_processing_charge":"No","_id":"11420","month":"04","arxiv":1,"oa":1,"year":"2022","tmp":{"name":"Creative Commons Attribution 4.0 International Public License (CC-BY 4.0)","short":"CC BY (4.0)","image":"/images/cc_by.png","legal_code_url":"https://creativecommons.org/licenses/by/4.0/legalcode"},"file":[{"access_level":"open_access","success":1,"creator":"cchlebak","file_id":"11422","checksum":"d4ff5d1affb34848b5c5e4002483fc62","file_size":1521701,"content_type":"application/pdf","date_updated":"2022-05-30T08:22:55Z","file_name":"21-1365.pdf","date_created":"2022-05-30T08:22:55Z","relation":"main_file"}],"language":[{"iso":"eng"}],"related_material":{"record":[{"id":"17465","status":"public","relation":"dissertation_contains"}],"link":[{"relation":"other","url":"https://www.jmlr.org/papers/v23/21-1365.html"}]},"page":"1-55","has_accepted_license":"1","issue":"130","volume":23,"publication_identifier":{"issn":["1532-4435"],"eissn":["1533-7928"]},"status":"public","acknowledgement":"We would like to thank Mert Pilanci for several exploratory discussions in the early stage\r\nof the project, Jan Maas for clarifications about Jordan et al. (1998), and Max Zimmer for\r\nsuggestive numerical experiments. A. Shevchenko and M. Mondelli are partially supported\r\nby the 2019 Lopez-Loreta Prize. V. 
Kungurtsev acknowledges support to the OP VVV\r\nproject CZ.02.1.01/0.0/0.0/16_019/0000765 Research Center for Informatics.\r\n","type":"journal_article"},{"year":"2020","oa":1,"language":[{"iso":"eng"}],"file":[{"success":1,"creator":"dernst","access_level":"open_access","file_id":"9217","content_type":"application/pdf","file_size":5336380,"checksum":"f042c8d4316bd87c6361aa76f1fbdbbe","file_name":"2020_PMLR_Shevchenko.pdf","date_updated":"2021-03-02T15:38:14Z","relation":"main_file","date_created":"2021-03-02T15:38:14Z"}],"has_accepted_license":"1","related_material":{"record":[{"status":"public","id":"17465","relation":"dissertation_contains"}]},"page":"8773-8784","month":"07","arxiv":1,"acknowledgement":"M. Mondelli was partially supported by the 2019 Lopez-Loreta Prize. The authors thank Phan-Minh Nguyen for helpful discussions and the IST Distributed Algorithms and Systems Lab for providing computational resources.","type":"conference","volume":119,"status":"public","user_id":"8b945eb4-e2f2-11eb-945a-df72226e66a9","publication_status":"published","file_date_updated":"2021-03-02T15:38:14Z","publication":"Proceedings of the 37th International Conference on Machine Learning","date_published":"2020-07-13T00:00:00Z","project":[{"name":"Prix Lopez-Loretta 2019 - Marco Mondelli","_id":"059876FA-7A3F-11EA-A408-12923DDC885E"}],"external_id":{"arxiv":["1912.10095"]},"date_created":"2021-02-25T09:36:22Z","author":[{"full_name":"Shevchenko, Aleksandr","id":"F2B06EC2-C99E-11E9-89F0-752EE6697425","first_name":"Aleksandr","last_name":"Shevchenko"},{"first_name":"Marco","last_name":"Mondelli","orcid":"0000-0002-3242-7020","id":"27EB676C-8706-11E9-9510-7717E6697425","full_name":"Mondelli, Marco"}],"title":"Landscape connectivity and dropout stability of SGD solutions for over-parameterized neural networks","date_updated":"2026-06-15T22:30:06Z","abstract":[{"text":"The optimization of multilayer neural networks typically leads to a solution\r\nwith zero training error, yet the 
landscape can exhibit spurious local minima\r\nand the minima can be disconnected. In this paper, we shed light on this\r\nphenomenon: we show that the combination of stochastic gradient descent (SGD)\r\nand over-parameterization makes the landscape of multilayer neural networks\r\napproximately connected and thus more favorable to optimization. More\r\nspecifically, we prove that SGD solutions are connected via a piecewise linear\r\npath, and the increase in loss along this path vanishes as the number of\r\nneurons grows large. This result is a consequence of the fact that the\r\nparameters found by SGD are increasingly dropout stable as the network becomes\r\nwider. We show that, if we remove part of the neurons (and suitably rescale the\r\nremaining ones), the change in loss is independent of the total number of\r\nneurons, and it depends only on how many neurons are left. Our results exhibit\r\na mild dependence on the input dimension: they are dimension-free for two-layer\r\nnetworks and depend linearly on the dimension for multilayer networks. We\r\nvalidate our theoretical findings with numerical experiments for different\r\narchitectures and classification tasks.","lang":"eng"}],"intvolume":"       119","department":[{"_id":"MaMo"},{"_id":"DaAl"}],"day":"13","_id":"9198","oa_version":"Published Version","article_processing_charge":"No","citation":{"ama":"Shevchenko A, Mondelli M. Landscape connectivity and dropout stability of SGD solutions for over-parameterized neural networks. In: <i>Proceedings of the 37th International Conference on Machine Learning</i>. Vol 119. ML Research Press; 2020:8773-8784.","apa":"Shevchenko, A., &#38; Mondelli, M. (2020). Landscape connectivity and dropout stability of SGD solutions for over-parameterized neural networks. In <i>Proceedings of the 37th International Conference on Machine Learning</i> (Vol. 119, pp. 8773–8784). ML Research Press.","chicago":"Shevchenko, Aleksandr, and Marco Mondelli. 
“Landscape Connectivity and Dropout Stability of SGD Solutions for Over-Parameterized Neural Networks.” In <i>Proceedings of the 37th International Conference on Machine Learning</i>, 119:8773–84. ML Research Press, 2020.","ista":"Shevchenko A, Mondelli M. 2020. Landscape connectivity and dropout stability of SGD solutions for over-parameterized neural networks. Proceedings of the 37th International Conference on Machine Learning. vol. 119, 8773–8784.","mla":"Shevchenko, Aleksandr, and Marco Mondelli. “Landscape Connectivity and Dropout Stability of SGD Solutions for Over-Parameterized Neural Networks.” <i>Proceedings of the 37th International Conference on Machine Learning</i>, vol. 119, ML Research Press, 2020, pp. 8773–84.","short":"A. Shevchenko, M. Mondelli, in:, Proceedings of the 37th International Conference on Machine Learning, ML Research Press, 2020, pp. 8773–8784.","ieee":"A. Shevchenko and M. Mondelli, “Landscape connectivity and dropout stability of SGD solutions for over-parameterized neural networks,” in <i>Proceedings of the 37th International Conference on Machine Learning</i>, 2020, vol. 119, pp. 8773–8784."},"quality_controlled":"1","ddc":["000"],"publisher":"ML Research Press"}]
