---
OA_place: repository
_id: '17465'
abstract:
- lang: eng
  text: "In the modern age of machine learning, artificial neural networks have become
    an integral part of many practical systems. One of the key ingredients of the
    success of the deep learning approach is recent computational advances, which
    allowed the training of models with billions of parameters on large-scale data.
    Such over-parameterized and data-hungry regimes pose a challenge for the theoretical
    analysis of modern models, since “classical” statistical wisdom is no longer
    applicable. In this view, it is paramount to extend or develop new machinery
    that will allow tackling the neural network analysis under new challenging asymptotic
    regimes, which is the focus of this thesis.\nLarge neural network systems
    are usually optimized via “local” search algorithms, such as stochastic gradient
    descent (SGD). However, given the high-dimensional nature of the parameter
    space, it is a priori not clear why such a crude “local” approach works so remarkably
    well in practice. We take a step towards demystifying this phenomenon by showing
    that the landscape of the SGD training dynamics exhibits a few beneficial properties
    for the optimization. First, we show that along the SGD trajectory an over-parameterized
    network is dropout stable. The emergence of dropout stability allows us to conclude
    that the minima found by SGD are connected via a continuous path of small loss.
    This in turn means that the high-dimensional landscape of the neural network
    optimization problem is provably not so unfavourable to gradient-based training,
    due to mode connectivity. Next, we show that SGD for an over-parameterized
    network tends to find solutions that are functionally more “simple”. This in
    turn means that the SGD minima are more robust, since a less complicated solution
    is less likely to overfit the data. More formally, for a prototypical example
    of a wide two-layer ReLU network on a 1d regression task we show that the SGD
    algorithm is implicitly selective in its choice of an interpolating solution.
    Namely, at convergence the neural network implements a piece-wise linear function
    with the number of linear regions depending only on the amount of training data.
    This is in contrast to a “smooth”-like behaviour which one would expect given
    such a severe over-parameterization of the model.\nDiverging from the generic
    supervised setting of classification and regression problems, we analyze an
    auto-encoder model that is commonly used for representation learning and data
    compression. Despite the wide applicability of the auto-encoding paradigm, the
    theoretical understanding of its behaviour is limited even in the simplistic
    shallow case. The related work is restricted to extreme asymptotic regimes in
    which the auto-encoder is either severely over-parameterized or under-parameterized.
    In contrast, we provide a tight characterization for the 1-bit compression of
    Gaussian signals in the challenging proportional regime, i.e., the input dimension
    and the size of the compressed representation obey the same asymptotics. We
    also show that gradient-based methods are able to find a globally optimal solution
    and that the predictions made for Gaussian data extrapolate beyond them, to the
    compression of natural images. Next, we relax the Gaussian assumption and study
    more structured input sources. We show that the shallow model is sometimes agnostic
    to the structure of the data, which results in a Gaussian-like behaviour. We
    prove that making the decoding component slightly less shallow is already enough
    to escape the “curse” of Gaussian performance."
acknowledged_ssus:
- _id: ScienComp
alternative_title:
- ISTA Thesis
article_processing_charge: No
author:
- first_name: Aleksandr
  full_name: Shevchenko, Aleksandr
  id: F2B06EC2-C99E-11E9-89F0-752EE6697425
  last_name: Shevchenko
citation:
  ama: Shevchenko A. High-dimensional limits in artificial neural networks. 2024.
    doi:<a href="https://doi.org/10.15479/at:ista:17465">10.15479/at:ista:17465</a>
  apa: Shevchenko, A. (2024). <i>High-dimensional limits in artificial neural networks</i>.
    Institute of Science and Technology Austria. <a href="https://doi.org/10.15479/at:ista:17465">https://doi.org/10.15479/at:ista:17465</a>
  chicago: Shevchenko, Alexander. “High-Dimensional Limits in Artificial Neural Networks.”
    Institute of Science and Technology Austria, 2024. <a href="https://doi.org/10.15479/at:ista:17465">https://doi.org/10.15479/at:ista:17465</a>.
  ieee: A. Shevchenko, “High-dimensional limits in artificial neural networks,” Institute
    of Science and Technology Austria, 2024.
  ista: Shevchenko A. 2024. High-dimensional limits in artificial neural networks.
    Institute of Science and Technology Austria.
  mla: Shevchenko, Alexander. <i>High-Dimensional Limits in Artificial Neural Networks</i>.
    Institute of Science and Technology Austria, 2024, doi:<a href="https://doi.org/10.15479/at:ista:17465">10.15479/at:ista:17465</a>.
  short: A. Shevchenko, High-Dimensional Limits in Artificial Neural Networks, Institute
    of Science and Technology Austria, 2024.
corr_author: '1'
date_created: 2024-08-28T15:14:25Z
date_published: 2024-08-29T00:00:00Z
date_updated: 2025-04-25T10:32:06Z
day: '29'
ddc:
- '519'
degree_awarded: PhD
department:
- _id: GradSch
- _id: DaAl
- _id: MaMo
doi: 10.15479/at:ista:17465
file:
- access_level: open_access
  checksum: da6dd3166078934577f6af93d27000e2
  content_type: application/pdf
  creator: ashevche
  date_created: 2024-09-02T09:23:32Z
  date_updated: 2024-10-05T22:30:05Z
  embargo: 2024-10-04
  file_id: '17482'
  file_name: thesis_a2b.pdf
  file_size: 4468610
  relation: main_file
- access_level: closed
  checksum: 76a39ef252239560923cdda4ce0a31a4
  content_type: application/zip
  creator: ashevche
  date_created: 2024-09-02T09:23:46Z
  date_updated: 2024-10-05T22:30:05Z
  embargo_to: open_access
  file_id: '17483'
  file_name: Thesis Alex - ISTA.zip
  file_size: 15930999
  relation: source_file
file_date_updated: 2024-10-05T22:30:05Z
has_accepted_license: '1'
language:
- iso: eng
month: '08'
oa: 1
oa_version: Published Version
page: '232'
project:
- _id: 059876FA-7A3F-11EA-A408-12923DDC885E
  name: Prix Lopez-Loretta 2019 - Marco Mondelli
- _id: 9B9290DE-BA93-11EA-9121-9846C619BF3A
  grant_number: W1260-N35
  name: Vienna Graduate School on Computational Optimization
publication_identifier:
  issn:
  - 2663-337X
publication_status: published
publisher: Institute of Science and Technology Austria
related_material:
  record:
  - id: '11420'
    relation: part_of_dissertation
    status: public
  - id: '17469'
    relation: part_of_dissertation
    status: public
  - id: '14459'
    relation: part_of_dissertation
    status: public
  - id: '9198'
    relation: part_of_dissertation
    status: public
status: public
supervisor:
- first_name: Marco
  full_name: Mondelli, Marco
  id: 27EB676C-8706-11E9-9510-7717E6697425
  last_name: Mondelli
  orcid: 0000-0002-3242-7020
- first_name: Dan-Adrian
  full_name: Alistarh, Dan-Adrian
  id: 4A899BFC-F248-11E8-B48F-1D18A9856A87
  last_name: Alistarh
  orcid: 0000-0003-3650-940X
title: High-dimensional limits in artificial neural networks
type: dissertation
user_id: 8b945eb4-e2f2-11eb-945a-df72226e66a9
year: '2024'
...
---
_id: '17469'
abstract:
- lang: eng
  text: 'Autoencoders are a prominent model in many empirical branches of machine
    learning and lossy data compression. However, basic theoretical questions remain
    unanswered even in a shallow two-layer setting. In particular, to what degree
    does a shallow autoencoder capture the structure of the underlying data distribution?
    For the prototypical case of the 1-bit compression of sparse Gaussian data, we
    prove that gradient descent converges to a solution that completely disregards
    the sparse structure of the input. Namely, the performance of the algorithm is
    the same as if it was compressing a Gaussian source - with no sparsity. For general
    data distributions, we give evidence of a phase transition phenomenon in the shape
    of the gradient descent minimizer, as a function of the data sparsity: below the
    critical sparsity level, the minimizer is a rotation taken uniformly at random
    (just like in the compression of non-sparse data); above the critical sparsity,
    the minimizer is the identity (up to a permutation). Finally, by exploiting a
    connection with approximate message passing algorithms, we show how to improve
    upon Gaussian performance for the compression of sparse data: adding a denoising
    function to a shallow architecture already reduces the loss provably, and a suitable
    multi-layer decoder leads to a further improvement. We validate our findings on
    image datasets, such as CIFAR-10 and MNIST.'
acknowledgement: Kevin Kögler, Alexander Shevchenko and Marco Mondelli are supported
  by the 2019 Lopez-Loreta Prize. Hamed Hassani acknowledges the support by the
  NSF CIF award (1910056) and the NSF Institute for CORE Emerging Methods in Data
  Science (EnCORE).
alternative_title:
- PMLR
article_processing_charge: No
arxiv: 1
author:
- first_name: Kevin
  full_name: Kögler, Kevin
  id: 94ec913c-dc85-11ea-9058-e5051ab2428b
  last_name: Kögler
- first_name: Aleksandr
  full_name: Shevchenko, Aleksandr
  id: F2B06EC2-C99E-11E9-89F0-752EE6697425
  last_name: Shevchenko
- first_name: Hamed
  full_name: Hassani, Hamed
  last_name: Hassani
- first_name: Marco
  full_name: Mondelli, Marco
  id: 27EB676C-8706-11E9-9510-7717E6697425
  last_name: Mondelli
  orcid: 0000-0002-3242-7020
citation:
  ama: 'Kögler K, Shevchenko A, Hassani H, Mondelli M. Compression of structured data
    with autoencoders: Provable benefit of nonlinearities and depth. In: <i>Proceedings
    of the 41st International Conference on Machine Learning</i>. Vol 235. ML Research
    Press; 2024:24964-25015.'
  apa: 'Kögler, K., Shevchenko, A., Hassani, H., &#38; Mondelli, M. (2024). Compression
    of structured data with autoencoders: Provable benefit of nonlinearities and depth.
    In <i>Proceedings of the 41st International Conference on Machine Learning</i>
    (Vol. 235, pp. 24964–25015). Vienna, Austria: ML Research Press.'
  chicago: 'Kögler, Kevin, Alexander Shevchenko, Hamed Hassani, and Marco Mondelli.
    “Compression of Structured Data with Autoencoders: Provable Benefit of Nonlinearities
    and Depth.” In <i>Proceedings of the 41st International Conference on Machine
    Learning</i>, 235:24964–25015. ML Research Press, 2024.'
  ieee: 'K. Kögler, A. Shevchenko, H. Hassani, and M. Mondelli, “Compression of structured
    data with autoencoders: Provable benefit of nonlinearities and depth,” in <i>Proceedings
    of the 41st International Conference on Machine Learning</i>, Vienna, Austria,
    2024, vol. 235, pp. 24964–25015.'
  ista: 'Kögler K, Shevchenko A, Hassani H, Mondelli M. 2024. Compression of structured
    data with autoencoders: Provable benefit of nonlinearities and depth. Proceedings
    of the 41st International Conference on Machine Learning. ICML: International
    Conference on Machine Learning, PMLR, vol. 235, 24964–25015.'
  mla: 'Kögler, Kevin, et al. “Compression of Structured Data with Autoencoders: Provable
    Benefit of Nonlinearities and Depth.” <i>Proceedings of the 41st International
    Conference on Machine Learning</i>, vol. 235, ML Research Press, 2024, pp. 24964–5015.'
  short: K. Kögler, A. Shevchenko, H. Hassani, M. Mondelli, in:, Proceedings of the
    41st International Conference on Machine Learning, ML Research Press, 2024, pp.
    24964–25015.
conference:
  end_date: 2024-07-27
  location: Vienna, Austria
  name: 'ICML: International Conference on Machine Learning'
  start_date: 2024-07-21
corr_author: '1'
date_created: 2024-08-29T11:47:57Z
date_published: 2024-07-01T00:00:00Z
date_updated: 2026-06-15T22:30:06Z
day: '01'
department:
- _id: DaAl
- _id: MaMo
external_id:
  arxiv:
  - '2402.05013'
intvolume: '       235'
language:
- iso: eng
main_file_link:
- open_access: '1'
  url: https://proceedings.mlr.press/v235/kogler24a.html
month: '07'
oa: 1
oa_version: Published Version
page: 24964-25015
project:
- _id: 059876FA-7A3F-11EA-A408-12923DDC885E
  name: Prix Lopez-Loretta 2019 - Marco Mondelli
publication: Proceedings of the 41st International Conference on Machine Learning
publication_status: published
publisher: ML Research Press
quality_controlled: '1'
related_material:
  record:
  - id: '17465'
    relation: dissertation_contains
    status: public
scopus_import: '1'
status: public
title: 'Compression of structured data with autoencoders: Provable benefit of nonlinearities
  and depth'
type: conference
user_id: 2DF688A6-F248-11E8-B48F-1D18A9856A87
volume: 235
year: '2024'
...
---
_id: '14459'
abstract:
- lang: eng
  text: Autoencoders are a popular model in many branches of machine learning and
    lossy data compression. However, their fundamental limits, the performance of
    gradient methods and the features learnt during optimization remain poorly understood,
    even in the two-layer setting. In fact, earlier work has considered either linear
    autoencoders or specific training regimes (leading to vanishing or diverging compression
    rates). Our paper addresses this gap by focusing on non-linear two-layer autoencoders
    trained in the challenging proportional regime in which the input dimension scales
    linearly with the size of the representation. Our results characterize the minimizers
    of the population risk, and show that such minimizers are achieved by gradient
    methods; their structure is also unveiled, thus leading to a concise description
    of the features obtained via training. For the special case of a sign activation
    function, our analysis establishes the fundamental limits for the lossy compression
    of Gaussian sources via (shallow) autoencoders. Finally, while the results are
    proved for Gaussian data, numerical simulations on standard datasets display the
    universality of the theoretical predictions.
acknowledgement: Aleksandr Shevchenko, Kevin Kögler and Marco Mondelli are supported
  by the 2019 Lopez-Loreta Prize. Hamed Hassani acknowledges the support by the NSF
  CIF award (1910056) and the NSF Institute for CORE Emerging Methods in Data Science
  (EnCORE).
alternative_title:
- PMLR
article_processing_charge: No
arxiv: 1
author:
- first_name: Aleksandr
  full_name: Shevchenko, Aleksandr
  id: F2B06EC2-C99E-11E9-89F0-752EE6697425
  last_name: Shevchenko
- first_name: Kevin
  full_name: Kögler, Kevin
  id: 94ec913c-dc85-11ea-9058-e5051ab2428b
  last_name: Kögler
- first_name: Hamed
  full_name: Hassani, Hamed
  last_name: Hassani
- first_name: Marco
  full_name: Mondelli, Marco
  id: 27EB676C-8706-11E9-9510-7717E6697425
  last_name: Mondelli
  orcid: 0000-0002-3242-7020
citation:
  ama: 'Shevchenko A, Kögler K, Hassani H, Mondelli M. Fundamental limits of two-layer
    autoencoders, and achieving them with gradient methods. In: <i>Proceedings of
    the 40th International Conference on Machine Learning</i>. Vol 202. ML Research
    Press; 2023:31151-31209.'
  apa: 'Shevchenko, A., Kögler, K., Hassani, H., &#38; Mondelli, M. (2023). Fundamental
    limits of two-layer autoencoders, and achieving them with gradient methods. In
    <i>Proceedings of the 40th International Conference on Machine Learning</i> (Vol.
    202, pp. 31151–31209). Honolulu, Hawaii, HI, United States: ML Research Press.'
  chicago: Shevchenko, Alexander, Kevin Kögler, Hamed Hassani, and Marco Mondelli.
    “Fundamental Limits of Two-Layer Autoencoders, and Achieving Them with Gradient
    Methods.” In <i>Proceedings of the 40th International Conference on Machine Learning</i>,
    202:31151–209. ML Research Press, 2023.
  ieee: A. Shevchenko, K. Kögler, H. Hassani, and M. Mondelli, “Fundamental limits
    of two-layer autoencoders, and achieving them with gradient methods,” in <i>Proceedings
    of the 40th International Conference on Machine Learning</i>, Honolulu, Hawaii,
    HI, United States, 2023, vol. 202, pp. 31151–31209.
  ista: 'Shevchenko A, Kögler K, Hassani H, Mondelli M. 2023. Fundamental limits of
    two-layer autoencoders, and achieving them with gradient methods. Proceedings
    of the 40th International Conference on Machine Learning. ICML: International
    Conference on Machine Learning, PMLR, vol. 202, 31151–31209.'
  mla: Shevchenko, Alexander, et al. “Fundamental Limits of Two-Layer Autoencoders,
    and Achieving Them with Gradient Methods.” <i>Proceedings of the 40th International
    Conference on Machine Learning</i>, vol. 202, ML Research Press, 2023, pp. 31151–209.
  short: A. Shevchenko, K. Kögler, H. Hassani, M. Mondelli, in:, Proceedings of the
    40th International Conference on Machine Learning, ML Research Press, 2023, pp.
    31151–31209.
conference:
  end_date: 2023-07-29
  location: Honolulu, Hawaii, HI, United States
  name: 'ICML: International Conference on Machine Learning'
  start_date: 2023-07-23
corr_author: '1'
date_created: 2023-10-29T23:01:17Z
date_published: 2023-07-30T00:00:00Z
date_updated: 2026-06-15T22:30:06Z
day: '30'
department:
- _id: MaMo
- _id: DaAl
external_id:
  arxiv:
  - '2212.13468'
intvolume: '       202'
language:
- iso: eng
main_file_link:
- open_access: '1'
  url: https://doi.org/10.48550/arXiv.2212.13468
month: '07'
oa: 1
oa_version: Preprint
page: 31151-31209
project:
- _id: 059876FA-7A3F-11EA-A408-12923DDC885E
  name: Prix Lopez-Loretta 2019 - Marco Mondelli
publication: Proceedings of the 40th International Conference on Machine Learning
publication_identifier:
  eissn:
  - 2640-3498
publication_status: published
publisher: ML Research Press
quality_controlled: '1'
related_material:
  record:
  - id: '17465'
    relation: dissertation_contains
    status: public
scopus_import: '1'
status: public
title: Fundamental limits of two-layer autoencoders, and achieving them with gradient
  methods
type: conference
user_id: 2DF688A6-F248-11E8-B48F-1D18A9856A87
volume: 202
year: '2023'
...
---
_id: '11420'
abstract:
- lang: eng
  text: 'Understanding the properties of neural networks trained via stochastic gradient
    descent (SGD) is at the heart of the theory of deep learning. In this work, we
    take a mean-field view, and consider a two-layer ReLU network trained via noisy-SGD
    for a univariate regularized regression problem. Our main result is that SGD with
    vanishingly small noise injected in the gradients is biased towards a simple solution:
    at convergence, the ReLU network implements a piecewise linear map of the inputs,
    and the number of “knot” points -- i.e., points where the tangent of the ReLU
    network estimator changes -- between two consecutive training inputs is at most
    three. In particular, as the number of neurons of the network grows, the SGD dynamics
    is captured by the solution of a gradient flow and, at convergence, the distribution
    of the weights approaches the unique minimizer of a related free energy, which
    has a Gibbs form. Our key technical contribution consists in the analysis of the
    estimator resulting from this minimizer: we show that its second derivative vanishes
    everywhere, except at some specific locations which represent the “knot” points.
    We also provide empirical evidence that knots at locations distinct from the data
    points might occur, as predicted by our theory.'
acknowledgement: "We would like to thank Mert Pilanci for several exploratory discussions
  in the early stage of the project, Jan Maas for clarifications about Jordan et
  al. (1998), and Max Zimmer for suggestive numerical experiments. A. Shevchenko
  and M. Mondelli are partially supported by the 2019 Lopez-Loreta Prize. V. Kungurtsev
  acknowledges support to the OP VVV project CZ.02.1.01/0.0/0.0/16_019/0000765
  Research Center for Informatics."
article_processing_charge: No
article_type: original
arxiv: 1
author:
- first_name: Aleksandr
  full_name: Shevchenko, Aleksandr
  id: F2B06EC2-C99E-11E9-89F0-752EE6697425
  last_name: Shevchenko
- first_name: Vyacheslav
  full_name: Kungurtsev, Vyacheslav
  last_name: Kungurtsev
- first_name: Marco
  full_name: Mondelli, Marco
  id: 27EB676C-8706-11E9-9510-7717E6697425
  last_name: Mondelli
  orcid: 0000-0002-3242-7020
citation:
  ama: Shevchenko A, Kungurtsev V, Mondelli M. Mean-field analysis of piecewise linear
    solutions for wide ReLU networks. <i>Journal of Machine Learning Research</i>.
    2022;23(130):1-55.
  apa: Shevchenko, A., Kungurtsev, V., &#38; Mondelli, M. (2022). Mean-field analysis
    of piecewise linear solutions for wide ReLU networks. <i>Journal of Machine Learning
    Research</i>. Journal of Machine Learning Research.
  chicago: Shevchenko, Alexander, Vyacheslav Kungurtsev, and Marco Mondelli. “Mean-Field
    Analysis of Piecewise Linear Solutions for Wide ReLU Networks.” <i>Journal of
    Machine Learning Research</i>. Journal of Machine Learning Research, 2022.
  ieee: A. Shevchenko, V. Kungurtsev, and M. Mondelli, “Mean-field analysis of piecewise
    linear solutions for wide ReLU networks,” <i>Journal of Machine Learning Research</i>,
    vol. 23, no. 130. Journal of Machine Learning Research, pp. 1–55, 2022.
  ista: Shevchenko A, Kungurtsev V, Mondelli M. 2022. Mean-field analysis of piecewise
    linear solutions for wide ReLU networks. Journal of Machine Learning Research.
    23(130), 1–55.
  mla: Shevchenko, Alexander, et al. “Mean-Field Analysis of Piecewise Linear Solutions
    for Wide ReLU Networks.” <i>Journal of Machine Learning Research</i>, vol. 23,
    no. 130, Journal of Machine Learning Research, 2022, pp. 1–55.
  short: A. Shevchenko, V. Kungurtsev, M. Mondelli, Journal of Machine Learning Research
    23 (2022) 1–55.
corr_author: '1'
date_created: 2022-05-29T22:01:54Z
date_published: 2022-04-01T00:00:00Z
date_updated: 2026-06-15T22:30:06Z
day: '01'
ddc:
- '000'
department:
- _id: MaMo
- _id: DaAl
external_id:
  arxiv:
  - '2111.02278'
file:
- access_level: open_access
  checksum: d4ff5d1affb34848b5c5e4002483fc62
  content_type: application/pdf
  creator: cchlebak
  date_created: 2022-05-30T08:22:55Z
  date_updated: 2022-05-30T08:22:55Z
  file_id: '11422'
  file_name: 21-1365.pdf
  file_size: 1521701
  relation: main_file
  success: 1
file_date_updated: 2022-05-30T08:22:55Z
has_accepted_license: '1'
intvolume: '        23'
issue: '130'
language:
- iso: eng
month: '04'
oa: 1
oa_version: Published Version
page: 1-55
project:
- _id: 059876FA-7A3F-11EA-A408-12923DDC885E
  name: Prix Lopez-Loretta 2019 - Marco Mondelli
publication: Journal of Machine Learning Research
publication_identifier:
  eissn:
  - 1533-7928
  issn:
  - 1532-4435
publication_status: published
publisher: Journal of Machine Learning Research
quality_controlled: '1'
related_material:
  link:
  - relation: other
    url: https://www.jmlr.org/papers/v23/21-1365.html
  record:
  - id: '17465'
    relation: dissertation_contains
    status: public
scopus_import: '1'
status: public
title: Mean-field analysis of piecewise linear solutions for wide ReLU networks
tmp:
  image: /images/cc_by.png
  legal_code_url: https://creativecommons.org/licenses/by/4.0/legalcode
  name: Creative Commons Attribution 4.0 International Public License (CC-BY 4.0)
  short: CC BY (4.0)
type: journal_article
user_id: 8b945eb4-e2f2-11eb-945a-df72226e66a9
volume: 23
year: '2022'
...
---
_id: '9198'
abstract:
- lang: eng
  text: "The optimization of multilayer neural networks typically leads to a solution
    with zero training error, yet the landscape can exhibit spurious local minima
    and the minima can be disconnected. In this paper, we shed light on this phenomenon:
    we show that the combination of stochastic gradient descent (SGD) and over-parameterization
    makes the landscape of multilayer neural networks approximately connected and
    thus more favorable to optimization. More specifically, we prove that SGD solutions
    are connected via a piecewise linear path, and the increase in loss along this
    path vanishes as the number of neurons grows large. This result is a consequence
    of the fact that the parameters found by SGD are increasingly dropout stable
    as the network becomes wider. We show that, if we remove part of the neurons
    (and suitably rescale the remaining ones), the change in loss is independent
    of the total number of neurons, and it depends only on how many neurons are
    left. Our results exhibit a mild dependence on the input dimension: they are
    dimension-free for two-layer networks and depend linearly on the dimension
    for multilayer networks. We validate our theoretical findings with numerical
    experiments for different architectures and classification tasks."
acknowledgement: M. Mondelli was partially supported by the 2019 Lopez-Loreta Prize.
  The authors thank Phan-Minh Nguyen for helpful discussions and the IST Distributed
  Algorithms and Systems Lab for providing computational resources.
article_processing_charge: No
arxiv: 1
author:
- first_name: Aleksandr
  full_name: Shevchenko, Aleksandr
  id: F2B06EC2-C99E-11E9-89F0-752EE6697425
  last_name: Shevchenko
- first_name: Marco
  full_name: Mondelli, Marco
  id: 27EB676C-8706-11E9-9510-7717E6697425
  last_name: Mondelli
  orcid: 0000-0002-3242-7020
citation:
  ama: 'Shevchenko A, Mondelli M. Landscape connectivity and dropout stability of
    SGD solutions for over-parameterized neural networks. In: <i>Proceedings of the
    37th International Conference on Machine Learning</i>. Vol 119. ML Research Press;
    2020:8773-8784.'
  apa: Shevchenko, A., &#38; Mondelli, M. (2020). Landscape connectivity and dropout
    stability of SGD solutions for over-parameterized neural networks. In <i>Proceedings
    of the 37th International Conference on Machine Learning</i> (Vol. 119, pp. 8773–8784).
    ML Research Press.
  chicago: Shevchenko, Aleksandr, and Marco Mondelli. “Landscape Connectivity and
    Dropout Stability of SGD Solutions for Over-Parameterized Neural Networks.” In
    <i>Proceedings of the 37th International Conference on Machine Learning</i>, 119:8773–84.
    ML Research Press, 2020.
  ieee: A. Shevchenko and M. Mondelli, “Landscape connectivity and dropout stability
    of SGD solutions for over-parameterized neural networks,” in <i>Proceedings of
    the 37th International Conference on Machine Learning</i>, 2020, vol. 119, pp.
    8773–8784.
  ista: Shevchenko A, Mondelli M. 2020. Landscape connectivity and dropout stability
    of SGD solutions for over-parameterized neural networks. Proceedings of the 37th
    International Conference on Machine Learning. vol. 119, 8773–8784.
  mla: Shevchenko, Aleksandr, and Marco Mondelli. “Landscape Connectivity and Dropout
    Stability of SGD Solutions for Over-Parameterized Neural Networks.” <i>Proceedings
    of the 37th International Conference on Machine Learning</i>, vol. 119, ML Research
    Press, 2020, pp. 8773–84.
  short: A. Shevchenko, M. Mondelli, in:, Proceedings of the 37th International Conference
    on Machine Learning, ML Research Press, 2020, pp. 8773–8784.
date_created: 2021-02-25T09:36:22Z
date_published: 2020-07-13T00:00:00Z
date_updated: 2026-06-15T22:30:06Z
day: '13'
ddc:
- '000'
department:
- _id: MaMo
- _id: DaAl
external_id:
  arxiv:
  - '1912.10095'
file:
- access_level: open_access
  checksum: f042c8d4316bd87c6361aa76f1fbdbbe
  content_type: application/pdf
  creator: dernst
  date_created: 2021-03-02T15:38:14Z
  date_updated: 2021-03-02T15:38:14Z
  file_id: '9217'
  file_name: 2020_PMLR_Shevchenko.pdf
  file_size: 5336380
  relation: main_file
  success: 1
file_date_updated: 2021-03-02T15:38:14Z
has_accepted_license: '1'
intvolume: '       119'
language:
- iso: eng
month: '07'
oa: 1
oa_version: Published Version
page: 8773-8784
project:
- _id: 059876FA-7A3F-11EA-A408-12923DDC885E
  name: Prix Lopez-Loretta 2019 - Marco Mondelli
publication: Proceedings of the 37th International Conference on Machine Learning
publication_status: published
publisher: ML Research Press
quality_controlled: '1'
related_material:
  record:
  - id: '17465'
    relation: dissertation_contains
    status: public
status: public
title: Landscape connectivity and dropout stability of SGD solutions for over-parameterized
  neural networks
type: conference
user_id: 8b945eb4-e2f2-11eb-945a-df72226e66a9
volume: 119
year: '2020'
...