High-dimensional limits in artificial neural networks
Shevchenko A. 2024. High-dimensional limits in artificial neural networks. Institute of Science and Technology Austria.
Thesis | PhD | Published | English
Author
Shevchenko, Alexander
Supervisor
Corresponding author has ISTA affiliation
Department
Series Title
ISTA Thesis
Abstract
In the modern age of machine learning, artificial neural networks have become an integral part
of many practical systems. One of the key ingredients of the success of the deep learning
approach is the recent computational progress that has made it possible to train models with
billions of parameters on large-scale data. Such over-parameterized and data-hungry regimes
pose a challenge for the theoretical analysis of modern models, since “classical” statistical
wisdom no longer applies. In view of this, it is paramount to extend existing machinery, or to
develop new tools, for analyzing neural networks under these new and challenging asymptotic
regimes, which is the focus of this thesis.
Large neural network systems are usually optimized via “local” search algorithms, such
as stochastic gradient descent (SGD). However, given the high-dimensional nature of the
parameter space, it is a priori not clear why such a crude “local” approach works so remarkably
well in practice. We take a step towards demystifying this phenomenon by showing that
the landscape explored by the SGD training dynamics exhibits several properties that are
beneficial for optimization. First, we show that along the SGD trajectory an over-parameterized
network is dropout stable. The emergence of dropout stability allows us to conclude that the
minima found by SGD are connected via a continuous path of small loss. This mode connectivity,
in turn, means that the high-dimensional landscape of the neural network optimization problem
is provably not so unfavourable to gradient-based training. Next, we show that SGD for an
over-parameterized network tends to find solutions that are functionally “simpler”. This in
turn means that the SGD minima are more robust, since a less complicated solution is less
likely to overfit the data. More formally, for the prototypical example of a wide two-layer
ReLU network on a 1d regression task, we show that the SGD algorithm is implicitly selective
in its choice of an interpolating solution. Namely, at convergence the neural network
implements a piecewise linear function whose number of linear regions depends only on the
amount of training data. This contrasts with the “smooth” behaviour one would expect given
such severe over-parameterization of the model.
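To make the landscape claims concrete, here is a schematic formulation in my own notation (the
exact constants, scalings, and dropout fraction are those of the thesis, not reproduced here).
A parameter vector \theta of a network with loss L is \epsilon-dropout stable if zeroing out
half of the units in each hidden layer and rescaling the remaining ones gives parameters
\theta_{\mathrm{drop}} with

\[ L(\theta_{\mathrm{drop}}) \le L(\theta) + \epsilon, \]

and two solutions \theta_A, \theta_B are \epsilon-connected if there is a continuous path
\pi : [0,1] \to \Theta with \pi(0) = \theta_A, \pi(1) = \theta_B and

\[ L(\pi(t)) \le \max\{ L(\theta_A), L(\theta_B) \} + \epsilon \quad \text{for all } t \in [0,1]. \]

In this language, the claim above is that SGD iterates of a sufficiently over-parameterized
network are \epsilon-dropout stable for small \epsilon, and that dropout stability implies
\epsilon-connectedness of the minima found by SGD.

The implicit-bias claim for wide two-layer ReLU networks on 1d regression can be probed with a
toy experiment: train a heavily over-parameterized network with one-sample SGD on a handful of
points and check how close the learned function is to the piecewise linear interpolant of the
training data (roughly n - 1 linear pieces, independent of the width). The NumPy sketch below
is mine, not code from the thesis; the mean-field scaling, learning rate, target function, and
number of steps are ad hoc choices for illustration.

import numpy as np

rng = np.random.default_rng(0)

# Toy 1d regression task: n training points, heavily over-parameterized width m.
n, m = 10, 1000
x = np.sort(rng.uniform(-1.0, 1.0, size=n))
y = np.sin(3.0 * x)

# Two-layer ReLU network f(z) = (1/m) * sum_j b_j * relu(w_j * z + c_j).
w = rng.normal(size=m)
c = rng.normal(size=m)
b = rng.normal(size=m)

def predict(zs):
    pre = np.outer(zs, w) + c                 # pre-activations, shape (len(zs), m)
    return np.maximum(pre, 0.0) @ b / m

lr = 100.0                                    # large step size offsets the 1/m output scaling
for step in range(50_000):                    # one-sample SGD on the squared loss
    i = rng.integers(n)
    pre = w * x[i] + c
    act = np.maximum(pre, 0.0)
    err = act @ b / m - y[i]
    grad_pre = err * b * (pre > 0) / m        # backprop through the ReLU
    b -= lr * err * act / m
    w -= lr * grad_pre * x[i]
    c -= lr * grad_pre

# How piecewise linear is the result? Compare the network to the piecewise linear
# interpolant through its own values at the n training points.
grid = np.linspace(x[0], x[-1], 2_001)
f_net = predict(grid)
f_pwl = np.interp(grid, x, predict(x))
print("max |f_net - piecewise linear interpolant|:", float(np.max(np.abs(f_net - f_pwl))))
print("max training error:", float(np.max(np.abs(predict(x) - y))))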
Diverging from the generic supervised setting of classification and regression problems, we
analyze an auto-encoder model that is commonly used for representation learning and data
compression. Despite the wide applicability of the auto-encoding paradigm, the theoretical
understanding of its behaviour is limited even in the simple shallow case. The related work is
restricted to extreme asymptotic regimes in which the auto-encoder is either severely
over-parameterized or under-parameterized. In contrast, we provide a tight characterization
for the 1-bit compression of Gaussian signals in the challenging proportional regime, i.e.,
where the input dimension and the size of the compressed representation grow at the same rate.
We also show that gradient-based methods are able to find a globally optimal solution, and
that the predictions made for Gaussian data extrapolate beyond the Gaussian setting, to the
compression of natural images. Next, we relax the Gaussian assumption and study more
structured input sources. We show that the shallow model is sometimes agnostic to the
structure of the data, which results in a Gaussian-like behaviour. We prove that making the
decoding component slightly less shallow is already enough to escape this “curse” of Gaussian
performance.
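As a rough sketch of the compression setting, in my own notation (the exact normalizations,
activation, and the deeper-decoder variant are as in the thesis): a shallow auto-encoder maps
a signal x \in \mathbb{R}^d to a 1-bit representation \mathrm{sign}(Ax) \in \{-1,+1\}^m and
reconstructs it linearly,

\[ \hat{x} = B\,\mathrm{sign}(A x), \qquad x \sim \mathcal{N}(0, I_d), \quad A \in \mathbb{R}^{m \times d}, \; B \in \mathbb{R}^{d \times m}, \]

with performance measured by the mean squared reconstruction risk

\[ \mathcal{R}(A, B) = \frac{1}{d}\, \mathbb{E} \bigl\| x - B\,\mathrm{sign}(A x) \bigr\|_2^2 . \]

The proportional regime refers to d, m \to \infty with m/d \to \delta for a fixed compression
rate \delta. The results summarized above characterize the optimal risk in this limit, show
that gradient-based training attains it, and then ask how the picture changes once x comes
from a more structured source than the isotropic Gaussian.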
Publishing Year
2024
Date Published
2024-08-29
Publisher
Institute of Science and Technology Austria
Acknowledged SSUs
Page
232
ISSN
IST-REx-ID
17465
Cite this
Shevchenko A. High-dimensional limits in artificial neural networks. 2024. doi:10.15479/at:ista:17465
Shevchenko, A. (2024). High-dimensional limits in artificial neural networks. Institute of Science and Technology Austria. https://doi.org/10.15479/at:ista:17465
Shevchenko, Alexander. “High-Dimensional Limits in Artificial Neural Networks.” Institute of Science and Technology Austria, 2024. https://doi.org/10.15479/at:ista:17465.
A. Shevchenko, “High-dimensional limits in artificial neural networks,” Institute of Science and Technology Austria, 2024.
Shevchenko A. 2024. High-dimensional limits in artificial neural networks. Institute of Science and Technology Austria.
Shevchenko, Alexander. High-Dimensional Limits in Artificial Neural Networks. Institute of Science and Technology Austria, 2024, doi:10.15479/at:ista:17465.
All files available under the following license(s):
Copyright Statement:
This Item is protected by copyright and/or related rights. [...]
Main File(s)
File Name
thesis_a2b.pdf
4.47 MB
Access Level
Open Access
Date Uploaded
2024-09-02
Embargo End Date
2024-10-04
MD5 Checksum
da6dd3166078934577f6af93d27000e2
Source File
File Name
Thesis Alex - ISTA.zip
15.93 MB
Access Level
Closed Access
Date Uploaded
2024-09-02
MD5 Checksum
76a39ef252239560923cdda4ce0a31a4
Material in ISTA:
Part of this Dissertation
Part of this Dissertation
Part of this Dissertation
Part of this Dissertation