Analysis of a two-layer neural network via displacement convexity

Javanmard, Adel; Mondelli, Marco; Montanari, Andrea

Javanmard A, Mondelli M, Montanari A. 2020. Analysis of a two-layer neural network via displacement convexity. Annals of Statistics. 48(6), 3619–3642.

Download (ext.)

https://arxiv.org/abs/1901.01375 [Preprint]

DOI

10.1214/20-AOS1945

Journal Article | Published | English

Scopus indexed

Author

Javanmard, Adel; Mondelli, Marco (ISTA); Montanari, Andrea

Department

Mondelli Group

Abstract

Fitting a function by using linear combinations of a large number N of 'simple' components is one of the most fruitful ideas in statistical learning. This idea lies at the core of a variety of methods, from two-layer neural networks to kernel regression to boosting. In general, the resulting risk minimization problem is non-convex and is solved by gradient descent or its variants. Unfortunately, little is known about the global convergence properties of these approaches. Here we consider the problem of learning a concave function f on a compact convex domain Ω ⊆ ℝ^d, using linear combinations of 'bump-like' components (neurons). The parameters to be fitted are the centers of N bumps, and the resulting empirical risk minimization problem is highly non-convex. We prove that, in the limit in which the number of neurons diverges, the evolution of gradient descent converges to a Wasserstein gradient flow in the space of probability distributions over Ω. Further, when the bump width δ tends to 0, this gradient flow has a limit which is a viscous porous medium equation. Remarkably, the cost function optimized by this gradient flow exhibits a special property known as displacement convexity, which implies exponential convergence rates for N → ∞, δ → 0. Surprisingly, this asymptotic theory appears to capture well the behavior for moderate values of δ and N. Explaining this phenomenon and understanding the dependence on δ and N in a quantitative manner remain an outstanding challenge.
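
The setting described in the abstract can be illustrated with a small numerical sketch. This is not the authors' code: the one-dimensional domain Ω = [0, 1], the Gaussian bump shape, the target f(x) = 6x(1 − x), the 1/N averaging of neurons, the grid approximation of the risk, and all names and default values (`bump`, `train`, `delta`, `lr`, and so on) are assumptions chosen only for illustration. It runs plain projected gradient descent on the N bump centers and records the squared-error risk.

```python
import numpy as np


def bump(u, delta):
    """Gaussian bump of width delta (an assumed kernel shape)."""
    return np.exp(-u**2 / (2 * delta**2)) / (delta * np.sqrt(2 * np.pi))


def target(x):
    """A concave target on [0, 1] (assumed for illustration)."""
    return 6.0 * x * (1.0 - x)


def predict(x, centers, delta):
    """Average of N bumps centered at `centers`, evaluated at points x."""
    return bump(x[:, None] - centers[None, :], delta).mean(axis=1)


def risk(centers, x, delta):
    """Mean squared error between the target and the network on the grid x."""
    return float(np.mean((target(x) - predict(x, centers, delta)) ** 2))


def risk_gradient(centers, x, delta):
    """Gradient of the grid risk with respect to each bump center."""
    diff = x[:, None] - centers[None, :]               # shape (n_grid, N)
    residual = target(x) - predict(x, centers, delta)  # shape (n_grid,)
    dbump = diff / delta**2 * bump(diff, delta)        # d/dw_i of K(x - w_i)
    return -2.0 * (residual[:, None] * dbump).mean(axis=0) / centers.size


def train(n_neurons=50, delta=0.15, lr=0.005, steps=400):
    """Projected gradient descent on the centers; returns centers and risks."""
    x = np.linspace(0.0, 1.0, 201)
    centers = np.linspace(0.0, 1.0, n_neurons)
    history = [risk(centers, x, delta)]
    for _ in range(steps):
        # The step is multiplied by N so each center moves at an O(1) rate.
        centers = centers - lr * n_neurons * risk_gradient(centers, x, delta)
        centers = np.clip(centers, 0.0, 1.0)  # keep centers inside the domain
        history.append(risk(centers, x, delta))
    return centers, history


if __name__ == "__main__":
    final_centers, risks = train()
    print(f"initial risk {risks[0]:.4f}, final risk {risks[-1]:.4f}")
```

Varying `n_neurons` and `delta` in this sketch is one informal way to explore how the risk curve depends on N and δ; it does not reproduce the paper's experiments or its limiting equations.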

Publishing Year

2020

Date Published

2020-12-11

Journal Title

Annals of Statistics

Publisher

Institute of Mathematical Statistics

Volume

48

Issue

6

Page

3619-3642

ISSN

0090-5364

eISSN

2168-8966

IST-REx-ID

6748

Cite this

Javanmard A, Mondelli M, Montanari A. Analysis of a two-layer neural network via displacement convexity. Annals of Statistics. 2020;48(6):3619-3642. doi:10.1214/20-AOS1945

Javanmard, A., Mondelli, M., & Montanari, A. (2020). Analysis of a two-layer neural network via displacement convexity. Annals of Statistics. Institute of Mathematical Statistics. https://doi.org/10.1214/20-AOS1945

Javanmard, Adel, Marco Mondelli, and Andrea Montanari. “Analysis of a Two-Layer Neural Network via Displacement Convexity.” Annals of Statistics. Institute of Mathematical Statistics, 2020. https://doi.org/10.1214/20-AOS1945.

A. Javanmard, M. Mondelli, and A. Montanari, “Analysis of a two-layer neural network via displacement convexity,” Annals of Statistics, vol. 48, no. 6. Institute of Mathematical Statistics, pp. 3619–3642, 2020.

Javanmard, Adel, et al. “Analysis of a Two-Layer Neural Network via Displacement Convexity.” Annals of Statistics, vol. 48, no. 6, Institute of Mathematical Statistics, 2020, pp. 3619–42, doi:10.1214/20-AOS1945.
