---
res:
  bibo_abstract:
    - "Recent works have shown that gradient descent can find a global minimum for over-parameterized neural networks where the widths of all the hidden layers scale polynomially with N (N being the number of training samples). In this paper, we prove that, for deep networks, a single layer of width N following the input layer suffices to ensure a similar guarantee. In particular, all the remaining layers are allowed to have constant widths, and form a pyramidal topology. We show an application of our result to the widely used LeCun's initialization and obtain an over-parameterization requirement for the single wide layer of order N^2.@eng"
  bibo_authorlist:
    - foaf_Person:
        foaf_givenName: Quynh
        foaf_name: Nguyen, Quynh
        foaf_surname: Nguyen
    - foaf_Person:
        foaf_givenName: Marco
        foaf_name: Mondelli, Marco
        foaf_surname: Mondelli
        foaf_workInfoHomepage: http://www.librecat.org/personId=27EB676C-8706-11E9-9510-7717E6697425
        orcid: 0000-0002-3242-7020
  bibo_volume: 33
  dct_date: 2020^xs_gYear
  dct_language: eng
  dct_publisher: Curran Associates@
  dct_title: Global convergence of deep networks with one wide layer followed by pyramidal topology@
...
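The abstract describes a concrete architecture: an input layer, one wide hidden layer of width N (N = number of training samples), and remaining hidden layers of constant width forming a pyramidal (non-increasing) topology, initialized in the LeCun style (zero-mean Gaussian weights with variance 1/fan_in). The sketch below is not the authors' code; the ReLU activation, the specific widths, and all function names are illustrative assumptions used only to picture the described topology and initialization.

```python
# Minimal sketch of a network with one wide layer of width N after the input,
# followed by constant-width (pyramidal) hidden layers, with LeCun-style
# initialization. All sizes and names are illustrative assumptions.
import numpy as np

def lecun_init(fan_in, fan_out, rng):
    # LeCun initialization: zero-mean Gaussian with variance 1 / fan_in.
    return rng.normal(0.0, np.sqrt(1.0 / fan_in), size=(fan_in, fan_out))

def build_pyramidal_net(d_in, n_samples, constant_width, depth, d_out, seed=0):
    rng = np.random.default_rng(seed)
    # One wide layer of width N right after the input ...
    widths = [d_in, n_samples] + [constant_width] * depth + [d_out]
    # ... followed by non-increasing hidden widths (pyramidal topology).
    assert all(a >= b for a, b in zip(widths[1:-1], widths[2:-1]))
    return [lecun_init(a, b, rng) for a, b in zip(widths[:-1], widths[1:])]

def forward(weights, x):
    # Plain feed-forward pass; ReLU on the hidden layers (an assumption here).
    h = x
    for W in weights[:-1]:
        h = np.maximum(h @ W, 0.0)
    return h @ weights[-1]

# Example: 100 training samples in 20 dimensions, so the wide layer has width 100.
X = np.random.default_rng(1).normal(size=(100, 20))
net = build_pyramidal_net(d_in=20, n_samples=100, constant_width=50, depth=3, d_out=1)
print(forward(net, X).shape)  # (100, 1)
```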