---
_id: '13146'
abstract:
- lang: eng
  text: 'A recent line of work has analyzed the theoretical properties of deep neural
    networks via the Neural Tangent Kernel (NTK). In particular, the smallest eigenvalue
    of the NTK has been related to the memorization capacity, the global convergence
    of gradient descent algorithms and the generalization of deep nets. However, existing
    results either provide bounds in the two-layer setting or assume that the spectrum
    of the NTK matrices is bounded away from 0 for multi-layer networks. In this paper,
    we provide tight bounds on the smallest eigenvalue of NTK matrices for deep ReLU
    nets, both in the limiting case of infinite widths and for finite widths. In the
    finite-width setting, the network architectures we consider are fairly general:
    we require the existence of a wide layer with roughly order of N neurons, N being
    the number of data samples; and the scaling of the remaining layer widths is arbitrary
    (up to logarithmic factors). To obtain our results, we analyze various quantities
    of independent interest: we give lower bounds on the smallest singular value of
    hidden feature matrices, and upper bounds on the Lipschitz constant of input-output
    feature maps.'
acknowledgement: The authors would like to thank the anonymous reviewers for their
  helpful comments. MM was partially supported by the 2019 Lopez-Loreta Prize. QN
  and GM acknowledge support from the European Research Council (ERC) under the European
  Union’s Horizon 2020 research and innovation programme (grant agreement no 757983).
article_processing_charge: No
arxiv: 1
author:
- first_name: Quynh
  full_name: Nguyen, Quynh
  last_name: Nguyen
- first_name: Marco
  full_name: Mondelli, Marco
  id: 27EB676C-8706-11E9-9510-7717E6697425
  last_name: Mondelli
  orcid: 0000-0002-3242-7020
- first_name: Guido
  full_name: Montufar, Guido
  last_name: Montufar
citation:
  ama: 'Nguyen Q, Mondelli M, Montufar G. Tight bounds on the smallest eigenvalue
    of the neural tangent kernel for deep ReLU networks. In: <i>Proceedings of the
    38th International Conference on Machine Learning</i>. Vol 139. ML Research Press;
    2021:8119-8129.'
  apa: 'Nguyen, Q., Mondelli, M., &#38; Montufar, G. (2021). Tight bounds on the smallest
    eigenvalue of the neural tangent kernel for deep ReLU networks. In <i>Proceedings
    of the 38th International Conference on Machine Learning</i> (Vol. 139, pp. 8119–8129).
    Virtual: ML Research Press.'
  chicago: Nguyen, Quynh, Marco Mondelli, and Guido Montufar. “Tight Bounds on the
    Smallest Eigenvalue of the Neural Tangent Kernel for Deep ReLU Networks.” In <i>Proceedings
    of the 38th International Conference on Machine Learning</i>, 139:8119–29. ML
    Research Press, 2021.
  ieee: Q. Nguyen, M. Mondelli, and G. Montufar, “Tight bounds on the smallest eigenvalue
    of the neural tangent kernel for deep ReLU networks,” in <i>Proceedings of the
    38th International Conference on Machine Learning</i>, Virtual, 2021, vol. 139,
    pp. 8119–8129.
  ista: 'Nguyen Q, Mondelli M, Montufar G. 2021. Tight bounds on the smallest eigenvalue
    of the neural tangent kernel for deep ReLU networks. Proceedings of the 38th International
    Conference on Machine Learning. ICML: International Conference on Machine Learning
    vol. 139, 8119–8129.'
  mla: Nguyen, Quynh, et al. “Tight Bounds on the Smallest Eigenvalue of the Neural
    Tangent Kernel for Deep ReLU Networks.” <i>Proceedings of the 38th International
    Conference on Machine Learning</i>, vol. 139, ML Research Press, 2021, pp. 8119–29.
  short: Q. Nguyen, M. Mondelli, G. Montufar, in: Proceedings of the 38th International
    Conference on Machine Learning, ML Research Press, 2021, pp. 8119–8129.
conference:
  end_date: 2021-07-24
  location: Virtual
  name: 'ICML: International Conference on Machine Learning'
  start_date: 2021-07-18
date_created: 2023-06-18T22:00:48Z
date_published: 2021-07-01T00:00:00Z
date_updated: 2025-07-10T11:50:36Z
day: '01'
ddc:
- '000'
department:
- _id: MaMo
external_id:
  arxiv:
  - '2012.11654'
file:
- access_level: open_access
  checksum: 19489cf5e16a0596b1f92e317d97c9b0
  content_type: application/pdf
  creator: dernst
  date_created: 2023-06-19T10:49:12Z
  date_updated: 2023-06-19T10:49:12Z
  file_id: '13155'
  file_name: 2021_PMLR_Nguyen.pdf
  file_size: 591332
  relation: main_file
  success: 1
file_date_updated: 2023-06-19T10:49:12Z
has_accepted_license: '1'
intvolume: '139'
language:
- iso: eng
month: '07'
oa: 1
oa_version: Published Version
page: 8119-8129
project:
- _id: 059876FA-7A3F-11EA-A408-12923DDC885E
  name: Prix Lopez-Loretta 2019 - Marco Mondelli
publication: Proceedings of the 38th International Conference on Machine Learning
publication_identifier:
  eissn:
  - 2640-3498
  isbn:
  - '9781713845065'
publication_status: published
publisher: ML Research Press
quality_controlled: '1'
scopus_import: '1'
status: public
title: Tight bounds on the smallest eigenvalue of the neural tangent kernel for deep
  ReLU networks
tmp:
  image: /images/cc_by.png
  legal_code_url: https://creativecommons.org/licenses/by/4.0/legalcode
  name: Creative Commons Attribution 4.0 International Public License (CC-BY 4.0)
  short: CC BY (4.0)
type: conference
user_id: 2DF688A6-F248-11E8-B48F-1D18A9856A87
volume: 139
year: '2021'
...
---
_id: '13147'
abstract:
- lang: eng
  text: "We investigate fast and communication-efficient algorithms for the classic
    problem of minimizing a sum of strongly convex and smooth functions that are distributed
    among n different nodes, which can communicate using a limited number of bits.
    Most previous communication-efficient approaches for this problem are limited
    to first-order optimization, and therefore have linear dependence on the
    condition number in their communication complexity. We show that this dependence
    is not inherent: communication-efficient methods can in fact have sublinear dependence
    on the condition number. For this, we design and analyze the first communication-efficient
    distributed variants of preconditioned gradient descent for Generalized Linear
    Models, and for Newton’s method. Our results rely on a new technique for quantizing
    both the preconditioner and the descent direction at each step of the algorithms,
    while controlling their convergence rate. We also validate our findings experimentally,
    showing faster convergence and reduced communication relative to previous methods."
acknowledgement: The authors would like to thank Janne Korhonen, Aurelien Lucchi,
  Celestine Mendler-Dünner and Antonio Orvieto for helpful discussions. FA and DA
  were supported during this work by the European Research Council (ERC) under the
  European Union’s Horizon 2020 research and innovation programme (grant agreement
  No 805223 ScaleML). PD was supported by the European Union’s Horizon 2020 programme
  under the Marie Skłodowska-Curie grant agreement No. 754411.
article_processing_charge: No
arxiv: 1
author:
- first_name: Foivos
  full_name: Alimisis, Foivos
  last_name: Alimisis
- first_name: Peter
  full_name: Davies, Peter
  id: 11396234-BB50-11E9-B24C-90FCE5697425
  last_name: Davies
  orcid: 0000-0002-5646-9524
- first_name: Dan-Adrian
  full_name: Alistarh, Dan-Adrian
  id: 4A899BFC-F248-11E8-B48F-1D18A9856A87
  last_name: Alistarh
  orcid: 0000-0003-3650-940X
citation:
  ama: 'Alimisis F, Davies P, Alistarh D-A. Communication-efficient distributed optimization
    with quantized preconditioners. In: <i>Proceedings of the 38th International Conference
    on Machine Learning</i>. Vol 139. ML Research Press; 2021:196-206.'
  apa: 'Alimisis, F., Davies, P., &#38; Alistarh, D.-A. (2021). Communication-efficient
    distributed optimization with quantized preconditioners. In <i>Proceedings of
    the 38th International Conference on Machine Learning</i> (Vol. 139, pp. 196–206).
    Virtual: ML Research Press.'
  chicago: Alimisis, Foivos, Peter Davies, and Dan-Adrian Alistarh. “Communication-Efficient
    Distributed Optimization with Quantized Preconditioners.” In <i>Proceedings of
    the 38th International Conference on Machine Learning</i>, 139:196–206. ML Research
    Press, 2021.
  ieee: F. Alimisis, P. Davies, and D.-A. Alistarh, “Communication-efficient distributed
    optimization with quantized preconditioners,” in <i>Proceedings of the 38th International
    Conference on Machine Learning</i>, Virtual, 2021, vol. 139, pp. 196–206.
  ista: 'Alimisis F, Davies P, Alistarh D-A. 2021. Communication-efficient distributed
    optimization with quantized preconditioners. Proceedings of the 38th International
    Conference on Machine Learning. ICML: International Conference on Machine Learning
    vol. 139, 196–206.'
  mla: Alimisis, Foivos, et al. “Communication-Efficient Distributed Optimization
    with Quantized Preconditioners.” <i>Proceedings of the 38th International Conference
    on Machine Learning</i>, vol. 139, ML Research Press, 2021, pp. 196–206.
  short: F. Alimisis, P. Davies, D.-A. Alistarh, in: Proceedings of the 38th International
    Conference on Machine Learning, ML Research Press, 2021, pp. 196–206.
conference:
  end_date: 2021-07-24
  location: Virtual
  name: 'ICML: International Conference on Machine Learning'
  start_date: 2021-07-18
corr_author: '1'
date_created: 2023-06-18T22:00:48Z
date_published: 2021-07-01T00:00:00Z
date_updated: 2025-07-10T11:50:37Z
day: '01'
ddc:
- '000'
department:
- _id: DaAl
ec_funded: 1
external_id:
  arxiv:
  - '2102.07214'
file:
- access_level: open_access
  checksum: 7ec0d59bac268b49c76bf2e036dedd7a
  content_type: application/pdf
  creator: dernst
  date_created: 2023-06-19T10:41:05Z
  date_updated: 2023-06-19T10:41:05Z
  file_id: '13154'
  file_name: 2021_PMLR_Alimisis.pdf
  file_size: 429087
  relation: main_file
  success: 1
file_date_updated: 2023-06-19T10:41:05Z
has_accepted_license: '1'
intvolume: '139'
language:
- iso: eng
month: '07'
oa: 1
oa_version: Published Version
page: 196-206
project:
- _id: 268A44D6-B435-11E9-9278-68D0E5697425
  call_identifier: H2020
  grant_number: '805223'
  name: Elastic Coordination for Scalable Machine Learning
- _id: 260C2330-B435-11E9-9278-68D0E5697425
  call_identifier: H2020
  grant_number: '754411'
  name: ISTplus - Postdoctoral Fellowships
publication: Proceedings of the 38th International Conference on Machine Learning
publication_identifier:
  eissn:
  - 2640-3498
  isbn:
  - '9781713845065'
publication_status: published
publisher: ML Research Press
quality_controlled: '1'
scopus_import: '1'
status: public
title: Communication-efficient distributed optimization with quantized preconditioners
tmp:
  image: /images/cc_by.png
  legal_code_url: https://creativecommons.org/licenses/by/4.0/legalcode
  name: Creative Commons Attribution 4.0 International Public License (CC-BY 4.0)
  short: CC BY (4.0)
type: conference
user_id: 2DF688A6-F248-11E8-B48F-1D18A9856A87
volume: 139
year: '2021'
...
