---
OA_place: publisher
OA_type: gold
_id: '21324'
abstract:
- lang: eng
  text: Learning models have been shown to rely on spurious correlations between non-predictive
    features and the associated labels in the training data, with negative implications
    on robustness, bias and fairness. In this work, we provide a statistical characterization
    of this phenomenon for high-dimensional regression, when the data contains a predictive
    core feature x and a spurious feature y. Specifically, we quantify the amount
    of spurious correlations C learned via linear regression, in terms of the data
    covariance and the strength λ of the ridge regularization. As a consequence, we
    first capture the simplicity of y through the spectrum of its covariance, and
    its correlation with x through the Schur complement of the full data covariance.
    Next, we prove a trade-off between C and the in-distribution test loss L, by showing
    that the value of λ that minimizes L lies in an interval where C is increasing.
    Finally, we investigate the effects of over-parameterization via the random features
    model, by showing its equivalence to regularized linear regression. Our theoretical
    results are supported by numerical experiments on Gaussian, Color-MNIST, and CIFAR-10
    datasets.
acknowledgement: Marco Mondelli is funded by the European Union (ERC, INF2, project
  number 101161364). Views and opinions expressed are however those of the author(s)
  only and do not necessarily reflect those of the European Union or the European
  Research Council Executive Agency. Neither the European Union nor the granting authority
  can be held responsible for them. Simone Bombari is supported by a Google PhD fellowship.
  The authors would like to thank GuanWen Qiu for helpful discussions.
alternative_title:
- PMLR
article_processing_charge: No
arxiv: 1
author:
- first_name: Simone
  full_name: Bombari, Simone
  id: ca726dda-de17-11ea-bc14-f9da834f63aa
  last_name: Bombari
- first_name: Marco
  full_name: Mondelli, Marco
  id: 27EB676C-8706-11E9-9510-7717E6697425
  last_name: Mondelli
  orcid: 0000-0002-3242-7020
citation:
  ama: 'Bombari S, Mondelli M. Spurious correlations in high dimensional regression:
    The roles of regularization, simplicity bias and over-parameterization. In: <i>Proceedings
    of the 42nd International Conference on Machine Learning</i>. Vol 267. ML Research
    Press; 2025:4839-4873.'
  apa: 'Bombari, S., &#38; Mondelli, M. (2025). Spurious correlations in high dimensional
    regression: The roles of regularization, simplicity bias and over-parameterization.
    In <i>Proceedings of the 42nd International Conference on Machine Learning</i>
    (Vol. 267, pp. 4839–4873). Vancouver, Canada: ML Research Press.'
  chicago: 'Bombari, Simone, and Marco Mondelli. “Spurious Correlations in High Dimensional
    Regression: The Roles of Regularization, Simplicity Bias and over-Parameterization.”
    In <i>Proceedings of the 42nd International Conference on Machine Learning</i>,
    267:4839–73. ML Research Press, 2025.'
  ieee: 'S. Bombari and M. Mondelli, “Spurious correlations in high dimensional regression:
    The roles of regularization, simplicity bias and over-parameterization,” in <i>Proceedings
    of the 42nd International Conference on Machine Learning</i>, Vancouver, Canada,
    2025, vol. 267, pp. 4839–4873.'
  ista: 'Bombari S, Mondelli M. 2025. Spurious correlations in high dimensional regression:
    The roles of regularization, simplicity bias and over-parameterization. Proceedings
    of the 42nd International Conference on Machine Learning. ICML: International
    Conference on Machine Learning, PMLR, vol. 267, 4839–4873.'
  mla: 'Bombari, Simone, and Marco Mondelli. “Spurious Correlations in High Dimensional
    Regression: The Roles of Regularization, Simplicity Bias and over-Parameterization.”
    <i>Proceedings of the 42nd International Conference on Machine Learning</i>, vol.
    267, ML Research Press, 2025, pp. 4839–73.'
  short: S. Bombari, M. Mondelli, in:, Proceedings of the 42nd International Conference
    on Machine Learning, ML Research Press, 2025, pp. 4839–4873.
conference:
  end_date: 2025-07-19
  location: Vancouver, Canada
  name: 'ICML: International Conference on Machine Learning'
  start_date: 2025-07-13
corr_author: '1'
date_created: 2026-02-18T11:58:00Z
date_published: 2025-07-30T00:00:00Z
date_updated: 2026-02-19T08:08:55Z
day: '30'
ddc:
- '000'
department:
- _id: MaMo
external_id:
  arxiv:
  - '2502.01347'
file:
- access_level: open_access
  checksum: d4ba4f7717b362ca38878f45e57bd643
  content_type: application/pdf
  creator: dernst
  date_created: 2026-02-19T08:04:38Z
  date_updated: 2026-02-19T08:04:38Z
  file_id: '21335'
  file_name: 2025_ICML_Bombari.pdf
  file_size: 887526
  relation: main_file
  success: 1
file_date_updated: 2026-02-19T08:04:38Z
has_accepted_license: '1'
intvolume: '       267'
language:
- iso: eng
month: '07'
oa: 1
oa_version: Published Version
page: 4839-4873
project:
- _id: 911e6d1f-16d5-11f0-9cad-c5c68c6a1cdf
  grant_number: '101161364'
  name: 'Inference in High Dimensions: Light-speed Algorithms and Information Limits'
- _id: 92099302-16d5-11f0-9cad-f9a785f54fbd
  name: 'Trustworthy Deep Learning Theory: Private Over-Parameterized Models and Robust
    LLMs'
publication: Proceedings of the 42nd International Conference on Machine Learning
publication_identifier:
  eissn:
  - 2640-3498
publication_status: published
publisher: ML Research Press
quality_controlled: '1'
status: public
title: 'Spurious correlations in high dimensional regression: The roles of regularization,
  simplicity bias and over-parameterization'
tmp:
  image: /images/cc_by.png
  legal_code_url: https://creativecommons.org/licenses/by/4.0/legalcode
  name: Creative Commons Attribution 4.0 International Public License (CC-BY 4.0)
  short: CC BY (4.0)
type: conference
user_id: 2DF688A6-F248-11E8-B48F-1D18A9856A87
volume: 267
year: '2025'
...
---
APC_amount: 2754,32 EUR
OA_place: publisher
OA_type: hybrid
_id: '19627'
abstract:
- lang: eng
  text: Differentially private gradient descent (DP-GD) is a popular algorithm to
    train deep learning models with provable guarantees on the privacy of the training
    data. In the last decade, the problem of understanding its performance cost with
    respect to standard GD has received remarkable attention from the research community,
    which formally derived upper bounds on the excess population risk RP in different
    learning settings. However, existing bounds typically degrade with over-parameterization,
    i.e., as the number of parameters p gets larger than the number of training
    samples n -- a regime which is ubiquitous in current deep-learning practice.
    As a result, the lack of theoretical insights leaves practitioners without clear
    guidance, leading some to reduce the effective number of trainable parameters
    to improve performance, while others use larger models to achieve better results
    through scale. In this work, we show that in the popular random features model
    with quadratic loss, for any sufficiently large p, privacy can be obtained for
    free, i.e., |RP|=o(1), not only when the privacy parameter ε has constant
    order, but also in the strongly private setting ε=o(1). This challenges the
    common wisdom that over-parameterization inherently hinders performance in private
    learning.
acknowledgement: This research was funded in whole, or in part, by the Austrian Science
  Fund (FWF) Grant number COE 12. For the purpose of open access, the author has applied
  a CC BY public copyright license to any Author Accepted Manuscript version arising
  from this submission. The authors were also supported by the 2019 Lopez-Loreta prize,
  and Simone Bombari was supported by a Google PhD fellowship. We thank Diyuan Wu,
  Edwige Cyffers, Francesco Pedrotti, Inbar Seroussi, Nikita P. Kalinin, Pietro Pelliconi,
  Roodabeh Safavi, Yizhe Zhu, and Zhichao Wang for helpful discussions.
article_number: e2423072122
article_processing_charge: Yes (in subscription journal)
article_type: original
arxiv: 1
author:
- first_name: Simone
  full_name: Bombari, Simone
  id: ca726dda-de17-11ea-bc14-f9da834f63aa
  last_name: Bombari
- first_name: Marco
  full_name: Mondelli, Marco
  id: 27EB676C-8706-11E9-9510-7717E6697425
  last_name: Mondelli
  orcid: 0000-0002-3242-7020
citation:
  ama: Bombari S, Mondelli M. Privacy for free in the overparameterized regime. <i>Proceedings
    of the National Academy of Sciences</i>. 2025;122(15). doi:<a href="https://doi.org/10.1073/pnas.2423072122">10.1073/pnas.2423072122</a>
  apa: Bombari, S., &#38; Mondelli, M. (2025). Privacy for free in the overparameterized
    regime. <i>Proceedings of the National Academy of Sciences</i>. National Academy
    of Sciences. <a href="https://doi.org/10.1073/pnas.2423072122">https://doi.org/10.1073/pnas.2423072122</a>
  chicago: Bombari, Simone, and Marco Mondelli. “Privacy for Free in the Overparameterized
    Regime.” <i>Proceedings of the National Academy of Sciences</i>. National Academy
    of Sciences, 2025. <a href="https://doi.org/10.1073/pnas.2423072122">https://doi.org/10.1073/pnas.2423072122</a>.
  ieee: S. Bombari and M. Mondelli, “Privacy for free in the overparameterized regime,”
    <i>Proceedings of the National Academy of Sciences</i>, vol. 122, no. 15. National
    Academy of Sciences, 2025.
  ista: Bombari S, Mondelli M. 2025. Privacy for free in the overparameterized regime.
    Proceedings of the National Academy of Sciences. 122(15), e2423072122.
  mla: Bombari, Simone, and Marco Mondelli. “Privacy for Free in the Overparameterized
    Regime.” <i>Proceedings of the National Academy of Sciences</i>, vol. 122, no.
    15, e2423072122, National Academy of Sciences, 2025, doi:<a href="https://doi.org/10.1073/pnas.2423072122">10.1073/pnas.2423072122</a>.
  short: S. Bombari, M. Mondelli, Proceedings of the National Academy of Sciences
    122 (2025).
corr_author: '1'
date_created: 2025-04-27T22:02:13Z
date_published: 2025-04-15T00:00:00Z
date_updated: 2026-05-20T08:23:19Z
day: '15'
ddc:
- '000'
department:
- _id: MaMo
doi: 10.1073/pnas.2423072122
external_id:
  arxiv:
  - '2410.14787'
  isi:
  - '001471214000001'
  pmid:
  - '40215275'
file:
- access_level: open_access
  checksum: 1ac6f78e368d35a0cafb4d2d9bd63443
  content_type: application/pdf
  creator: dernst
  date_created: 2025-05-05T07:27:54Z
  date_updated: 2025-05-05T07:27:54Z
  file_id: '19648'
  file_name: 2025_PNAS_Bombari.pdf
  file_size: 2328320
  relation: main_file
  success: 1
file_date_updated: 2025-05-05T07:27:54Z
has_accepted_license: '1'
intvolume: '       122'
isi: 1
issue: '15'
language:
- iso: eng
month: '04'
oa: 1
oa_version: Published Version
pmid: 1
project:
- _id: 059876FA-7A3F-11EA-A408-12923DDC885E
  name: Prix Lopez-Loretta 2019 - Marco Mondelli
- _id: 92099302-16d5-11f0-9cad-f9a785f54fbd
  name: 'Trustworthy Deep Learning Theory: Private Over-Parameterized Models and Robust
    LLMs'
publication: Proceedings of the National Academy of Sciences
publication_identifier:
  eissn:
  - 1091-6490
  issn:
  - 0027-8424
publication_status: published
publisher: National Academy of Sciences
quality_controlled: '1'
scopus_import: '1'
status: public
title: Privacy for free in the overparameterized regime
tmp:
  image: /images/cc_by.png
  legal_code_url: https://creativecommons.org/licenses/by/4.0/legalcode
  name: Creative Commons Attribution 4.0 International Public License (CC-BY 4.0)
  short: CC BY (4.0)
type: journal_article
user_id: 2DF688A6-F248-11E8-B48F-1D18A9856A87
volume: 122
year: '2025'
...
---
OA_place: repository
OA_type: green
_id: '18972'
abstract:
- lang: eng
  text: 'Deep learning models are known to overfit and memorize spurious features
    in the training dataset. While numerous empirical studies have aimed at understanding
    this phenomenon, a rigorous theoretical framework to quantify it is still missing.
    In this paper, we consider spurious features that are uncorrelated with the learning
    task, and we provide a precise characterization of how they are memorized via
    two separate terms: (i) the stability of the model with respect to individual
    training samples, and (ii) the feature alignment between the spurious pattern
    and the full sample. While the first term is well established in learning theory
    and it is connected to the generalization error in classical work, the second
    one is, to the best of our knowledge, novel. Our key technical result gives a
    precise characterization of the feature alignment for the two prototypical settings
    of random features (RF) and neural tangent kernel (NTK) regression. We prove that
    the memorization of spurious features weakens as the generalization capability
    increases and, through the analysis of the feature alignment, we unveil the role
    of the model and of its activation function. Numerical experiments show the predictive
    power of our theory on standard datasets (MNIST, CIFAR-10).'
acknowledgement: "The authors were partially supported by the 2019 Lopez-Loreta prize,
  and they would like to thank (in alphabetical order) Grigorios Chrysos, Simone Maria
  Giancola, Mahyar Jafari Nodeh, Christoph Lampert, Marco Miani, GuanWen Qiu, and
  Peter Súkeník for helpful discussions."
alternative_title:
- PMLR
article_processing_charge: No
arxiv: 1
author:
- first_name: Simone
  full_name: Bombari, Simone
  id: ca726dda-de17-11ea-bc14-f9da834f63aa
  last_name: Bombari
- first_name: Marco
  full_name: Mondelli, Marco
  id: 27EB676C-8706-11E9-9510-7717E6697425
  last_name: Mondelli
  orcid: 0000-0002-3242-7020
citation:
  ama: 'Bombari S, Mondelli M. How spurious features are memorized: Precise analysis
    for random and NTK features. In: <i>41st International Conference on Machine Learning</i>.
    Vol 235. ML Research Press; 2024:4267-4299.'
  apa: 'Bombari, S., &#38; Mondelli, M. (2024). How spurious features are memorized:
    Precise analysis for random and NTK features. In <i>41st International Conference
    on Machine Learning</i> (Vol. 235, pp. 4267–4299). Vienna, Austria: ML Research
    Press.'
  chicago: 'Bombari, Simone, and Marco Mondelli. “How Spurious Features Are Memorized:
    Precise Analysis for Random and NTK Features.” In <i>41st International Conference
    on Machine Learning</i>, 235:4267–99. ML Research Press, 2024.'
  ieee: 'S. Bombari and M. Mondelli, “How spurious features are memorized: Precise
    analysis for random and NTK features,” in <i>41st International Conference on
    Machine Learning</i>, Vienna, Austria, 2024, vol. 235, pp. 4267–4299.'
  ista: 'Bombari S, Mondelli M. 2024. How spurious features are memorized: Precise
    analysis for random and NTK features. 41st International Conference on Machine
    Learning. ICML: International Conference on Machine Learning, PMLR, vol. 235,
    4267–4299.'
  mla: 'Bombari, Simone, and Marco Mondelli. “How Spurious Features Are Memorized:
    Precise Analysis for Random and NTK Features.” <i>41st International Conference
    on Machine Learning</i>, vol. 235, ML Research Press, 2024, pp. 4267–99.'
  short: S. Bombari, M. Mondelli, in:, 41st International Conference on Machine Learning,
    ML Research Press, 2024, pp. 4267–4299.
conference:
  end_date: 2024-07-27
  location: Vienna, Austria
  name: 'ICML: International Conference on Machine Learning'
  start_date: 2024-07-21
corr_author: '1'
date_created: 2025-01-30T07:29:47Z
date_published: 2024-07-30T00:00:00Z
date_updated: 2025-04-15T07:50:12Z
day: '30'
department:
- _id: MaMo
external_id:
  arxiv:
  - '2305.12100'
intvolume: '       235'
language:
- iso: eng
main_file_link:
- open_access: '1'
  url: https://doi.org/10.48550/arXiv.2305.12100
month: '07'
oa: 1
oa_version: Preprint
page: 4267-4299
project:
- _id: 059876FA-7A3F-11EA-A408-12923DDC885E
  name: Prix Lopez-Loretta 2019 - Marco Mondelli
publication: 41st International Conference on Machine Learning
publication_identifier:
  eissn:
  - 2640-3498
publication_status: published
publisher: ML Research Press
quality_controlled: '1'
scopus_import: '1'
status: public
title: 'How spurious features are memorized: Precise analysis for random and NTK features'
type: conference
user_id: 2DF688A6-F248-11E8-B48F-1D18A9856A87
volume: 235
year: '2024'
...
---
OA_place: repository
OA_type: green
_id: '18973'
abstract:
- lang: eng
  text: 'Understanding the reasons behind the exceptional success of transformers
    requires a better analysis of why attention layers are suitable for NLP tasks.
    In particular, such tasks require predictive models to capture contextual meaning
    which often depends on one or few words, even if the sentence is long. Our work
    studies this key property, dubbed word sensitivity (WS), in the prototypical setting
    of random features. We show that attention layers enjoy high WS, namely, there
    exists a vector in the space of embeddings that largely perturbs the random attention
    features map. The argument critically exploits the role of the softmax in the
    attention layer, highlighting its benefit compared to other activations (e.g.,
    ReLU). In contrast, the WS of standard random features is of order 1/√n, n being
    the number of words in the textual sample, and thus it decays with the length
    of the context. We then translate these results on the word sensitivity into generalization
    bounds: due to their low WS, random features provably cannot learn to distinguish
    between two sentences that differ only in a single word; in contrast, due to their
    high WS, random attention features have higher generalization capabilities. We
    validate our theoretical results with experimental evidence over the BERT-Base
    word embeddings of the imdb review dataset.'
acknowledgement: The authors were partially supported by the 2019 Lopez-Loreta prize,
  and they would like to thank Mohammad Hossein Amani, Lorenzo Beretta, and Clement
  Rebuffel for helpful discussions.
alternative_title:
- PMLR
article_processing_charge: No
arxiv: 1
author:
- first_name: Simone
  full_name: Bombari, Simone
  id: ca726dda-de17-11ea-bc14-f9da834f63aa
  last_name: Bombari
- first_name: Marco
  full_name: Mondelli, Marco
  id: 27EB676C-8706-11E9-9510-7717E6697425
  last_name: Mondelli
  orcid: 0000-0002-3242-7020
citation:
  ama: 'Bombari S, Mondelli M. Towards understanding the word sensitivity of attention
    layers: A study via random features. In: <i>41st International Conference on Machine
    Learning</i>. Vol 235. ML Research Press; 2024:4300-4328.'
  apa: 'Bombari, S., &#38; Mondelli, M. (2024). Towards understanding the word sensitivity
    of attention layers: A study via random features. In <i>41st International Conference
    on Machine Learning</i> (Vol. 235, pp. 4300–4328). Vienna, Austria: ML Research
    Press.'
  chicago: 'Bombari, Simone, and Marco Mondelli. “Towards Understanding the Word Sensitivity
    of Attention Layers: A Study via Random Features.” In <i>41st International Conference
    on Machine Learning</i>, 235:4300–4328. ML Research Press, 2024.'
  ieee: 'S. Bombari and M. Mondelli, “Towards understanding the word sensitivity of
    attention layers: A study via random features,” in <i>41st International Conference
    on Machine Learning</i>, Vienna, Austria, 2024, vol. 235, pp. 4300–4328.'
  ista: 'Bombari S, Mondelli M. 2024. Towards understanding the word sensitivity of
    attention layers: A study via random features. 41st International Conference on
    Machine Learning. ICML: International Conference on Machine Learning, PMLR, vol.
    235, 4300–4328.'
  mla: 'Bombari, Simone, and Marco Mondelli. “Towards Understanding the Word Sensitivity
    of Attention Layers: A Study via Random Features.” <i>41st International Conference
    on Machine Learning</i>, vol. 235, ML Research Press, 2024, pp. 4300–28.'
  short: S. Bombari, M. Mondelli, in:, 41st International Conference on Machine Learning,
    ML Research Press, 2024, pp. 4300–4328.
conference:
  end_date: 2024-07-27
  location: Vienna, Austria
  name: 'ICML: International Conference on Machine Learning'
  start_date: 2024-07-21
corr_author: '1'
date_created: 2025-01-30T07:35:49Z
date_published: 2024-07-30T00:00:00Z
date_updated: 2025-04-15T07:50:12Z
day: '30'
department:
- _id: MaMo
external_id:
  arxiv:
  - '2402.02969'
intvolume: '       235'
language:
- iso: eng
main_file_link:
- open_access: '1'
  url: https://doi.org/10.48550/arXiv.2402.02969
month: '07'
oa: 1
oa_version: Preprint
page: 4300-4328
project:
- _id: 059876FA-7A3F-11EA-A408-12923DDC885E
  name: Prix Lopez-Loretta 2019 - Marco Mondelli
publication: 41st International Conference on Machine Learning
publication_identifier:
  eissn:
  - 2640-3498
publication_status: published
publisher: ML Research Press
quality_controlled: '1'
scopus_import: '1'
status: public
title: 'Towards understanding the word sensitivity of attention layers: A study via
  random features'
type: conference
user_id: 2DF688A6-F248-11E8-B48F-1D18A9856A87
volume: 235
year: '2024'
...
---
_id: '12859'
abstract:
- lang: eng
  text: 'Machine learning models are vulnerable to adversarial perturbations, and
    a thought-provoking paper by Bubeck and Sellke has analyzed this phenomenon through
    the lens of over-parameterization: interpolating smoothly the data requires significantly
    more parameters than simply memorizing it. However, this "universal" law provides
    only a necessary condition for robustness, and it is unable to discriminate between
    models. In this paper, we address these gaps by focusing on empirical risk minimization
    in two prototypical settings, namely, random features and the neural tangent kernel
    (NTK). We prove that, for random features, the model is not robust for any degree
    of over-parameterization, even when the necessary condition coming from the universal
    law of robustness is satisfied. In contrast, for even activations, the NTK model
    meets the universal lower bound, and it is robust as soon as the necessary condition
    on over-parameterization is fulfilled. This also addresses a conjecture in prior
    work by Bubeck, Li and Nagaraj. Our analysis decouples the effect of the kernel
    of the model from an "interaction matrix", which describes the interaction with
    the test data and captures the effect of the activation. Our theoretical results
    are corroborated by numerical evidence on both synthetic and standard datasets
    (MNIST, CIFAR-10).'
acknowledgement: "Simone Bombari and Marco Mondelli were partially supported by the
  2019 Lopez-Loreta prize, and the authors would like to thank Hamed Hassani for
  helpful discussions."
alternative_title:
- PMLR
article_processing_charge: No
arxiv: 1
author:
- first_name: Simone
  full_name: Bombari, Simone
  id: ca726dda-de17-11ea-bc14-f9da834f63aa
  last_name: Bombari
- first_name: Shayan
  full_name: Kiyani, Shayan
  id: f5a2b424-e339-11ed-8435-ff3b4fe70cf8
  last_name: Kiyani
- first_name: Marco
  full_name: Mondelli, Marco
  id: 27EB676C-8706-11E9-9510-7717E6697425
  last_name: Mondelli
  orcid: 0000-0002-3242-7020
citation:
  ama: 'Bombari S, Kiyani S, Mondelli M. Beyond the universal law of robustness: Sharper
    laws for random features and neural tangent kernels. In: <i>Proceedings of the
    40th International Conference on Machine Learning</i>. Vol 202. ML Research Press;
    2023:2738-2776.'
  apa: 'Bombari, S., Kiyani, S., &#38; Mondelli, M. (2023). Beyond the universal law
    of robustness: Sharper laws for random features and neural tangent kernels. In
    <i>Proceedings of the 40th International Conference on Machine Learning</i> (Vol.
    202, pp. 2738–2776). Honolulu, HI, United States: ML Research Press.'
  chicago: 'Bombari, Simone, Shayan Kiyani, and Marco Mondelli. “Beyond the Universal
    Law of Robustness: Sharper Laws for Random Features and Neural Tangent Kernels.”
    In <i>Proceedings of the 40th International Conference on Machine Learning</i>,
    202:2738–76. ML Research Press, 2023.'
  ieee: 'S. Bombari, S. Kiyani, and M. Mondelli, “Beyond the universal law of robustness:
    Sharper laws for random features and neural tangent kernels,” in <i>Proceedings
    of the 40th International Conference on Machine Learning</i>, Honolulu, HI, United
    States, 2023, vol. 202, pp. 2738–2776.'
  ista: 'Bombari S, Kiyani S, Mondelli M. 2023. Beyond the universal law of robustness:
    Sharper laws for random features and neural tangent kernels. Proceedings of the
    40th International Conference on Machine Learning. ICML: International Conference
    on Machine Learning, PMLR, vol. 202, 2738–2776.'
  mla: 'Bombari, Simone, et al. “Beyond the Universal Law of Robustness: Sharper Laws
    for Random Features and Neural Tangent Kernels.” <i>Proceedings of the 40th International
    Conference on Machine Learning</i>, vol. 202, ML Research Press, 2023, pp. 2738–76.'
  short: S. Bombari, S. Kiyani, M. Mondelli, in:, Proceedings of the 40th International
    Conference on Machine Learning, ML Research Press, 2023, pp. 2738–2776.
conference:
  end_date: 2023-07-29
  location: Honolulu, HI, United States
  name: 'ICML: International Conference on Machine Learning'
  start_date: 2023-07-23
corr_author: '1'
date_created: 2023-04-23T16:11:03Z
date_published: 2023-10-27T00:00:00Z
date_updated: 2025-04-15T07:50:16Z
day: '27'
department:
- _id: GradSch
- _id: MaMo
external_id:
  arxiv:
  - '2302.01629'
intvolume: '       202'
language:
- iso: eng
main_file_link:
- open_access: '1'
  url: https://arxiv.org/abs/2302.01629
month: '10'
oa: 1
oa_version: Preprint
page: 2738-2776
project:
- _id: 059876FA-7A3F-11EA-A408-12923DDC885E
  name: Prix Lopez-Loretta 2019 - Marco Mondelli
publication: Proceedings of the 40th International Conference on Machine Learning
publication_status: published
publisher: ML Research Press
quality_controlled: '1'
related_material:
  link:
  - relation: software
    url: https://github.com/simone-bombari/beyond-universal-robustness
status: public
title: 'Beyond the universal law of robustness: Sharper laws for random features and
  neural tangent kernels'
type: conference
user_id: 2DF688A6-F248-11E8-B48F-1D18A9856A87
volume: 202
year: '2023'
...
---
OA_place: repository
OA_type: green
_id: '12537'
abstract:
- lang: eng
  text: 'The Neural Tangent Kernel (NTK) has emerged as a powerful tool to provide
    memorization, optimization and generalization guarantees in deep neural networks.
    A line of work has studied the NTK spectrum for two-layer and deep networks with
    at least a layer with Ω(N) neurons, N being the number of training samples. Furthermore,
    there is increasing evidence suggesting that deep networks with sub-linear layer
    widths are powerful memorizers and optimizers, as long as the number of parameters
    exceeds the number of samples. Thus, a natural open question is whether the NTK
    is well conditioned in such a challenging sub-linear setup. In this paper, we
    answer this question in the affirmative. Our key technical contribution is a lower
    bound on the smallest NTK eigenvalue for deep networks with the minimum possible
    over-parameterization: the number of parameters is roughly Ω(N) and, hence, the
    number of neurons is as little as Ω(√N). To showcase the applicability of our
    NTK bounds, we provide two results concerning memorization capacity and optimization
    guarantees for gradient descent training.'
acknowledgement: "The authors were partially supported by the 2019 Lopez-Loreta prize,
  and they would like to thank Quynh Nguyen, Mahdi Soltanolkotabi and Adel Javanmard
  for helpful discussions."
alternative_title:
- Advances in Neural Information Processing Systems
article_processing_charge: No
arxiv: 1
author:
- first_name: Simone
  full_name: Bombari, Simone
  id: ca726dda-de17-11ea-bc14-f9da834f63aa
  last_name: Bombari
- first_name: Mohammad Hossein
  full_name: Amani, Mohammad Hossein
  last_name: Amani
- first_name: Marco
  full_name: Mondelli, Marco
  id: 27EB676C-8706-11E9-9510-7717E6697425
  last_name: Mondelli
  orcid: 0000-0002-3242-7020
citation:
  ama: 'Bombari S, Amani MH, Mondelli M. Memorization and optimization in deep neural
    networks with minimum over-parameterization. In: <i>36th Conference on Neural
    Information Processing Systems</i>. Vol 35. Neural Information Processing Systems
    Foundation; 2022:7628-7640.'
  apa: 'Bombari, S., Amani, M. H., &#38; Mondelli, M. (2022). Memorization and optimization
    in deep neural networks with minimum over-parameterization. In <i>36th Conference
    on Neural Information Processing Systems</i> (Vol. 35, pp. 7628–7640). New Orleans,
    LA, United States: Neural Information Processing Systems Foundation.'
  chicago: Bombari, Simone, Mohammad Hossein Amani, and Marco Mondelli. “Memorization
    and Optimization in Deep Neural Networks with Minimum Over-Parameterization.”
    In <i>36th Conference on Neural Information Processing Systems</i>, 35:7628–40.
    Neural Information Processing Systems Foundation, 2022.
  ieee: S. Bombari, M. H. Amani, and M. Mondelli, “Memorization and optimization in
    deep neural networks with minimum over-parameterization,” in <i>36th Conference
    on Neural Information Processing Systems</i>, New Orleans, LA, United States,
    2022, vol. 35, pp. 7628–7640.
  ista: 'Bombari S, Amani MH, Mondelli M. 2022. Memorization and optimization in deep
    neural networks with minimum over-parameterization. 36th Conference on Neural
    Information Processing Systems. NeurIPS: Neural Information Processing Systems,
    Advances in Neural Information Processing Systems, vol. 35, 7628–7640.'
  mla: Bombari, Simone, et al. “Memorization and Optimization in Deep Neural Networks
    with Minimum Over-Parameterization.” <i>36th Conference on Neural Information
    Processing Systems</i>, vol. 35, Neural Information Processing Systems Foundation,
    2022, pp. 7628–40.
  short: S. Bombari, M.H. Amani, M. Mondelli, in:, 36th Conference on Neural Information
    Processing Systems, Neural Information Processing Systems Foundation, 2022, pp.
    7628–7640.
conference:
  end_date: 2022-12-09
  location: New Orleans, LA, United States
  name: 'NeurIPS: Neural Information Processing Systems'
  start_date: 2022-11-28
corr_author: '1'
date_created: 2023-02-10T13:46:37Z
date_published: 2022-07-24T00:00:00Z
date_updated: 2025-05-14T11:28:22Z
day: '24'
department:
- _id: MaMo
external_id:
  arxiv:
  - '2205.10217'
intvolume: '        35'
language:
- iso: eng
main_file_link:
- open_access: '1'
  url: https://doi.org/10.48550/arXiv.2205.10217
month: '07'
oa: 1
oa_version: Preprint
page: 7628-7640
project:
- _id: 059876FA-7A3F-11EA-A408-12923DDC885E
  name: Prix Lopez-Loretta 2019 - Marco Mondelli
publication: 36th Conference on Neural Information Processing Systems
publication_identifier:
  eissn:
  - 1049-5258
  isbn:
  - '9781713871088'
publication_status: published
publisher: Neural Information Processing Systems Foundation
quality_controlled: '1'
status: public
title: Memorization and optimization in deep neural networks with minimum over-parameterization
type: conference
user_id: 2DF688A6-F248-11E8-B48F-1D18A9856A87
volume: 35
year: '2022'
...
---
_id: '12538'
abstract:
- lang: eng
  text: In this paper, we study the compression of a target two-layer neural network
    with N nodes into a compressed network with M<N nodes. More precisely, we consider
    the setting in which the weights of the target network are i.i.d. sub-Gaussian,
    and we minimize the population L_2 loss between the outputs of the target and
    of the compressed network, under the assumption of Gaussian inputs. By using tools
    from high-dimensional probability, we show that this non-convex problem can be
    simplified when the target network is sufficiently over-parameterized, and provide
    the error rate of this approximation as a function of the input dimension and
    N. In this mean-field limit, the simplified objective, as well as the optimal
    weights of the compressed network, does not depend on the realization of the target
    network, but only on expected scaling factors. Furthermore, for networks with
    ReLU activation, we conjecture that the optimum of the simplified optimization
    problem is achieved by taking weights on the Equiangular Tight Frame (ETF), while
    the scaling of the weights and the orientation of the ETF depend on the parameters
    of the target network. Numerical evidence is provided to support this conjecture.
article_processing_charge: No
article_type: original
arxiv: 1
author:
- first_name: Mohammad Hossein
  full_name: Amani, Mohammad Hossein
  last_name: Amani
- first_name: Simone
  full_name: Bombari, Simone
  id: ca726dda-de17-11ea-bc14-f9da834f63aa
  last_name: Bombari
- first_name: Marco
  full_name: Mondelli, Marco
  id: 27EB676C-8706-11E9-9510-7717E6697425
  last_name: Mondelli
  orcid: 0000-0002-3242-7020
- first_name: Rattana
  full_name: Pukdee, Rattana
  last_name: Pukdee
- first_name: Stefano
  full_name: Rini, Stefano
  last_name: Rini
citation:
  ama: Amani MH, Bombari S, Mondelli M, Pukdee R, Rini S. Sharp asymptotics on the
    compression of two-layer neural networks. <i>IEEE Information Theory Workshop</i>.
    2022:588-593. doi:<a href="https://doi.org/10.1109/ITW54588.2022.9965870">10.1109/ITW54588.2022.9965870</a>
  apa: 'Amani, M. H., Bombari, S., Mondelli, M., Pukdee, R., &#38; Rini, S. (2022).
    Sharp asymptotics on the compression of two-layer neural networks. <i>IEEE Information
    Theory Workshop</i>. Mumbai, India: IEEE. <a href="https://doi.org/10.1109/ITW54588.2022.9965870">https://doi.org/10.1109/ITW54588.2022.9965870</a>'
  chicago: Amani, Mohammad Hossein, Simone Bombari, Marco Mondelli, Rattana Pukdee,
    and Stefano Rini. “Sharp Asymptotics on the Compression of Two-Layer Neural Networks.”
    <i>IEEE Information Theory Workshop</i>. IEEE, 2022. <a href="https://doi.org/10.1109/ITW54588.2022.9965870">https://doi.org/10.1109/ITW54588.2022.9965870</a>.
  ieee: M. H. Amani, S. Bombari, M. Mondelli, R. Pukdee, and S. Rini, “Sharp asymptotics
    on the compression of two-layer neural networks,” <i>IEEE Information Theory Workshop</i>.
    IEEE, pp. 588–593, 2022.
  ista: Amani MH, Bombari S, Mondelli M, Pukdee R, Rini S. 2022. Sharp asymptotics
    on the compression of two-layer neural networks. IEEE Information Theory Workshop,
    588–593.
  mla: Amani, Mohammad Hossein, et al. “Sharp Asymptotics on the Compression of Two-Layer
    Neural Networks.” <i>IEEE Information Theory Workshop</i>, IEEE, 2022, pp. 588–93,
    doi:<a href="https://doi.org/10.1109/ITW54588.2022.9965870">10.1109/ITW54588.2022.9965870</a>.
  short: M.H. Amani, S. Bombari, M. Mondelli, R. Pukdee, S. Rini, IEEE Information
    Theory Workshop (2022) 588–593.
conference:
  end_date: 2022-11-09
  location: Mumbai, India
  name: 'ITW: Information Theory Workshop'
  start_date: 2022-11-01
date_created: 2023-02-10T13:47:56Z
date_published: 2022-11-16T00:00:00Z
date_updated: 2025-09-10T09:53:31Z
day: '16'
department:
- _id: MaMo
doi: 10.1109/ITW54588.2022.9965870
external_id:
  arxiv:
  - '2205.08199'
  isi:
  - '000904341100099'
isi: 1
language:
- iso: eng
main_file_link:
- open_access: '1'
  url: https://doi.org/10.48550/arXiv.2205.08199
month: '11'
oa: 1
oa_version: Preprint
page: 588-593
publication: IEEE Information Theory Workshop
publication_identifier:
  isbn:
  - '9781665483414'
publication_status: published
publisher: IEEE
quality_controlled: '1'
scopus_import: '1'
status: public
title: Sharp asymptotics on the compression of two-layer neural networks
type: journal_article
user_id: 317138e5-6ab7-11ef-aa6d-ffef3953e345
year: '2022'
...
---
_id: '12860'
abstract:
- lang: eng
  text: 'Memorization of the relation between entities in a dataset can lead to privacy
    issues when using a trained model for question answering. We introduce Relational
    Memorization (RM) to understand, quantify and control this phenomenon. While bounding
    general memorization can have detrimental effects on the performance of a trained
    model, bounding RM does not prevent effective learning. The difference is most
    pronounced when the data distribution is long-tailed, with many queries having
    only few training examples: Impeding general memorization prevents effective learning,
    while impeding only relational memorization still allows learning general properties
    of the underlying concepts. We formalize the notion of Relational Privacy (RP)
    and, inspired by Differential Privacy (DP), we provide a possible definition of
    Differential Relational Privacy (DrP). These notions can be used to describe and
    compute bounds on the amount of RM in a trained model. We illustrate Relational
    Privacy concepts in experiments with large-scale models for Question Answering.'
article_number: '2203.16701'
article_processing_charge: No
arxiv: 1
author:
- first_name: Simone
  full_name: Bombari, Simone
  id: ca726dda-de17-11ea-bc14-f9da834f63aa
  last_name: Bombari
- first_name: Alessandro
  full_name: Achille, Alessandro
  last_name: Achille
- first_name: Zijian
  full_name: Wang, Zijian
  last_name: Wang
- first_name: Yu-Xiang
  full_name: Wang, Yu-Xiang
  last_name: Wang
- first_name: Yusheng
  full_name: Xie, Yusheng
  last_name: Xie
- first_name: Kunwar Yashraj
  full_name: Singh, Kunwar Yashraj
  last_name: Singh
- first_name: Srikar
  full_name: Appalaraju, Srikar
  last_name: Appalaraju
- first_name: Vijay
  full_name: Mahadevan, Vijay
  last_name: Mahadevan
- first_name: Stefano
  full_name: Soatto, Stefano
  last_name: Soatto
citation:
  ama: Bombari S, Achille A, Wang Z, et al. Towards differential relational privacy
    and its use in question answering. <i>arXiv</i>. doi:<a href="https://doi.org/10.48550/arXiv.2203.16701">10.48550/arXiv.2203.16701</a>
  apa: Bombari, S., Achille, A., Wang, Z., Wang, Y.-X., Xie, Y., Singh, K. Y., … Soatto,
    S. (n.d.). Towards differential relational privacy and its use in question answering.
    <i>arXiv</i>. <a href="https://doi.org/10.48550/arXiv.2203.16701">https://doi.org/10.48550/arXiv.2203.16701</a>
  chicago: Bombari, Simone, Alessandro Achille, Zijian Wang, Yu-Xiang Wang, Yusheng
    Xie, Kunwar Yashraj Singh, Srikar Appalaraju, Vijay Mahadevan, and Stefano Soatto.
    “Towards Differential Relational Privacy and Its Use in Question Answering.” <i>ArXiv</i>,
    n.d. <a href="https://doi.org/10.48550/arXiv.2203.16701">https://doi.org/10.48550/arXiv.2203.16701</a>.
  ieee: S. Bombari <i>et al.</i>, “Towards differential relational privacy and its
    use in question answering,” <i>arXiv</i>.
  ista: Bombari S, Achille A, Wang Z, Wang Y-X, Xie Y, Singh KY, Appalaraju S, Mahadevan
    V, Soatto S. Towards differential relational privacy and its use in question answering.
    arXiv, 2203.16701.
  mla: Bombari, Simone, et al. “Towards Differential Relational Privacy and Its Use
    in Question Answering.” <i>ArXiv</i>, 2203.16701, doi:<a href="https://doi.org/10.48550/arXiv.2203.16701">10.48550/arXiv.2203.16701</a>.
  short: S. Bombari, A. Achille, Z. Wang, Y.-X. Wang, Y. Xie, K.Y. Singh, S. Appalaraju,
    V. Mahadevan, S. Soatto, ArXiv (n.d.).
date_created: 2023-04-23T16:11:48Z
date_published: 2022-03-30T00:00:00Z
date_updated: 2023-04-25T07:34:49Z
day: '30'
department:
- _id: GradSch
- _id: MaMo
doi: 10.48550/arXiv.2203.16701
external_id:
  arxiv:
  - '2203.16701'
language:
- iso: eng
main_file_link:
- open_access: '1'
  url: https://doi.org/10.48550/arXiv.2203.16701
month: '03'
oa: 1
oa_version: Preprint
publication: arXiv
publication_status: submitted
status: public
title: Towards differential relational privacy and its use in question answering
type: preprint
user_id: 2DF688A6-F248-11E8-B48F-1D18A9856A87
year: '2022'
...