---
DOAJ_listed: '1'
OA_place: publisher
OA_type: gold
_id: '21488'
abstract:
- lang: eng
  text: Human height is a model for the genetic analysis of complex traits, and recent
    studies suggest the presence of thousands of common genetic variant associations
    and hundreds of low-frequency/rare variants. Here, we develop a new algorithmic
    paradigm based on approximate message passing (genomic vector approximate message
    passing [gVAMP]) for identifying DNA sequence variants associated with complex
    traits and common diseases in large-scale whole-genome sequencing (WGS) data.
    We show that gVAMP accurately localizes associations to variants with the correct
    frequency and position in the DNA, outperforming existing fine-mapping methods
    in selecting the appropriate genetic variants within WGS data. We then apply gVAMP
    to jointly model the relationship of tens of millions of WGS variants with human
    height in hundreds of thousands of UK Biobank individuals. We identify 59 rare
    variants and gene burden scores alongside many hundreds of DNA regions containing
    common variant associations and show that understanding the genetic basis of complex
    traits will require the joint analysis of hundreds of millions of variables measured
    on millions of people. The polygenic risk scores obtained from gVAMP have high
    accuracy (including a prediction accuracy of ∼46% for human height) and outperform
    current methods for downstream tasks such as mixed linear model association testing
    across 13 UK Biobank traits. In conclusion, gVAMP offers a scalable foundation
    for a wider range of analyses in WGS data.
acknowledgement: We thank Malgorzata Borczyk for creating the gene burden scores.
  We thank Robin Beaumont, Amedeo Roberto Esposito, Gareth Hawkes, Philip Schniter,
  Matthew Stephens, Pragya Sur, Peter Visscher, Michael Weedon, and Harry Wright for
  providing valuable suggestions and comments on earlier versions of the work. This
  project was funded by a Lopez-Loreta Prize to M.M., an SNSF Eccellenza Grant to
  M.R.R. (PCEGP3-181181), an ERC Starting Grant to M.M. (INF2, project number 101161364),
  and core funding from ISTA. High-performance computing was supported by the Scientific
  Service Units (SSU) of ISTA through resources provided by Scientific Computing (SciComp).
  We would like to acknowledge the participants and investigators of the UK Biobank
  study. We gratefully acknowledge the All of Us participants for their contributions,
  without whom this research would not have been possible. We also thank the National
  Institutes of Health All of Us Research Program for making available the participant
  data (and/or samples and/or cohort) examined in this study.
article_number: '101162'
article_processing_charge: Yes
article_type: original
author:
- first_name: Al
  full_name: Depope, Al
  id: 0b77531d-dbcd-11ea-9d1d-a8eee0bf3830
  last_name: Depope
- first_name: Jakub
  full_name: Bajzik, Jakub
  id: b995e25b-8c4b-11ed-a6d8-f71b7bcd6122
  last_name: Bajzik
- first_name: Marco
  full_name: Mondelli, Marco
  id: 27EB676C-8706-11E9-9510-7717E6697425
  last_name: Mondelli
  orcid: 0000-0002-3242-7020
- first_name: Matthew Richard
  full_name: Robinson, Matthew Richard
  id: E5D42276-F5DA-11E9-8E24-6303E6697425
  last_name: Robinson
  orcid: 0000-0001-8982-8813
citation:
  ama: Depope A, Bajzik J, Mondelli M, Robinson MR. Joint modeling of whole-genome
    sequencing data for human height via approximate message passing. <i>Cell Genomics</i>.
    2026. doi:<a href="https://doi.org/10.1016/j.xgen.2026.101162">10.1016/j.xgen.2026.101162</a>
  apa: Depope, A., Bajzik, J., Mondelli, M., &#38; Robinson, M. R. (2026). Joint modeling
    of whole-genome sequencing data for human height via approximate message passing.
    <i>Cell Genomics</i>. Elsevier. <a href="https://doi.org/10.1016/j.xgen.2026.101162">https://doi.org/10.1016/j.xgen.2026.101162</a>
  chicago: Depope, Al, Jakub Bajzik, Marco Mondelli, and Matthew Richard Robinson.
    “Joint Modeling of Whole-Genome Sequencing Data for Human Height via Approximate
    Message Passing.” <i>Cell Genomics</i>. Elsevier, 2026. <a href="https://doi.org/10.1016/j.xgen.2026.101162">https://doi.org/10.1016/j.xgen.2026.101162</a>.
  ieee: A. Depope, J. Bajzik, M. Mondelli, and M. R. Robinson, “Joint modeling of
    whole-genome sequencing data for human height via approximate message passing,”
    <i>Cell Genomics</i>. Elsevier, 2026.
  ista: Depope A, Bajzik J, Mondelli M, Robinson MR. 2026. Joint modeling of whole-genome
    sequencing data for human height via approximate message passing. Cell Genomics.
    101162.
  mla: Depope, Al, et al. “Joint Modeling of Whole-Genome Sequencing Data for Human
    Height via Approximate Message Passing.” <i>Cell Genomics</i>, 101162, Elsevier,
    2026, doi:<a href="https://doi.org/10.1016/j.xgen.2026.101162">10.1016/j.xgen.2026.101162</a>.
  short: A. Depope, J. Bajzik, M. Mondelli, M.R. Robinson, Cell Genomics (2026).
corr_author: '1'
date_created: 2026-03-23T15:10:03Z
date_published: 2026-02-18T00:00:00Z
date_updated: 2026-04-28T12:08:37Z
day: '18'
ddc:
- '000'
- '570'
department:
- _id: MaMo
- _id: MaRo
doi: 10.1016/j.xgen.2026.101162
has_accepted_license: '1'
language:
- iso: eng
main_file_link:
- open_access: '1'
  url: https://doi.org/10.1016/j.xgen.2026.101162
month: '02'
oa: 1
oa_version: Published Version
project:
- _id: 059876FA-7A3F-11EA-A408-12923DDC885E
  name: Prix Lopez-Loretta 2019 - Marco Mondelli
- _id: 911e6d1f-16d5-11f0-9cad-c5c68c6a1cdf
  grant_number: '101161364'
  name: 'Inference in High Dimensions: Light-speed Algorithms and Information Limits'
- _id: 9B8D11D6-BA93-11EA-9121-9846C619BF3A
  grant_number: PCEGP3_181181
  name: Improving estimation and prediction of common complex disease risk
publication: Cell Genomics
publication_identifier:
  eissn:
  - 2666-979X
publication_status: epub_ahead
publisher: Elsevier
quality_controlled: '1'
related_material:
  link:
  - description: News on ISTA website
    relation: press_release
    url: https://ista.ac.at/en/news/big-data-and-human-height/
status: public
title: Joint modeling of whole-genome sequencing data for human height via approximate
  message passing
tmp:
  image: /images/cc_by_nc_nd.png
  legal_code_url: https://creativecommons.org/licenses/by-nc-nd/4.0/legalcode
  name: Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International
    (CC BY-NC-ND 4.0)
  short: CC BY-NC-ND (4.0)
type: journal_article
user_id: ba8df636-2132-11f1-aed0-ed93e2281fdd
year: '2026'
...
---
OA_place: publisher
OA_type: diamond
_id: '20033'
abstract:
- lang: eng
  text: 'A growing number of machine learning scenarios rely on knowledge distillation
    where one uses the output of a surrogate model as labels to supervise the training
    of a target model. In this work, we provide a sharp characterization of this process
    for ridgeless, high-dimensional regression, under two settings: (i) model shift,
    where the surrogate model is arbitrary, and (ii) distribution shift, where the
    surrogate model is the solution of empirical risk minimization with out-of-distribution
    data. In both cases, we characterize the precise risk of the target model through
    non-asymptotic bounds in terms of sample size and data distribution under mild
    conditions. As a consequence, we identify the form of the optimal surrogate model,
    which reveals the benefits and limitations of discarding weak features in a data-dependent
    fashion. In the context of weak-to-strong (W2S) generalization, this has the interpretation
    that (i) W2S training, with the surrogate as the weak model, can provably outperform
    training with strong labels under the same data budget, but (ii) it is unable
    to improve the data scaling law. We validate our results on numerical experiments
    both on ridgeless regression and on neural network architectures.'
acknowledgement: M.E.I., H.A.G., E.O.T., S.O. are supported by the NSF grants CCF-2046816,
  CCF-2403075, the Office of Naval Research grant N000142412289, an OpenAI Agentic
  AI Systems grant, and gifts by Open Philanthropy and Google Research. M. M. is funded
  by the European Union (ERC, INF2, project number 101161364). Views and opinions
  expressed are however those of the author(s) only and do not necessarily reflect
  those of the European Union or the European Research Council Executive Agency. Neither
  the European Union nor the granting authority can be held responsible for them.
article_processing_charge: No
arxiv: 1
author:
- first_name: M.
  full_name: Emrullah Ildiz, M.
  last_name: Emrullah Ildiz
- first_name: Halil Alperen
  full_name: Gozeten, Halil Alperen
  last_name: Gozeten
- first_name: Ege Onur
  full_name: Taga, Ege Onur
  last_name: Taga
- first_name: Marco
  full_name: Mondelli, Marco
  id: 27EB676C-8706-11E9-9510-7717E6697425
  last_name: Mondelli
  orcid: 0000-0002-3242-7020
- first_name: Samet
  full_name: Oymak, Samet
  last_name: Oymak
citation:
  ama: 'Emrullah Ildiz M, Gozeten HA, Taga EO, Mondelli M, Oymak S. High-dimensional
    analysis of knowledge distillation: Weak-to-Strong generalization and scaling
    laws. In: <i>13th International Conference on Learning Representations</i>. ICLR;
    2025:2967-3006.'
  apa: 'Emrullah Ildiz, M., Gozeten, H. A., Taga, E. O., Mondelli, M., &#38; Oymak,
    S. (2025). High-dimensional analysis of knowledge distillation: Weak-to-Strong
    generalization and scaling laws. In <i>13th International Conference on Learning
    Representations</i> (pp. 2967–3006). Singapore, Singapore: ICLR.'
  chicago: 'Emrullah Ildiz, M., Halil Alperen Gozeten, Ege Onur Taga, Marco Mondelli,
    and Samet Oymak. “High-Dimensional Analysis of Knowledge Distillation: Weak-to-Strong
    Generalization and Scaling Laws.” In <i>13th International Conference on Learning
    Representations</i>, 2967–3006. ICLR, 2025.'
  ieee: 'M. Emrullah Ildiz, H. A. Gozeten, E. O. Taga, M. Mondelli, and S. Oymak,
    “High-dimensional analysis of knowledge distillation: Weak-to-Strong generalization
    and scaling laws,” in <i>13th International Conference on Learning Representations</i>,
    Singapore, Singapore, 2025, pp. 2967–3006.'
  ista: 'Emrullah Ildiz M, Gozeten HA, Taga EO, Mondelli M, Oymak S. 2025. High-dimensional
    analysis of knowledge distillation: Weak-to-Strong generalization and scaling
    laws. 13th International Conference on Learning Representations. ICLR: International
    Conference on Learning Representations, 2967–3006.'
  mla: 'Emrullah Ildiz, M., et al. “High-Dimensional Analysis of Knowledge Distillation:
    Weak-to-Strong Generalization and Scaling Laws.” <i>13th International Conference
    on Learning Representations</i>, ICLR, 2025, pp. 2967–3006.'
  short: M. Emrullah Ildiz, H.A. Gozeten, E.O. Taga, M. Mondelli, S. Oymak, in:, 13th
    International Conference on Learning Representations, ICLR, 2025, pp. 2967–3006.
conference:
  end_date: 2025-04-28
  location: Singapore, Singapore
  name: 'ICLR: International Conference on Learning Representations'
  start_date: 2025-04-24
date_created: 2025-07-20T22:02:02Z
date_published: 2025-04-01T00:00:00Z
date_updated: 2025-08-04T08:33:58Z
day: '01'
ddc:
- '000'
department:
- _id: MaMo
external_id:
  arxiv:
  - '2410.18837'
file:
- access_level: open_access
  checksum: 5a38b093ebb4ee4eb662ea142621a5ca
  content_type: application/pdf
  creator: dernst
  date_created: 2025-08-04T08:32:38Z
  date_updated: 2025-08-04T08:32:38Z
  file_id: '20112'
  file_name: 2025_ICLR_Ildiz.pdf
  file_size: 528171
  relation: main_file
  success: 1
file_date_updated: 2025-08-04T08:32:38Z
has_accepted_license: '1'
language:
- iso: eng
month: '04'
oa: 1
oa_version: Published Version
page: 2967-3006
project:
- _id: 911e6d1f-16d5-11f0-9cad-c5c68c6a1cdf
  grant_number: '101161364'
  name: 'Inference in High Dimensions: Light-speed Algorithms and Information Limits'
publication: 13th International Conference on Learning Representations
publication_identifier:
  isbn:
  - '9798331320850'
publication_status: published
publisher: ICLR
quality_controlled: '1'
scopus_import: '1'
status: public
title: 'High-dimensional analysis of knowledge distillation: Weak-to-Strong generalization
  and scaling laws'
tmp:
  image: /images/cc_by.png
  legal_code_url: https://creativecommons.org/licenses/by/4.0/legalcode
  name: Creative Commons Attribution 4.0 International Public License (CC-BY 4.0)
  short: CC BY (4.0)
type: conference
user_id: 2DF688A6-F248-11E8-B48F-1D18A9856A87
year: '2025'
...
---
OA_place: publisher
OA_type: diamond
_id: '20035'
abstract:
- lang: eng
  text: "Deep neural networks (DNNs) at convergence consistently represent the training
    data in the last layer via a geometric structure referred to as neural collapse.
    This empirical evidence has spurred a line of theoretical research aimed at proving
    the emergence of neural collapse, mostly focusing on the unconstrained features
    model. Here, the features of the penultimate layer are free variables, which makes
    the model data-agnostic and puts into question its ability to capture DNN training.
    Our work addresses the issue, moving away from unconstrained features and studying
    DNNs that end with at least two linear layers. We first prove generic guarantees
    on neural collapse that assume (i) low training error and balancedness of linear
    layers (for within-class variability collapse), and (ii) bounded conditioning
    of the features before the linear part (for orthogonality of class-means, and
    their alignment with weight matrices). The balancedness refers to the fact that
    W_{ℓ+1}^⊤ W_{ℓ+1} ≈ W_ℓ W_ℓ^⊤ for any pair of consecutive weight matrices of the linear part,
    and the bounded conditioning requires a well-behaved ratio between largest and
    smallest non-zero singular values of the features. We then show that such assumptions
    hold for gradient descent training with weight decay: (i) for networks with a
    wide first layer, we prove low training error and balancedness, and (ii) for solutions
    that are either nearly optimal or stable under large learning rates, we additionally
    prove the bounded conditioning. Taken together, our results are the first to show
    neural collapse in the end-to-end training of DNNs."
acknowledgement: M. M. and P. S. are funded by the European Union (ERC, INF2, project
  number 101161364). Views and opinions expressed are however those of the author(s)
  only and do not necessarily reflect those of the European Union or the European
  Research Council Executive Agency. Neither the European Union nor the granting authority
  can be held responsible for them.
article_processing_charge: No
arxiv: 1
author:
- first_name: Arthur
  full_name: Jacot, Arthur
  last_name: Jacot
- first_name: Peter
  full_name: Súkeník, Peter
  id: d64d6a8d-eb8e-11eb-b029-96fd216dec3c
  last_name: Súkeník
- first_name: Zihan
  full_name: Wang, Zihan
  last_name: Wang
- first_name: Marco
  full_name: Mondelli, Marco
  id: 27EB676C-8706-11E9-9510-7717E6697425
  last_name: Mondelli
  orcid: 0000-0002-3242-7020
citation:
  ama: 'Jacot A, Súkeník P, Wang Z, Mondelli M. Wide neural networks trained with
    weight decay provably exhibit neural collapse. In: <i>13th International Conference
    on Learning Representations</i>. ICLR; 2025:1905-1931.'
  apa: 'Jacot, A., Súkeník, P., Wang, Z., &#38; Mondelli, M. (2025). Wide neural networks
    trained with weight decay provably exhibit neural collapse. In <i>13th International
    Conference on Learning Representations</i> (pp. 1905–1931). Singapore, Singapore:
    ICLR.'
  chicago: Jacot, Arthur, Peter Súkeník, Zihan Wang, and Marco Mondelli. “Wide Neural
    Networks Trained with Weight Decay Provably Exhibit Neural Collapse.” In <i>13th
    International Conference on Learning Representations</i>, 1905–31. ICLR, 2025.
  ieee: A. Jacot, P. Súkeník, Z. Wang, and M. Mondelli, “Wide neural networks trained
    with weight decay provably exhibit neural collapse,” in <i>13th International
    Conference on Learning Representations</i>, Singapore, Singapore, 2025, pp. 1905–1931.
  ista: 'Jacot A, Súkeník P, Wang Z, Mondelli M. 2025. Wide neural networks trained
    with weight decay provably exhibit neural collapse. 13th International Conference
    on Learning Representations. ICLR: International Conference on Learning Representations,
    1905–1931.'
  mla: Jacot, Arthur, et al. “Wide Neural Networks Trained with Weight Decay Provably
    Exhibit Neural Collapse.” <i>13th International Conference on Learning Representations</i>,
    ICLR, 2025, pp. 1905–31.
  short: A. Jacot, P. Súkeník, Z. Wang, M. Mondelli, in:, 13th International Conference
    on Learning Representations, ICLR, 2025, pp. 1905–1931.
conference:
  end_date: 2025-04-28
  location: Singapore, Singapore
  name: 'ICLR: International Conference on Learning Representations'
  start_date: 2025-04-24
corr_author: '1'
date_created: 2025-07-20T22:02:02Z
date_published: 2025-04-01T00:00:00Z
date_updated: 2025-08-04T08:47:00Z
day: '01'
ddc:
- '000'
department:
- _id: MaMo
external_id:
  arxiv:
  - '2410.04887'
file:
- access_level: open_access
  checksum: 59c48c173887139647cc9839c0801136
  content_type: application/pdf
  creator: dernst
  date_created: 2025-08-04T08:45:43Z
  date_updated: 2025-08-04T08:45:43Z
  file_id: '20114'
  file_name: 2025_ICLR_Jacot.pdf
  file_size: 1337236
  relation: main_file
  success: 1
file_date_updated: 2025-08-04T08:45:43Z
has_accepted_license: '1'
language:
- iso: eng
month: '04'
oa: 1
oa_version: Published Version
page: 1905-1931
project:
- _id: 911e6d1f-16d5-11f0-9cad-c5c68c6a1cdf
  grant_number: '101161364'
  name: 'Inference in High Dimensions: Light-speed Algorithms and Information Limits'
publication: 13th International Conference on Learning Representations
publication_identifier:
  isbn:
  - '9798331320850'
publication_status: published
publisher: ICLR
quality_controlled: '1'
scopus_import: '1'
status: public
title: Wide neural networks trained with weight decay provably exhibit neural collapse
tmp:
  image: /images/cc_by.png
  legal_code_url: https://creativecommons.org/licenses/by/4.0/legalcode
  name: Creative Commons Attribution 4.0 International Public License (CC-BY 4.0)
  short: CC BY (4.0)
type: conference
user_id: 2DF688A6-F248-11E8-B48F-1D18A9856A87
year: '2025'
...
---
OA_place: publisher
OA_type: diamond
PlanS_conform: '1'
_id: '20734'
abstract:
- lang: eng
  text: We consider the problem of parameter estimation in a high-dimensional generalized
    linear model. Spectral methods obtained via the principal eigenvector of a suitable
    data-dependent matrix provide a simple yet surprisingly effective solution. However,
    despite their wide use, a rigorous performance characterization, as well as a
    principled way to preprocess the data, are available only for unstructured (i.i.d.
    Gaussian and Haar orthogonal) designs. In contrast, real-world data matrices are
    highly structured and exhibit non-trivial correlations. To address the problem,
    we consider correlated Gaussian designs capturing the anisotropic nature of the
    features via a covariance matrix Σ. Our main result is a precise asymptotic characterization
    of the performance of spectral estimators. This allows us to identify the optimal
    preprocessing that minimizes the number of samples needed for parameter estimation.
    Surprisingly, such preprocessing is universal across a broad set of designs, which
    partly addresses a conjecture on optimal spectral estimators for rotationally
    invariant models. Our principled approach vastly improves upon previous heuristic
    methods, including for designs common in computational imaging and genetics. The
    proposed methodology, based on approximate message passing, is broadly applicable
    and opens the way to the precise characterization of spiked matrices and of the
    corresponding spectral methods in a variety of settings.
acknowledgement: "This work was done when Y. Z. and H. C. J. were at the Institute
  of Science and Technology Austria. Y. Z. thanks Hugo Latourelle-Vigeant for bringing
  [53] to the authors’ attention. Y. Z. and M. M. are partially supported by the
  2019 Lopez-Loreta Prize and by the Interdisciplinary Projects Committee (IPC) at
  ISTA. H. C. J. is supported by the ERC Advanced Grant “RMTBeyond” No. 101020331."
article_processing_charge: No
article_type: original
author:
- first_name: Yihan
  full_name: Zhang, Yihan
  id: 2ce5da42-b2ea-11eb-bba5-9f264e9d002c
  last_name: Zhang
  orcid: 0000-0002-6465-6258
- first_name: Hong Chang
  full_name: Ji, Hong Chang
  last_name: Ji
- first_name: Ramji
  full_name: Venkataramanan, Ramji
  last_name: Venkataramanan
- first_name: Marco
  full_name: Mondelli, Marco
  id: 27EB676C-8706-11E9-9510-7717E6697425
  last_name: Mondelli
  orcid: 0000-0002-3242-7020
citation:
  ama: Zhang Y, Ji HC, Venkataramanan R, Mondelli M. Spectral estimators for structured
    generalized linear models via approximate message passing. <i>Mathematical Statistics
    and Learning</i>. 2025;8(3-4):193-304. doi:<a href="https://doi.org/10.4171/MSL/52">10.4171/MSL/52</a>
  apa: Zhang, Y., Ji, H. C., Venkataramanan, R., &#38; Mondelli, M. (2025). Spectral
    estimators for structured generalized linear models via approximate message passing.
    <i>Mathematical Statistics and Learning</i>. EMS Press. <a href="https://doi.org/10.4171/MSL/52">https://doi.org/10.4171/MSL/52</a>
  chicago: Zhang, Yihan, Hong Chang Ji, Ramji Venkataramanan, and Marco Mondelli.
    “Spectral Estimators for Structured Generalized Linear Models via Approximate
    Message Passing.” <i>Mathematical Statistics and Learning</i>. EMS Press, 2025.
    <a href="https://doi.org/10.4171/MSL/52">https://doi.org/10.4171/MSL/52</a>.
  ieee: Y. Zhang, H. C. Ji, R. Venkataramanan, and M. Mondelli, “Spectral estimators
    for structured generalized linear models via approximate message passing,” <i>Mathematical
    Statistics and Learning</i>, vol. 8, no. 3–4. EMS Press, pp. 193–304, 2025.
  ista: Zhang Y, Ji HC, Venkataramanan R, Mondelli M. 2025. Spectral estimators for
    structured generalized linear models via approximate message passing. Mathematical
    Statistics and Learning. 8(3–4), 193–304.
  mla: Zhang, Yihan, et al. “Spectral Estimators for Structured Generalized Linear
    Models via Approximate Message Passing.” <i>Mathematical Statistics and Learning</i>,
    vol. 8, no. 3–4, EMS Press, 2025, pp. 193–304, doi:<a href="https://doi.org/10.4171/MSL/52">10.4171/MSL/52</a>.
  short: Y. Zhang, H.C. Ji, R. Venkataramanan, M. Mondelli, Mathematical Statistics
    and Learning 8 (2025) 193–304.
corr_author: '1'
date_created: 2025-12-07T23:02:02Z
date_published: 2025-09-02T00:00:00Z
date_updated: 2025-12-09T13:53:31Z
day: '02'
ddc:
- '000'
department:
- _id: MaMo
doi: 10.4171/MSL/52
file:
- access_level: open_access
  checksum: 55a1bd9c1b6b0198c42504fb94f4ad4c
  content_type: application/pdf
  creator: dernst
  date_created: 2025-12-09T13:50:03Z
  date_updated: 2025-12-09T13:50:03Z
  file_id: '20752'
  file_name: 2025_MathStatLearning_Zhang.pdf
  file_size: 1379626
  relation: main_file
  success: 1
file_date_updated: 2025-12-09T13:50:03Z
has_accepted_license: '1'
intvolume: '         8'
issue: 3-4
language:
- iso: eng
month: '09'
oa: 1
oa_version: Published Version
page: 193-304
project:
- _id: 059876FA-7A3F-11EA-A408-12923DDC885E
  name: Prix Lopez-Loretta 2019 - Marco Mondelli
publication: Mathematical Statistics and Learning
publication_identifier:
  eissn:
  - 2520-2324
  issn:
  - 2520-2316
publication_status: published
publisher: EMS Press
quality_controlled: '1'
scopus_import: '1'
status: public
title: Spectral estimators for structured generalized linear models via approximate
  message passing
tmp:
  image: /images/cc_by.png
  legal_code_url: https://creativecommons.org/licenses/by/4.0/legalcode
  name: Creative Commons Attribution 4.0 International Public License (CC-BY 4.0)
  short: CC BY (4.0)
type: journal_article
user_id: 2DF688A6-F248-11E8-B48F-1D18A9856A87
volume: 8
year: '2025'
...
---
OA_place: publisher
OA_type: hybrid
PlanS_conform: '1'
_id: '19065'
abstract:
- lang: eng
  text: 'The identification of the parameters of a neural network from finite samples
    of input-output pairs is often referred to as the teacher-student model, and this
    model has represented a popular framework for understanding training and generalization.
    Even if the problem is NP-complete in the worst case, a rapidly growing literature
    – after adding suitable distributional assumptions – has established finite sample
    identification of two-layer networks with a number of neurons (math. formula),
    D being the input dimension. For the range (math. formula) the problem becomes
    harder, and truly little is known for networks parametrized by biases as well.
    This paper fills the gap by providing efficient algorithms and rigorous theoretical
    guarantees of finite sample identification for such wider shallow networks with
    biases. Our approach is based on a two-step pipeline: first, we recover the direction
    of the weights, by exploiting second order information; next, we identify the
    signs by suitable algebraic evaluations, and we recover the biases by empirical
    risk minimization via gradient descent. Numerical results demonstrate the effectiveness
    of our approach.'
article_number: '101749'
article_processing_charge: No
article_type: original
author:
- first_name: Massimo
  full_name: Fornasier, Massimo
  last_name: Fornasier
- first_name: Timo
  full_name: Klock, Timo
  last_name: Klock
- first_name: Marco
  full_name: Mondelli, Marco
  id: 27EB676C-8706-11E9-9510-7717E6697425
  last_name: Mondelli
  orcid: 0000-0002-3242-7020
- first_name: Michael
  full_name: Rauchensteiner, Michael
  last_name: Rauchensteiner
citation:
  ama: Fornasier M, Klock T, Mondelli M, Rauchensteiner M. Efficient identification
    of wide shallow neural networks with biases. <i>Applied and Computational Harmonic
    Analysis</i>. 2025;77. doi:<a href="https://doi.org/10.1016/j.acha.2025.101749">10.1016/j.acha.2025.101749</a>
  apa: Fornasier, M., Klock, T., Mondelli, M., &#38; Rauchensteiner, M. (2025). Efficient
    identification of wide shallow neural networks with biases. <i>Applied and Computational
    Harmonic Analysis</i>. Elsevier. <a href="https://doi.org/10.1016/j.acha.2025.101749">https://doi.org/10.1016/j.acha.2025.101749</a>
  chicago: Fornasier, Massimo, Timo Klock, Marco Mondelli, and Michael Rauchensteiner.
    “Efficient Identification of Wide Shallow Neural Networks with Biases.” <i>Applied
    and Computational Harmonic Analysis</i>. Elsevier, 2025. <a href="https://doi.org/10.1016/j.acha.2025.101749">https://doi.org/10.1016/j.acha.2025.101749</a>.
  ieee: M. Fornasier, T. Klock, M. Mondelli, and M. Rauchensteiner, “Efficient identification
    of wide shallow neural networks with biases,” <i>Applied and Computational Harmonic
    Analysis</i>, vol. 77. Elsevier, 2025.
  ista: Fornasier M, Klock T, Mondelli M, Rauchensteiner M. 2025. Efficient identification
    of wide shallow neural networks with biases. Applied and Computational Harmonic
    Analysis. 77, 101749.
  mla: Fornasier, Massimo, et al. “Efficient Identification of Wide Shallow Neural
    Networks with Biases.” <i>Applied and Computational Harmonic Analysis</i>, vol.
    77, 101749, Elsevier, 2025, doi:<a href="https://doi.org/10.1016/j.acha.2025.101749">10.1016/j.acha.2025.101749</a>.
  short: M. Fornasier, T. Klock, M. Mondelli, M. Rauchensteiner, Applied and Computational
    Harmonic Analysis 77 (2025).
corr_author: '1'
date_created: 2025-02-23T23:01:54Z
date_published: 2025-06-01T00:00:00Z
date_updated: 2025-09-30T10:35:09Z
day: '01'
ddc:
- '000'
department:
- _id: MaMo
doi: 10.1016/j.acha.2025.101749
external_id:
  isi:
  - '001430202700001'
file:
- access_level: open_access
  checksum: 657f258af0f7ca135e69959fd13e2d63
  content_type: application/pdf
  creator: dernst
  date_created: 2025-08-05T12:22:04Z
  date_updated: 2025-08-05T12:22:04Z
  file_id: '20131'
  file_name: 2025_ApplCompAnalysis_Fornasier.pdf
  file_size: 2223350
  relation: main_file
  success: 1
file_date_updated: 2025-08-05T12:22:04Z
has_accepted_license: '1'
intvolume: '        77'
isi: 1
language:
- iso: eng
month: '06'
oa: 1
oa_version: Published Version
publication: Applied and Computational Harmonic Analysis
publication_identifier:
  eissn:
  - 1096-603X
  issn:
  - 1063-5203
publication_status: published
publisher: Elsevier
quality_controlled: '1'
scopus_import: '1'
status: public
title: Efficient identification of wide shallow neural networks with biases
tmp:
  image: /images/cc_by.png
  legal_code_url: https://creativecommons.org/licenses/by/4.0/legalcode
  name: Creative Commons Attribution 4.0 International Public License (CC-BY 4.0)
  short: CC BY (4.0)
type: journal_article
user_id: 317138e5-6ab7-11ef-aa6d-ffef3953e345
volume: 77
year: '2025'
...
---
OA_place: publisher
OA_type: gold
_id: '21324'
abstract:
- lang: eng
  text: Learning models have been shown to rely on spurious correlations between non-predictive
    features and the associated labels in the training data, with negative implications
    on robustness, bias and fairness. In this work, we provide a statistical characterization
    of this phenomenon for high-dimensional regression, when the data contains a predictive
    core feature x and a spurious feature y. Specifically, we quantify the amount
    of spurious correlations C learned via linear regression, in terms of the data
    covariance and the strength λ of the ridge regularization. As a consequence, we
    first capture the simplicity of y through the spectrum of its covariance, and
    its correlation with x through the Schur complement of the full data covariance.
    Next, we prove a trade-off between C and the in-distribution test loss L, by showing
    that the value of λ that minimizes L lies in an interval where C is increasing.
    Finally, we investigate the effects of over-parameterization via the random features
    model, by showing its equivalence to regularized linear regression. Our theoretical
    results are supported by numerical experiments on Gaussian, Color-MNIST, and CIFAR-10
    datasets.
acknowledgement: Marco Mondelli is funded by the European Union (ERC, INF2, project
  number 101161364). Views and opinions expressed are however those of the author(s)
  only and do not necessarily reflect those of the European Union or the European
  Research Council Executive Agency. Neither the European Union nor the granting authority
  can be held responsible for them. Simone Bombari is supported by a Google PhD fellowship.
  The authors would like to thank GuanWen Qiu for helpful discussions.
alternative_title:
- PMLR
article_processing_charge: No
arxiv: 1
author:
- first_name: Simone
  full_name: Bombari, Simone
  id: ca726dda-de17-11ea-bc14-f9da834f63aa
  last_name: Bombari
- first_name: Marco
  full_name: Mondelli, Marco
  id: 27EB676C-8706-11E9-9510-7717E6697425
  last_name: Mondelli
  orcid: 0000-0002-3242-7020
citation:
  ama: 'Bombari S, Mondelli M. Spurious correlations in high dimensional regression:
    The roles of regularization, simplicity bias and over-parameterization. In: <i>Proceedings
    of the 42nd International Conference on Machine Learning</i>. Vol 267. ML Research
    Press; 2025:4839-4873.'
  apa: 'Bombari, S., &#38; Mondelli, M. (2025). Spurious correlations in high dimensional
    regression: The roles of regularization, simplicity bias and over-parameterization.
    In <i>Proceedings of the 42nd International Conference on Machine Learning</i>
    (Vol. 267, pp. 4839–4873). Vancouver, Canada: ML Research Press.'
  chicago: 'Bombari, Simone, and Marco Mondelli. “Spurious Correlations in High Dimensional
    Regression: The Roles of Regularization, Simplicity Bias and over-Parameterization.”
    In <i>Proceedings of the 42nd International Conference on Machine Learning</i>,
    267:4839–73. ML Research Press, 2025.'
  ieee: 'S. Bombari and M. Mondelli, “Spurious correlations in high dimensional regression:
    The roles of regularization, simplicity bias and over-parameterization,” in <i>Proceedings
    of the 42nd International Conference on Machine Learning</i>, Vancouver, Canada,
    2025, vol. 267, pp. 4839–4873.'
  ista: 'Bombari S, Mondelli M. 2025. Spurious correlations in high dimensional regression:
    The roles of regularization, simplicity bias and over-parameterization. Proceedings
    of the 42nd International Conference on Machine Learning. ICML: International
    Conference on Machine Learning, PMLR, vol. 267, 4839–4873.'
  mla: 'Bombari, Simone, and Marco Mondelli. “Spurious Correlations in High Dimensional
    Regression: The Roles of Regularization, Simplicity Bias and over-Parameterization.”
    <i>Proceedings of the 42nd International Conference on Machine Learning</i>, vol.
    267, ML Research Press, 2025, pp. 4839–73.'
  short: S. Bombari, M. Mondelli, in:, Proceedings of the 42nd International Conference
    on Machine Learning, ML Research Press, 2025, pp. 4839–4873.
conference:
  end_date: 2025-07-19
  location: Vancouver, Canada
  name: 'ICML: International Conference on Machine Learning'
  start_date: 2025-07-13
corr_author: '1'
date_created: 2026-02-18T11:58:00Z
date_published: 2025-07-30T00:00:00Z
date_updated: 2026-02-19T08:08:55Z
day: '30'
ddc:
- '000'
department:
- _id: MaMo
external_id:
  arxiv:
  - '2502.01347'
file:
- access_level: open_access
  checksum: d4ba4f7717b362ca38878f45e57bd643
  content_type: application/pdf
  creator: dernst
  date_created: 2026-02-19T08:04:38Z
  date_updated: 2026-02-19T08:04:38Z
  file_id: '21335'
  file_name: 2025_ICML_Bombari.pdf
  file_size: 887526
  relation: main_file
  success: 1
file_date_updated: 2026-02-19T08:04:38Z
has_accepted_license: '1'
intvolume: '       267'
language:
- iso: eng
month: '07'
oa: 1
oa_version: Published Version
page: 4839-4873
project:
- _id: 911e6d1f-16d5-11f0-9cad-c5c68c6a1cdf
  grant_number: '101161364'
  name: 'Inference in High Dimensions: Light-speed Algorithms and Information Limits'
- _id: 92099302-16d5-11f0-9cad-f9a785f54fbd
  name: 'Trustworthy Deep Learning Theory: Private Over-Parameterized Models and Robust
    LLMs'
publication: Proceedings of the 42nd International Conference on Machine Learning
publication_identifier:
  eissn:
  - 2640-3498
publication_status: published
publisher: ML Research Press
quality_controlled: '1'
status: public
title: 'Spurious correlations in high dimensional regression: The roles of regularization,
  simplicity bias and over-parameterization'
tmp:
  image: /images/cc_by.png
  legal_code_url: https://creativecommons.org/licenses/by/4.0/legalcode
  name: Creative Commons Attribution 4.0 International Public License (CC-BY 4.0)
  short: CC BY (4.0)
type: conference
user_id: 2DF688A6-F248-11E8-B48F-1D18A9856A87
volume: 267
year: '2025'
...
---
OA_place: publisher
OA_type: gold
_id: '21325'
abstract:
- lang: eng
  text: Test-time training (TTT) methods explicitly update the weights of a model
    to adapt to the specific test instance, and they have found success in a variety
    of settings, including most recently language modeling and reasoning. To demystify
    this success, we investigate a gradient-based TTT algorithm for in-context learning,
    where we train a transformer model on the in-context demonstrations provided in
    the test prompt. Specifically, we provide a comprehensive theoretical characterization
    of linear transformers when the update rule is a single gradient step. Our theory
    (i) delineates the role of alignment between pretraining distribution and target
    task, (ii) demystifies how TTT can alleviate distribution shift, and (iii) quantifies
    the sample complexity of TTT including how it can significantly reduce the eventual
    sample size required for in-context learning. As our empirical contribution, we
    study the benefits of TTT for TabPFN, a tabular foundation model. In line with
    our theory, we demonstrate that TTT significantly reduces the required sample
    size for tabular classification (3 to 5 times fewer), unlocking substantial inference
    efficiency with a negligible training cost.
acknowledgement: "H.A.G., M.E.I., X.Z., and S.O. were supported in part by the NSF
  grants CCF-2046816, CCF-2403075, CCF-2008020, and the Office of Naval Research grant
  N000142412289. M. M. is funded by the European Union (ERC, INF2, project number
  101161364). Views and opinions expressed are, however, those of the author(s) only
  and do not necessarily reflect those of the European Union or the European Research
  Council Executive Agency. Neither the European Union nor the granting authority
  can be held responsible for them. M.S. is supported by the Packard Fellowship in
  Science and Engineering, a Sloan Research Fellowship in Mathematics, an NSF-CAREER
  under award #1846369, DARPA FastNICS program, and NSF-CIF awards #1813877 and #2008443,
  and NIH DP2LM014564-01. The authors also acknowledge further support from Open
  Philanthropy, OpenAI, Amazon Research, Google Research, and Microsoft Research."
alternative_title:
- PMLR
article_processing_charge: No
author:
- first_name: Halil Alperen
  full_name: Gozeten, Halil Alperen
  last_name: Gozeten
- first_name: Muhammed Emrullah
  full_name: Ildiz, Muhammed Emrullah
  last_name: Ildiz
- first_name: Xuechen
  full_name: Zhang, Xuechen
  last_name: Zhang
- first_name: Mahdi
  full_name: Soltanolkotabi, Mahdi
  last_name: Soltanolkotabi
- first_name: Marco
  full_name: Mondelli, Marco
  id: 27EB676C-8706-11E9-9510-7717E6697425
  last_name: Mondelli
  orcid: 0000-0002-3242-7020
- first_name: Samet
  full_name: Oymak, Samet
  last_name: Oymak
citation:
  ama: 'Gozeten HA, Ildiz ME, Zhang X, Soltanolkotabi M, Mondelli M, Oymak S. Test-time
    training provably improves transformers as in-context learners. In: <i>Proceedings
    of the 42nd International Conference on Machine Learning</i>. Vol 267. ML Research
    Press; 2025:20266-20295.'
  apa: 'Gozeten, H. A., Ildiz, M. E., Zhang, X., Soltanolkotabi, M., Mondelli, M.,
    &#38; Oymak, S. (2025). Test-time training provably improves transformers as in-context
    learners. In <i>Proceedings of the 42nd International Conference on Machine Learning</i>
    (Vol. 267, pp. 20266–20295). Vancouver, Canada: ML Research Press.'
  chicago: Gozeten, Halil Alperen, Muhammed Emrullah Ildiz, Xuechen Zhang, Mahdi Soltanolkotabi,
    Marco Mondelli, and Samet Oymak. “Test-Time Training Provably Improves Transformers
    as in-Context Learners.” In <i>Proceedings of the 42nd International Conference
    on Machine Learning</i>, 267:20266–95. ML Research Press, 2025.
  ieee: H. A. Gozeten, M. E. Ildiz, X. Zhang, M. Soltanolkotabi, M. Mondelli, and
    S. Oymak, “Test-time training provably improves transformers as in-context learners,”
    in <i>Proceedings of the 42nd International Conference on Machine Learning</i>,
    Vancouver, Canada, 2025, vol. 267, pp. 20266–20295.
  ista: 'Gozeten HA, Ildiz ME, Zhang X, Soltanolkotabi M, Mondelli M, Oymak S. 2025.
    Test-time training provably improves transformers as in-context learners. Proceedings
    of the 42nd International Conference on Machine Learning. ICML: International
    Conference on Machine Learning, PMLR, vol. 267, 20266–20295.'
  mla: Gozeten, Halil Alperen, et al. “Test-Time Training Provably Improves Transformers
    as in-Context Learners.” <i>Proceedings of the 42nd International Conference on
    Machine Learning</i>, vol. 267, ML Research Press, 2025, pp. 20266–95.
  short: H.A. Gozeten, M.E. Ildiz, X. Zhang, M. Soltanolkotabi, M. Mondelli, S. Oymak,
    in:, Proceedings of the 42nd International Conference on Machine Learning, ML
    Research Press, 2025, pp. 20266–20295.
conference:
  end_date: 2025-07-19
  location: Vancouver, Canada
  name: 'ICML: International Conference on Machine Learning'
  start_date: 2025-07-13
date_created: 2026-02-18T12:00:44Z
date_published: 2025-11-30T00:00:00Z
date_updated: 2026-02-19T08:18:24Z
day: '30'
ddc:
- '000'
department:
- _id: MaMo
external_id:
  pmid:
  - '41321376'
file:
- access_level: open_access
  checksum: f774f8619a0d72f3975d9cb23942a1e9
  content_type: application/pdf
  creator: dernst
  date_created: 2026-02-19T08:15:48Z
  date_updated: 2026-02-19T08:15:48Z
  file_id: '21336'
  file_name: 2025_ICML_Gozeten.pdf
  file_size: 471176
  relation: main_file
  success: 1
file_date_updated: 2026-02-19T08:15:48Z
has_accepted_license: '1'
intvolume: '       267'
language:
- iso: eng
month: '11'
oa: 1
oa_version: Published Version
page: 20266-20295
pmid: 1
project:
- _id: 911e6d1f-16d5-11f0-9cad-c5c68c6a1cdf
  grant_number: '101161364'
  name: 'Inference in High Dimensions: Light-speed Algorithms and Information Limits'
publication: Proceedings of the 42nd International Conference on Machine Learning
publication_identifier:
  eissn:
  - 2640-3498
publication_status: published
publisher: ML Research Press
quality_controlled: '1'
status: public
title: Test-time training provably improves transformers as in-context learners
tmp:
  image: /images/cc_by.png
  legal_code_url: https://creativecommons.org/licenses/by/4.0/legalcode
  name: Creative Commons Attribution 4.0 International Public License (CC-BY 4.0)
  short: CC BY (4.0)
type: conference
user_id: 2DF688A6-F248-11E8-B48F-1D18A9856A87
volume: 267
year: '2025'
...
---
OA_place: publisher
OA_type: gold
_id: '21326'
abstract:
- lang: eng
  text: 'Neural Collapse is a phenomenon where the last-layer representations of a
    well-trained neural network converge to a highly structured geometry. In this
    paper, we focus on its first (and most basic) property, known as NC1: the within-class
    variability vanishes. While prior theoretical studies establish the occurrence
    of NC1 via the data-agnostic unconstrained features model, our work adopts a data-specific
    perspective, analyzing NC1 in a three-layer neural network, with the first two
    layers operating in the mean-field regime and followed by a linear layer. In particular,
    we establish a fundamental connection between NC1 and the loss landscape: we prove
    that points with small empirical loss and gradient norm (thus, close to being
    stationary) approximately satisfy NC1, and the closeness to NC1 is controlled
    by the residual loss and gradient norm. We then show that (i) gradient flow on
    the mean squared error converges to NC1 solutions with small empirical loss, and
    (ii) for well-separated data distributions, both NC1 and vanishing test loss are
    achieved simultaneously. This aligns with the empirical observation that NC1 emerges
    during training while models attain near-zero test error. Overall, our results
    demonstrate that NC1 arises from gradient training due to the properties of the
    loss landscape, and they show the co-occurrence of NC1 and small test error for
    certain data distributions.'
acknowledgement: "This research was funded in whole or in part by the Austrian Science
  Fund (FWF) 10.55776/COE12. For the purpose of open access, the authors have applied
  a CC BY public copyright license to any Author Accepted Manuscript version arising
  from this submission. The authors would like to thank Peter Súkeník for general
  helpful discussions and for pointing out that all the stationary points are approximately
  proportional in the case without entropic regularization."
alternative_title:
- PMLR
article_processing_charge: No
arxiv: 1
author:
- first_name: Diyuan
  full_name: Wu, Diyuan
  id: 1a5914c2-896a-11ed-bdf8-fb80621a0635
  last_name: Wu
- first_name: Marco
  full_name: Mondelli, Marco
  id: 27EB676C-8706-11E9-9510-7717E6697425
  last_name: Mondelli
  orcid: 0000-0002-3242-7020
citation:
  ama: 'Wu D, Mondelli M. Neural collapse beyond the unconstrained features model:
    Landscape, dynamics, and generalization in the mean-field regime. In: <i>Proceedings
    of the 42nd International Conference on Machine Learning</i>. Vol 267. ML Research
    Press; 2025:67499-67536.'
  apa: 'Wu, D., &#38; Mondelli, M. (2025). Neural collapse beyond the unconstrained
    features model: Landscape, dynamics, and generalization in the mean-field regime.
    In <i>Proceedings of the 42nd International Conference on Machine Learning</i>
    (Vol. 267, pp. 67499–67536). Vancouver, Canada: ML Research Press.'
  chicago: 'Wu, Diyuan, and Marco Mondelli. “Neural Collapse beyond the Unconstrained
    Features Model: Landscape, Dynamics, and Generalization in the Mean-Field Regime.”
    In <i>Proceedings of the 42nd International Conference on Machine Learning</i>,
    267:67499–536. ML Research Press, 2025.'
  ieee: 'D. Wu and M. Mondelli, “Neural collapse beyond the unconstrained features
    model: Landscape, dynamics, and generalization in the mean-field regime,” in <i>Proceedings
    of the 42nd International Conference on Machine Learning</i>, Vancouver, Canada,
    2025, vol. 267, pp. 67499–67536.'
  ista: 'Wu D, Mondelli M. 2025. Neural collapse beyond the unconstrained features
    model: Landscape, dynamics, and generalization in the mean-field regime. Proceedings
    of the 42nd International Conference on Machine Learning. ICML: International
    Conference on Machine Learning, PMLR, vol. 267, 67499–67536.'
  mla: 'Wu, Diyuan, and Marco Mondelli. “Neural Collapse beyond the Unconstrained
    Features Model: Landscape, Dynamics, and Generalization in the Mean-Field Regime.”
    <i>Proceedings of the 42nd International Conference on Machine Learning</i>, vol.
    267, ML Research Press, 2025, pp. 67499–536.'
  short: D. Wu, M. Mondelli, in:, Proceedings of the 42nd International Conference
    on Machine Learning, ML Research Press, 2025, pp. 67499–67536.
conference:
  end_date: 2025-07-19
  location: Vancouver, Canada
  name: 'ICML: International Conference on Machine Learning'
  start_date: 2025-07-13
corr_author: '1'
date_created: 2026-02-18T12:02:45Z
date_published: 2025-07-30T00:00:00Z
date_updated: 2026-02-19T08:30:42Z
day: '30'
ddc:
- '000'
department:
- _id: MaMo
external_id:
  arxiv:
  - '2501.19104'
file:
- access_level: open_access
  checksum: c5ce8b1c83e33dc3a11122f4910deb67
  content_type: application/pdf
  creator: dernst
  date_created: 2026-02-19T08:28:22Z
  date_updated: 2026-02-19T08:28:22Z
  file_id: '21337'
  file_name: 2025_ICML_Wu.pdf
  file_size: 3994385
  relation: main_file
  success: 1
file_date_updated: 2026-02-19T08:28:22Z
has_accepted_license: '1'
intvolume: '       267'
language:
- iso: eng
month: '07'
oa: 1
oa_version: Published Version
page: 67499-67536
publication: Proceedings of the 42nd International Conference on Machine Learning
publication_identifier:
  eissn:
  - 2640-3498
publication_status: published
publisher: ML Research Press
quality_controlled: '1'
status: public
title: 'Neural collapse beyond the unconstrained features model: Landscape, dynamics,
  and generalization in the mean-field regime'
tmp:
  image: /images/cc_by.png
  legal_code_url: https://creativecommons.org/licenses/by/4.0/legalcode
  name: Creative Commons Attribution 4.0 International Public License (CC-BY 4.0)
  short: CC BY (4.0)
type: conference
user_id: 2DF688A6-F248-11E8-B48F-1D18A9856A87
volume: 267
year: '2025'
...
---
OA_place: publisher
OA_type: gold
_id: '21328'
abstract:
- lang: eng
  text: Multi-index models provide a popular framework to investigate the learnability
    of functions with low-dimensional structure and, also due to their connections
    with neural networks, they have been the object of intensive recent study. In this
    paper, we focus on recovering the subspace spanned by the signals via spectral
    estimators – a family of methods routinely used in practice, often as a warm-start
    for iterative algorithms. Our main technical contribution is a precise asymptotic
    characterization of the performance of spectral methods, when sample size and
    input dimension grow proportionally and the dimension p of the space to recover
    is fixed. Specifically, we locate the top-p eigenvalues of the spectral matrix
    and establish the overlaps between the corresponding eigenvectors (which give
    the spectral estimators) and a basis of the signal subspace. Our analysis unveils
    a phase transition phenomenon in which, as the sample complexity grows, eigenvalues
    escape from the bulk of the spectrum and, when that happens, eigenvectors recover
    directions of the desired subspace. The precise characterization we put forward
    enables the optimization of the data preprocessing, thus making it possible to identify
    the spectral estimator that requires the minimal sample size for weak recovery.
acknowledgement: "This work was done when Y. Z. was at the Institute of Science and
  Technology Austria. Y. Z. and M. M. are funded by the European Union (ERC, INF2,
  project number 101161364). Views and opinions expressed are however those of
  the author(s) only and do not necessarily reflect those of the European Union or
  the European Research Council Executive Agency. Neither the European Union nor the
  granting authority can be held responsible for them. The authors would like to acknowledge
  (in alphabetical order) discussions with Yatin Dandi, Leonardo Defilippis and Bruno
  Loureiro concerning their parallel work (Defilippis et al., 2025)."
alternative_title:
- PMLR
article_processing_charge: No
arxiv: 1
author:
- first_name: Filip
  full_name: Kovačević, Filip
  id: d0258e7b-50b8-11ef-ad56-8b9f537b6b1b
  last_name: Kovačević
- first_name: Yihan
  full_name: Zhang, Yihan
  last_name: Zhang
- first_name: Marco
  full_name: Mondelli, Marco
  id: 27EB676C-8706-11E9-9510-7717E6697425
  last_name: Mondelli
  orcid: 0000-0002-3242-7020
citation:
  ama: 'Kovačević F, Zhang Y, Mondelli M. Spectral estimators for multi-index models:
    Precise asymptotics and optimal weak recovery. In: <i>Proceedings of 38th Conference
    on Learning Theory</i>. Vol 291. ML Research Press; 2025:3354-3404.'
  apa: 'Kovačević, F., Zhang, Y., &#38; Mondelli, M. (2025). Spectral estimators for
    multi-index models: Precise asymptotics and optimal weak recovery. In <i>Proceedings
    of 38th Conference on Learning Theory</i> (Vol. 291, pp. 3354–3404). Lyon, France:
    ML Research Press.'
  chicago: 'Kovačević, Filip, Yihan Zhang, and Marco Mondelli. “Spectral Estimators
    for Multi-Index Models: Precise Asymptotics and Optimal Weak Recovery.” In <i>Proceedings
    of 38th Conference on Learning Theory</i>, 291:3354–3404. ML Research Press, 2025.'
  ieee: 'F. Kovačević, Y. Zhang, and M. Mondelli, “Spectral estimators for multi-index
    models: Precise asymptotics and optimal weak recovery,” in <i>Proceedings of 38th
    Conference on Learning Theory</i>, Lyon, France, 2025, vol. 291, pp. 3354–3404.'
  ista: 'Kovačević F, Zhang Y, Mondelli M. 2025. Spectral estimators for multi-index
    models: Precise asymptotics and optimal weak recovery. Proceedings of 38th Conference
    on Learning Theory. COLT: Conference on Learning Theory, PMLR, vol. 291, 3354–3404.'
  mla: 'Kovačević, Filip, et al. “Spectral Estimators for Multi-Index Models: Precise
    Asymptotics and Optimal Weak Recovery.” <i>Proceedings of 38th Conference on Learning
    Theory</i>, vol. 291, ML Research Press, 2025, pp. 3354–404.'
  short: F. Kovačević, Y. Zhang, M. Mondelli, in:, Proceedings of 38th Conference
    on Learning Theory, ML Research Press, 2025, pp. 3354–3404.
conference:
  end_date: 2025-07-04
  location: Lyon, France
  name: 'COLT: Conference on Learning Theory'
  start_date: 2025-06-30
corr_author: '1'
date_created: 2026-02-18T12:12:47Z
date_published: 2025-07-01T00:00:00Z
date_updated: 2026-02-19T09:03:53Z
day: '01'
ddc:
- '000'
department:
- _id: MaMo
external_id:
  arxiv:
  - '2502.01583'
file:
- access_level: open_access
  checksum: 19aa70ab4f57fb9067b6ebb99a5fd6f0
  content_type: application/pdf
  creator: dernst
  date_created: 2026-02-19T09:03:43Z
  date_updated: 2026-02-19T09:03:43Z
  file_id: '21339'
  file_name: 2025_LearningTheory_Kovacevic.pdf
  file_size: 844611
  relation: main_file
  success: 1
file_date_updated: 2026-02-19T09:03:43Z
has_accepted_license: '1'
intvolume: '       291'
language:
- iso: eng
month: '07'
oa: 1
oa_version: Published Version
page: 3354-3404
project:
- _id: 911e6d1f-16d5-11f0-9cad-c5c68c6a1cdf
  grant_number: '101161364'
  name: 'Inference in High Dimensions: Light-speed Algorithms and Information Limits'
publication: Proceedings of 38th Conference on Learning Theory
publication_identifier:
  eissn:
  - 2640-3498
publication_status: published
publisher: ML Research Press
quality_controlled: '1'
scopus_import: '1'
status: public
title: 'Spectral estimators for multi-index models: Precise asymptotics and optimal
  weak recovery'
tmp:
  image: /images/cc_by.png
  legal_code_url: https://creativecommons.org/licenses/by/4.0/legalcode
  name: Creative Commons Attribution 4.0 International Public License (CC-BY 4.0)
  short: CC BY (4.0)
type: conference
user_id: 2DF688A6-F248-11E8-B48F-1D18A9856A87
volume: 291
year: '2025'
...
---
APC_amount: 3272,21 EUR
DOAJ_listed: '1'
OA_place: publisher
OA_type: gold
_id: '18986'
abstract:
- lang: eng
  text: 'We consider a prototypical problem of Bayesian inference for a structured
    spiked model: a low-rank signal is corrupted by additive noise. While both information-theoretic
    and algorithmic limits are well understood when the noise is a Gaussian Wigner
    matrix, the more realistic case of structured noise still remains challenging.
    To capture the structure while maintaining mathematical tractability, a line of
    work has focused on rotationally invariant noise. However, existing studies either
    provide suboptimal algorithms or are limited to a special class of noise ensembles.
    In this paper, using tools from statistical physics (replica method) and random
    matrix theory (generalized spherical integrals), we establish the characterization
    of the information-theoretic limits for a noise matrix drawn from a general trace
    ensemble. Remarkably, our analysis unveils the asymptotic equivalence between
    the rotationally invariant model and a surrogate Gaussian one. Finally, we show
    how to saturate the predicted statistical limits using an efficient algorithm
    inspired by the theory of adaptive Thouless-Anderson-Palmer (TAP) equations.'
acknowledgement: J.B., F.C., and Y.X. were funded by the European Union (ERC, CHORAL,
  Project No. 101039794). Views and opinions expressed are however those of the authors
  only and do not necessarily reflect those of the European Union or the European
  Research Council. Neither the European Union nor the granting authority can be held
  responsible for them. M.M. was supported by the 2019 Lopez-Loreta Prize. J.B. acknowledges
  discussions with TianQi Hou at the initial stage of the project, as well as with
  Antoine Bodin.
article_number: '013081'
article_processing_charge: Yes
article_type: original
arxiv: 1
author:
- first_name: Jean
  full_name: Barbier, Jean
  last_name: Barbier
- first_name: Francesco
  full_name: Camilli, Francesco
  last_name: Camilli
- first_name: Yizhou
  full_name: Xu, Yizhou
  last_name: Xu
- first_name: Marco
  full_name: Mondelli, Marco
  id: 27EB676C-8706-11E9-9510-7717E6697425
  last_name: Mondelli
  orcid: 0000-0002-3242-7020
citation:
  ama: Barbier J, Camilli F, Xu Y, Mondelli M. Information limits and Thouless-Anderson-Palmer
    equations for spiked matrix models with structured noise. <i>Physical Review Research</i>.
    2025;7. doi:<a href="https://doi.org/10.1103/PhysRevResearch.7.013081">10.1103/PhysRevResearch.7.013081</a>
  apa: Barbier, J., Camilli, F., Xu, Y., &#38; Mondelli, M. (2025). Information limits
    and Thouless-Anderson-Palmer equations for spiked matrix models with structured
    noise. <i>Physical Review Research</i>. American Physical Society. <a href="https://doi.org/10.1103/PhysRevResearch.7.013081">https://doi.org/10.1103/PhysRevResearch.7.013081</a>
  chicago: Barbier, Jean, Francesco Camilli, Yizhou Xu, and Marco Mondelli. “Information
    Limits and Thouless-Anderson-Palmer Equations for Spiked Matrix Models with Structured
    Noise.” <i>Physical Review Research</i>. American Physical Society, 2025. <a href="https://doi.org/10.1103/PhysRevResearch.7.013081">https://doi.org/10.1103/PhysRevResearch.7.013081</a>.
  ieee: J. Barbier, F. Camilli, Y. Xu, and M. Mondelli, “Information limits and Thouless-Anderson-Palmer
    equations for spiked matrix models with structured noise,” <i>Physical Review
    Research</i>, vol. 7. American Physical Society, 2025.
  ista: Barbier J, Camilli F, Xu Y, Mondelli M. 2025. Information limits and Thouless-Anderson-Palmer
    equations for spiked matrix models with structured noise. Physical Review Research.
    7, 013081.
  mla: Barbier, Jean, et al. “Information Limits and Thouless-Anderson-Palmer Equations
    for Spiked Matrix Models with Structured Noise.” <i>Physical Review Research</i>,
    vol. 7, 013081, American Physical Society, 2025, doi:<a href="https://doi.org/10.1103/PhysRevResearch.7.013081">10.1103/PhysRevResearch.7.013081</a>.
  short: J. Barbier, F. Camilli, Y. Xu, M. Mondelli, Physical Review Research 7 (2025).
date_created: 2025-02-02T23:01:54Z
date_published: 2025-01-22T00:00:00Z
date_updated: 2026-05-06T12:57:36Z
day: '22'
ddc:
- '530'
department:
- _id: MaMo
doi: 10.1103/PhysRevResearch.7.013081
external_id:
  arxiv:
  - '2405.20993'
file:
- access_level: open_access
  checksum: 52c5f72d80ffc928542469114fcdb62b
  content_type: application/pdf
  creator: dernst
  date_created: 2025-02-03T08:27:59Z
  date_updated: 2025-02-03T08:27:59Z
  file_id: '18988'
  file_name: 2025_PhysReviewResearch_Barbier.pdf
  file_size: 702543
  relation: main_file
  success: 1
file_date_updated: 2025-02-03T08:27:59Z
has_accepted_license: '1'
intvolume: '         7'
language:
- iso: eng
month: '01'
oa: 1
oa_version: Published Version
project:
- _id: 059876FA-7A3F-11EA-A408-12923DDC885E
  name: Prix Lopez-Loreta 2019 - Marco Mondelli
publication: Physical Review Research
publication_identifier:
  issn:
  - 2643-1564
publication_status: published
publisher: American Physical Society
quality_controlled: '1'
related_material:
  link:
  - relation: software
    url: https://github.com/xu-yz19/spiked-matrix-models-with-structured-noise
scopus_import: '1'
status: public
title: Information limits and Thouless-Anderson-Palmer equations for spiked matrix
  models with structured noise
tmp:
  image: /images/cc_by.png
  legal_code_url: https://creativecommons.org/licenses/by/4.0/legalcode
  name: Creative Commons Attribution 4.0 International Public License (CC-BY 4.0)
  short: CC BY (4.0)
type: journal_article
user_id: 2DF688A6-F248-11E8-B48F-1D18A9856A87
volume: 7
year: '2025'
...
---
APC_amount: 2754,32 EUR
OA_place: publisher
OA_type: hybrid
_id: '19627'
abstract:
- lang: eng
  text: Differentially private gradient descent (DP-GD) is a popular algorithm to
    train deep learning models with provable guarantees on the privacy of the training
    data. In the last decade, the problem of understanding its performance cost with
    respect to standard GD has received remarkable attention from the research community,
    which formally derived upper bounds on the excess population risk RP in different
    learning settings. However, existing bounds typically degrade with over-parameterization,
    i.e., as the number of parameters p gets larger than the number of training
    samples n -- a regime which is ubiquitous in current deep-learning practice.
    As a result, the lack of theoretical insights leaves practitioners without clear
    guidance, leading some to reduce the effective number of trainable parameters
    to improve performance, while others use larger models to achieve better results
    through scale. In this work, we show that in the popular random features model
    with quadratic loss, for any sufficiently large p, privacy can be obtained for
    free, i.e., |RP|=o(1), not only when the privacy parameter ε has constant
    order, but also in the strongly private setting ε=o(1). This challenges the
    common wisdom that over-parameterization inherently hinders performance in private
    learning.
acknowledgement: This research was funded in whole, or in part, by the Austrian Science
  Fund (FWF) Grant number COE 12. For the purpose of open access, the author has applied
  a CC BY public copyright license to any Author Accepted Manuscript version arising
  from this submission. The authors were also supported by the 2019 Lopez-Loreta prize,
  and Simone Bombari was supported by a Google PhD fellowship. We thank Diyuan Wu,
  Edwige Cyffers, Francesco Pedrotti, Inbar Seroussi, Nikita P. Kalinin, Pietro Pelliconi,
  Roodabeh Safavi, Yizhe Zhu, and Zhichao Wang for helpful discussions.
article_number: e2423072122
article_processing_charge: Yes (in subscription journal)
article_type: original
arxiv: 1
author:
- first_name: Simone
  full_name: Bombari, Simone
  id: ca726dda-de17-11ea-bc14-f9da834f63aa
  last_name: Bombari
- first_name: Marco
  full_name: Mondelli, Marco
  id: 27EB676C-8706-11E9-9510-7717E6697425
  last_name: Mondelli
  orcid: 0000-0002-3242-7020
citation:
  ama: Bombari S, Mondelli M. Privacy for free in the overparameterized regime. <i>Proceedings
    of the National Academy of Sciences</i>. 2025;122(15). doi:<a href="https://doi.org/10.1073/pnas.2423072122">10.1073/pnas.2423072122</a>
  apa: Bombari, S., &#38; Mondelli, M. (2025). Privacy for free in the overparameterized
    regime. <i>Proceedings of the National Academy of Sciences</i>. National Academy
    of Sciences. <a href="https://doi.org/10.1073/pnas.2423072122">https://doi.org/10.1073/pnas.2423072122</a>
  chicago: Bombari, Simone, and Marco Mondelli. “Privacy for Free in the Overparameterized
    Regime.” <i>Proceedings of the National Academy of Sciences</i>. National Academy
    of Sciences, 2025. <a href="https://doi.org/10.1073/pnas.2423072122">https://doi.org/10.1073/pnas.2423072122</a>.
  ieee: S. Bombari and M. Mondelli, “Privacy for free in the overparameterized regime,”
    <i>Proceedings of the National Academy of Sciences</i>, vol. 122, no. 15. National
    Academy of Sciences, 2025.
  ista: Bombari S, Mondelli M. 2025. Privacy for free in the overparameterized regime.
    Proceedings of the National Academy of Sciences. 122(15), e2423072122.
  mla: Bombari, Simone, and Marco Mondelli. “Privacy for Free in the Overparameterized
    Regime.” <i>Proceedings of the National Academy of Sciences</i>, vol. 122, no.
    15, e2423072122, National Academy of Sciences, 2025, doi:<a href="https://doi.org/10.1073/pnas.2423072122">10.1073/pnas.2423072122</a>.
  short: S. Bombari, M. Mondelli, Proceedings of the National Academy of Sciences
    122 (2025).
corr_author: '1'
date_created: 2025-04-27T22:02:13Z
date_published: 2025-04-15T00:00:00Z
date_updated: 2026-05-20T08:23:19Z
day: '15'
ddc:
- '000'
department:
- _id: MaMo
doi: 10.1073/pnas.2423072122
external_id:
  arxiv:
  - '2410.14787'
  isi:
  - '001471214000001'
  pmid:
  - '40215275'
file:
- access_level: open_access
  checksum: 1ac6f78e368d35a0cafb4d2d9bd63443
  content_type: application/pdf
  creator: dernst
  date_created: 2025-05-05T07:27:54Z
  date_updated: 2025-05-05T07:27:54Z
  file_id: '19648'
  file_name: 2025_PNAS_Bombari.pdf
  file_size: 2328320
  relation: main_file
  success: 1
file_date_updated: 2025-05-05T07:27:54Z
has_accepted_license: '1'
intvolume: '       122'
isi: 1
issue: '15'
language:
- iso: eng
month: '04'
oa: 1
oa_version: Published Version
pmid: 1
project:
- _id: 059876FA-7A3F-11EA-A408-12923DDC885E
  name: Prix Lopez-Loretta 2019 - Marco Mondelli
- _id: 92099302-16d5-11f0-9cad-f9a785f54fbd
  name: 'Trustworthy Deep Learning Theory: Private Over-Parameterized Models and Robust
    LLMs'
publication: Proceedings of the National Academy of Sciences
publication_identifier:
  eissn:
  - 1091-6490
  issn:
  - 0027-8424
publication_status: published
publisher: National Academy of Sciences
quality_controlled: '1'
scopus_import: '1'
status: public
title: Privacy for free in the overparameterized regime
tmp:
  image: /images/cc_by.png
  legal_code_url: https://creativecommons.org/licenses/by/4.0/legalcode
  name: Creative Commons Attribution 4.0 International Public License (CC-BY 4.0)
  short: CC BY (4.0)
type: journal_article
user_id: 2DF688A6-F248-11E8-B48F-1D18A9856A87
volume: 122
year: '2025'
...
---
OA_place: repository
OA_type: green
_id: '18890'
abstract:
- lang: eng
  text: Deep Neural Collapse (DNC) refers to the surprisingly rigid structure of the
    data representations in the final layers of Deep Neural Networks (DNNs). Though
    the phenomenon has been measured in a variety of settings, its emergence is typically
    explained via data-agnostic approaches, such as the unconstrained features model.
    In this work, we introduce a data-dependent setting where DNC forms due to feature
    learning through the average gradient outer product (AGOP). The AGOP is defined
    with respect to a learned predictor and is equal to the uncentered covariance
    matrix of its input-output gradients averaged over the training dataset. The Deep
    Recursive Feature Machine (Deep RFM) is a method that constructs a neural network
    by iteratively mapping the data with the AGOP and applying an untrained random
    feature map. We demonstrate empirically that DNC occurs in Deep RFM across standard
    settings as a consequence of the projection with the AGOP matrix computed at each
    layer. Further, we theoretically explain DNC in Deep RFM in an asymptotic setting
    and as a result of kernel learning. We then provide evidence that this mechanism
    holds for neural networks more generally. In particular, we show that the right
    singular vectors and values of the weights can be responsible for the majority
    of within-class variability collapse for DNNs trained in the feature learning
    regime. As observed in recent work, this singular structure is highly correlated
    with that of the AGOP.
acknowledgement: 'We acknowledge support from the National Science Foundation (NSF)
  and the Simons Foundation for the Collaboration on the Theoretical Foundations of
  Deep Learning through awards DMS-2031883 and #814639 as well as the TILOS institute
  (NSF CCF-2112665). This work used the programs (1) XSEDE (Extreme science and engineering
  discovery environment) which is supported by NSF grant number ACI-1548562, and
  (2) ACCESS (Advanced cyberinfrastructure coordination ecosystem: services & support)
  which is supported by NSF grant numbers #2138259, #2138286, #2138307, #2137603,
  and #2138296. Specifically, we used the resources from SDSC Expanse GPU compute
  nodes, and NCSA Delta system, via allocations TG-CIS220009. Marco Mondelli is supported
  by the 2019 Lopez-Loreta prize. We also acknowledge useful feedback from anonymous
  reviewers. '
alternative_title:
- Advances in Neural Information Processing Systems
article_processing_charge: No
arxiv: 1
author:
- first_name: Daniel
  full_name: Beaglehole, Daniel
  last_name: Beaglehole
- first_name: Peter
  full_name: Súkeník, Peter
  id: d64d6a8d-eb8e-11eb-b029-96fd216dec3c
  last_name: Súkeník
- first_name: Marco
  full_name: Mondelli, Marco
  id: 27EB676C-8706-11E9-9510-7717E6697425
  last_name: Mondelli
  orcid: 0000-0002-3242-7020
- first_name: Mikhail
  full_name: Belkin, Mikhail
  last_name: Belkin
citation:
  ama: 'Beaglehole D, Súkeník P, Mondelli M, Belkin M. Average gradient outer product
    as a mechanism for deep neural collapse. In: <i>38th Annual Conference on Neural
    Information Processing Systems</i>. Vol 37. Neural Information Processing Systems
    Foundation; 2024.'
  apa: 'Beaglehole, D., Súkeník, P., Mondelli, M., &#38; Belkin, M. (2024). Average
    gradient outer product as a mechanism for deep neural collapse. In <i>38th Annual
    Conference on Neural Information Processing Systems</i> (Vol. 37). Vancouver,
    Canada: Neural Information Processing Systems Foundation.'
  chicago: Beaglehole, Daniel, Peter Súkeník, Marco Mondelli, and Mikhail Belkin.
    “Average Gradient Outer Product as a Mechanism for Deep Neural Collapse.” In <i>38th
    Annual Conference on Neural Information Processing Systems</i>, Vol. 37. Neural
    Information Processing Systems Foundation, 2024.
  ieee: D. Beaglehole, P. Súkeník, M. Mondelli, and M. Belkin, “Average gradient outer
    product as a mechanism for deep neural collapse,” in <i>38th Annual Conference
    on Neural Information Processing Systems</i>, Vancouver, Canada, 2024, vol. 37.
  ista: 'Beaglehole D, Súkeník P, Mondelli M, Belkin M. 2024. Average gradient outer
    product as a mechanism for deep neural collapse. 38th Annual Conference on Neural
    Information Processing Systems. NeurIPS: Neural Information Processing Systems,
    Advances in Neural Information Processing Systems, vol. 37.'
  mla: Beaglehole, Daniel, et al. “Average Gradient Outer Product as a Mechanism for
    Deep Neural Collapse.” <i>38th Annual Conference on Neural Information Processing
    Systems</i>, vol. 37, Neural Information Processing Systems Foundation, 2024.
  short: D. Beaglehole, P. Súkeník, M. Mondelli, M. Belkin, in:, 38th Annual Conference
    on Neural Information Processing Systems, Neural Information Processing Systems
    Foundation, 2024.
conference:
  end_date: 2024-12-16
  location: Vancouver, Canada
  name: 'NeurIPS: Neural Information Processing Systems'
  start_date: 2024-12-16
corr_author: '1'
date_created: 2025-01-27T11:11:40Z
date_published: 2024-12-01T00:00:00Z
date_updated: 2025-05-14T11:29:45Z
day: '01'
department:
- _id: GradSch
- _id: MaMo
external_id:
  arxiv:
  - '2402.13728'
intvolume: '        37'
language:
- iso: eng
main_file_link:
- open_access: '1'
  url: https://openreview.net/forum?id=lJ1jdl2K9k
month: '12'
oa: 1
oa_version: Preprint
project:
- _id: 059876FA-7A3F-11EA-A408-12923DDC885E
  name: Prix Lopez-Loretta 2019 - Marco Mondelli
publication: 38th Annual Conference on Neural Information Processing Systems
publication_identifier:
  eissn:
  - 1049-5258
publication_status: published
publisher: Neural Information Processing Systems Foundation
quality_controlled: '1'
scopus_import: '1'
status: public
title: Average gradient outer product as a mechanism for deep neural collapse
type: conference
user_id: 2DF688A6-F248-11E8-B48F-1D18A9856A87
volume: 37
year: '2024'
...
---
OA_place: publisher
OA_type: gold
_id: '18891'
abstract:
- lang: eng
  text: "Deep neural networks (DNNs) exhibit a surprising structure in their final
    layer known as neural collapse (NC), and a growing body of works has currently
    investigated the propagation of neural collapse to earlier layers of DNNs – a
    phenomenon called deep neural collapse (DNC). However, existing theoretical
    results are restricted to special cases: linear models, only two layers or binary
    classification. In contrast, we focus on non-linear models of arbitrary depth
    in multi-class classification and reveal a surprising qualitative shift. As soon
    as we go beyond two layers or two classes, DNC stops being optimal for the
    deep unconstrained features model (DUFM) – the standard theoretical framework
    for the analysis of collapse. The main culprit is a low-rank bias of multi-layer
    regularization schemes: this bias leads to optimal solutions of even lower
    rank than the neural collapse. We support our theoretical findings with experiments
    on both DUFM and real data, which show the emergence of the low-rank structure
    in the solution found by gradient descent."
acknowledged_ssus:
- _id: ScienComp
acknowledgement: Marco Mondelli is partially supported by the 2019 Lopez-Loreta prize.
  This research was supported by the Scientific Service Units (SSU) of ISTA through
  resources provided by Scientific Computing (SciComp).
alternative_title:
- Advances in Neural Information Processing Systems
article_processing_charge: No
arxiv: 1
author:
- first_name: Peter
  full_name: Súkeník, Peter
  id: d64d6a8d-eb8e-11eb-b029-96fd216dec3c
  last_name: Súkeník
- first_name: Christoph
  full_name: Lampert, Christoph
  id: 40C20FD2-F248-11E8-B48F-1D18A9856A87
  last_name: Lampert
  orcid: 0000-0001-8622-7887
- first_name: Marco
  full_name: Mondelli, Marco
  id: 27EB676C-8706-11E9-9510-7717E6697425
  last_name: Mondelli
  orcid: 0000-0002-3242-7020
citation:
  ama: 'Súkeník P, Lampert C, Mondelli M. Neural collapse versus low-rank bias: Is
    deep neural collapse really optimal? In: <i>38th Annual Conference on Neural Information
    Processing Systems</i>. Vol 37. Neural Information Processing Systems Foundation;
    2024.'
  apa: 'Súkeník, P., Lampert, C., &#38; Mondelli, M. (2024). Neural collapse versus
    low-rank bias: Is deep neural collapse really optimal? In <i>38th Annual Conference
    on Neural Information Processing Systems</i> (Vol. 37). Vancouver, Canada: Neural
    Information Processing Systems Foundation.'
  chicago: 'Súkeník, Peter, Christoph Lampert, and Marco Mondelli. “Neural Collapse
    versus Low-Rank Bias: Is Deep Neural Collapse Really Optimal?” In <i>38th Annual
    Conference on Neural Information Processing Systems</i>, Vol. 37. Neural Information
    Processing Systems Foundation, 2024.'
  ieee: 'P. Súkeník, C. Lampert, and M. Mondelli, “Neural collapse versus low-rank
    bias: Is deep neural collapse really optimal?,” in <i>38th Annual Conference on
    Neural Information Processing Systems</i>, Vancouver, Canada, 2024, vol. 37.'
  ista: 'Súkeník P, Lampert C, Mondelli M. 2024. Neural collapse versus low-rank bias:
    Is deep neural collapse really optimal? 38th Annual Conference on Neural Information
    Processing Systems. NeurIPS: Neural Information Processing Systems, Advances in
    Neural Information Processing Systems, vol. 37.'
  mla: 'Súkeník, Peter, et al. “Neural Collapse versus Low-Rank Bias: Is Deep Neural
    Collapse Really Optimal?” <i>38th Annual Conference on Neural Information Processing
    Systems</i>, vol. 37, Neural Information Processing Systems Foundation, 2024.'
  short: P. Súkeník, C. Lampert, M. Mondelli, in:, 38th Annual Conference on Neural
    Information Processing Systems, Neural Information Processing Systems Foundation,
    2024.
conference:
  end_date: 2024-12-16
  location: Vancouver, Canada
  name: 'NeurIPS: Neural Information Processing Systems'
  start_date: 2024-12-16
corr_author: '1'
date_created: 2025-01-27T11:15:18Z
date_published: 2024-12-01T00:00:00Z
date_updated: 2025-06-04T07:19:21Z
day: '01'
ddc:
- '000'
department:
- _id: GradSch
- _id: MaMo
- _id: ChLa
external_id:
  arxiv:
  - '2405.14468'
file:
- access_level: open_access
  checksum: b7b79f1ea3ac1e9e11b3d91faaeb0780
  content_type: application/pdf
  creator: dernst
  date_created: 2025-02-04T08:11:25Z
  date_updated: 2025-02-04T08:11:25Z
  file_id: '18989'
  file_name: 2024_NeurIPS_Sukenik.pdf
  file_size: 1784118
  relation: main_file
  success: 1
file_date_updated: 2025-02-04T08:11:25Z
has_accepted_license: '1'
intvolume: '        37'
language:
- iso: eng
month: '12'
oa: 1
oa_version: Published Version
project:
- _id: 059876FA-7A3F-11EA-A408-12923DDC885E
  name: Prix Lopez-Loretta 2019 - Marco Mondelli
publication: 38th Annual Conference on Neural Information Processing Systems
publication_status: published
publisher: Neural Information Processing Systems Foundation
quality_controlled: '1'
status: public
title: 'Neural collapse versus low-rank bias: Is deep neural collapse really optimal?'
tmp:
  image: /images/cc_by.png
  legal_code_url: https://creativecommons.org/licenses/by/4.0/legalcode
  name: Creative Commons Attribution 4.0 International Public License (CC-BY 4.0)
  short: CC BY (4.0)
type: conference
user_id: 2DF688A6-F248-11E8-B48F-1D18A9856A87
volume: 37
year: '2024'
...
---
OA_place: publisher
OA_type: gold
_id: '18897'
abstract:
- lang: eng
  text: 'Score-based generative models (SGMs) are powerful tools to sample from complex
    data distributions. Their underlying idea is to (i) run a forward process for
    time T1 by adding noise to the data, (ii) estimate its score function, and (iii)
    use such estimate to run a reverse process. As the reverse process is initialized
    with the stationary distribution of the forward one, the existing analysis paradigm
    requires T1→∞. This is however problematic: from a theoretical viewpoint, for
    a given precision of the score approximation, the convergence guarantee fails
    as T1 diverges; from a practical viewpoint, a large T1 increases computational
    costs and leads to error propagation. This paper addresses the issue by considering
    a version of the popular predictor-corrector scheme: after running the forward
    process, we first estimate the final distribution via an inexact Langevin dynamics
    and then revert the process. Our key technical contribution is to provide convergence
    guarantees which require to run the forward process only for a fixed finite time
    T1. Our bounds exhibit a mild logarithmic dependence on the input dimension and
    the subgaussian norm of the target distribution, have minimal assumptions on the
    data, and require only to control the L2 loss on the score approximation, which
    is the quantity minimized in practice.'
acknowledgement: "Francesco Pedrotti and Jan Maas acknowledge support by the Austrian
  Science Fund (FWF) project 10.55776/F65. Marco Mondelli acknowledges support by
  the 2019 Lopez-Loreta prize."
alternative_title:
- TMLR
article_processing_charge: No
arxiv: 1
author:
- first_name: Francesco
  full_name: Pedrotti, Francesco
  id: d3ac8ac6-dc8d-11ea-abe3-e2a9628c4c3c
  last_name: Pedrotti
- first_name: Jan
  full_name: Maas, Jan
  id: 4C5696CE-F248-11E8-B48F-1D18A9856A87
  last_name: Maas
  orcid: 0000-0002-0845-1338
- first_name: Marco
  full_name: Mondelli, Marco
  id: 27EB676C-8706-11E9-9510-7717E6697425
  last_name: Mondelli
  orcid: 0000-0002-3242-7020
citation:
  ama: 'Pedrotti F, Maas J, Mondelli M. Improved convergence of score-based diffusion
    models via prediction-correction. In: <i>Transactions on Machine Learning Research</i>.
    2024.'
  apa: Pedrotti, F., Maas, J., &#38; Mondelli, M. (2024). Improved convergence of
    score-based diffusion models via prediction-correction. In <i>Transactions on
    Machine Learning Research</i>.
  chicago: Pedrotti, Francesco, Jan Maas, and Marco Mondelli. “Improved Convergence
    of Score-Based Diffusion Models via Prediction-Correction.” In <i>Transactions
    on Machine Learning Research</i>, 2024.
  ieee: F. Pedrotti, J. Maas, and M. Mondelli, “Improved convergence of score-based
    diffusion models via prediction-correction,” in <i>Transactions on Machine Learning
    Research</i>, 2024.
  ista: Pedrotti F, Maas J, Mondelli M. 2024. Improved convergence of score-based
    diffusion models via prediction-correction. Transactions on Machine Learning Research.
    TMLR.
  mla: Pedrotti, Francesco, et al. “Improved Convergence of Score-Based Diffusion
    Models via Prediction-Correction.” <i>Transactions on Machine Learning Research</i>,
    2024.
  short: F. Pedrotti, J. Maas, M. Mondelli, in:, Transactions on Machine Learning
    Research, 2024.
corr_author: '1'
date_created: 2025-01-27T12:18:05Z
date_published: 2024-06-01T00:00:00Z
date_updated: 2025-04-15T08:31:35Z
day: '01'
ddc:
- '000'
department:
- _id: JaMa
- _id: MaMo
external_id:
  arxiv:
  - '2305.14164'
file:
- access_level: open_access
  checksum: 76a1fd5afd8ee6f7ae0e5912d7dbf6b4
  content_type: application/pdf
  creator: dernst
  date_created: 2025-01-27T12:19:44Z
  date_updated: 2025-01-27T12:19:44Z
  file_id: '18898'
  file_name: 2024_TMLR_Pedrotti.pdf
  file_size: 780315
  relation: main_file
  success: 1
file_date_updated: 2025-01-27T12:19:44Z
has_accepted_license: '1'
language:
- iso: eng
month: '06'
oa: 1
oa_version: Published Version
project:
- _id: fc31cba2-9c52-11eb-aca3-ff467d239cd2
  grant_number: F6504
  name: Taming Complexity in Partial Differential Systems
- _id: 059876FA-7A3F-11EA-A408-12923DDC885E
  name: Prix Lopez-Loretta 2019 - Marco Mondelli
publication: Transactions on Machine Learning Research
publication_identifier:
  issn:
  - 2835-8856
publication_status: published
quality_controlled: '1'
related_material:
  record:
  - id: '17350'
    relation: earlier_version
    status: public
scopus_import: '1'
status: public
title: Improved convergence of score-based diffusion models via prediction-correction
tmp:
  image: /images/cc_by.png
  legal_code_url: https://creativecommons.org/licenses/by/4.0/legalcode
  name: Creative Commons Attribution 4.0 International Public License (CC-BY 4.0)
  short: CC BY (4.0)
type: conference
user_id: 2DF688A6-F248-11E8-B48F-1D18A9856A87
year: '2024'
...
---
OA_place: repository
OA_type: green
_id: '18972'
abstract:
- lang: eng
  text: 'Deep learning models are known to overfit and memorize spurious features
    in the training dataset. While numerous empirical studies have aimed at understanding
    this phenomenon, a rigorous theoretical framework to quantify it is still missing.
    In this paper, we consider spurious features that are uncorrelated with the learning
    task, and we provide a precise characterization of how they are memorized via
    two separate terms: (i) the stability of the model with respect to individual
    training samples, and (ii) the feature alignment between the spurious pattern
    and the full sample. While the first term is well established in learning theory
    and it is connected to the generalization error in classical work, the second
    one is, to the best of our knowledge, novel. Our key technical result gives a
    precise characterization of the feature alignment for the two prototypical settings
    of random features (RF) and neural tangent kernel (NTK) regression. We prove that
    the memorization of spurious features weakens as the generalization capability
    increases and, through the analysis of the feature alignment, we unveil the role
    of the model and of its activation function. Numerical experiments show the predictive
    power of our theory on standard datasets (MNIST, CIFAR-10).'
acknowledgement: "The authors were partially supported by the 2019 Lopez-Loreta prize,
  and they would like to thank (in alphabetical order) Grigorios Chrysos, Simone Maria
  Giancola, Mahyar Jafari Nodeh, Christoph Lampert, Marco Miani, GuanWen Qiu, and
  Peter Súkeník for helpful discussions."
alternative_title:
- PMLR
article_processing_charge: No
arxiv: 1
author:
- first_name: Simone
  full_name: Bombari, Simone
  id: ca726dda-de17-11ea-bc14-f9da834f63aa
  last_name: Bombari
- first_name: Marco
  full_name: Mondelli, Marco
  id: 27EB676C-8706-11E9-9510-7717E6697425
  last_name: Mondelli
  orcid: 0000-0002-3242-7020
citation:
  ama: 'Bombari S, Mondelli M. How spurious features are memorized: Precise analysis
    for random and NTK features. In: <i>41st International Conference on Machine Learning</i>.
    Vol 235. ML Research Press; 2024:4267-4299.'
  apa: 'Bombari, S., &#38; Mondelli, M. (2024). How spurious features are memorized:
    Precise analysis for random and NTK features. In <i>41st International Conference
    on Machine Learning</i> (Vol. 235, pp. 4267–4299). Vienna, Austria: ML Research
    Press.'
  chicago: 'Bombari, Simone, and Marco Mondelli. “How Spurious Features Are Memorized:
    Precise Analysis for Random and NTK Features.” In <i>41st International Conference
    on Machine Learning</i>, 235:4267–99. ML Research Press, 2024.'
  ieee: 'S. Bombari and M. Mondelli, “How spurious features are memorized: Precise
    analysis for random and NTK features,” in <i>41st International Conference on
    Machine Learning</i>, Vienna, Austria, 2024, vol. 235, pp. 4267–4299.'
  ista: 'Bombari S, Mondelli M. 2024. How spurious features are memorized: Precise
    analysis for random and NTK features. 41st International Conference on Machine
    Learning. ICML: International Conference on Machine Learning, PMLR, vol. 235,
    4267–4299.'
  mla: 'Bombari, Simone, and Marco Mondelli. “How Spurious Features Are Memorized:
    Precise Analysis for Random and NTK Features.” <i>41st International Conference
    on Machine Learning</i>, vol. 235, ML Research Press, 2024, pp. 4267–99.'
  short: S. Bombari, M. Mondelli, in:, 41st International Conference on Machine Learning,
    ML Research Press, 2024, pp. 4267–4299.
conference:
  end_date: 2024-07-27
  location: Vienna, Austria
  name: 'ICML: International Conference on Machine Learning'
  start_date: 2024-07-21
corr_author: '1'
date_created: 2025-01-30T07:29:47Z
date_published: 2024-07-30T00:00:00Z
date_updated: 2025-04-15T07:50:12Z
day: '30'
department:
- _id: MaMo
external_id:
  arxiv:
  - '2305.12100'
intvolume: '       235'
language:
- iso: eng
main_file_link:
- open_access: '1'
  url: https://doi.org/10.48550/arXiv.2305.12100
month: '07'
oa: 1
oa_version: Preprint
page: 4267-4299
project:
- _id: 059876FA-7A3F-11EA-A408-12923DDC885E
  name: Prix Lopez-Loretta 2019 - Marco Mondelli
publication: 41st International Conference on Machine Learning
publication_identifier:
  eissn:
  - 2640-3498
publication_status: published
publisher: ML Research Press
quality_controlled: '1'
scopus_import: '1'
status: public
title: 'How spurious features are memorized: Precise analysis for random and NTK features'
type: conference
user_id: 2DF688A6-F248-11E8-B48F-1D18A9856A87
volume: 235
year: '2024'
...
---
OA_place: repository
OA_type: green
_id: '18973'
abstract:
- lang: eng
  text: 'Understanding the reasons behind the exceptional success of transformers
    requires a better analysis of why attention layers are suitable for NLP tasks.
    In particular, such tasks require predictive models to capture contextual meaning
    which often depends on one or few words, even if the sentence is long. Our work
    studies this key property, dubbed word sensitivity (WS), in the prototypical setting
    of random features. We show that attention layers enjoy high WS, namely, there
    exists a vector in the space of embeddings that largely perturbs the random attention
    features map. The argument critically exploits the role of the softmax in the
    attention layer, highlighting its benefit compared to other activations (e.g.,
    ReLU). In contrast, the WS of standard random features is of order 1/√n, n being
    the number of words in the textual sample, and thus it decays with the length
    of the context. We then translate these results on the word sensitivity into generalization
    bounds: due to their low WS, random features provably cannot learn to distinguish
    between two sentences that differ only in a single word; in contrast, due to their
    high WS, random attention features have higher generalization capabilities. We
    validate our theoretical results with experimental evidence over the BERT-Base
    word embeddings of the imdb review dataset.'
acknowledgement: The authors were partially supported by the 2019 Lopez-Loreta prize,
  and they would like to thank Mohammad Hossein Amani, Lorenzo Beretta, and Clement
  Rebuffel for helpful discussions.
alternative_title:
- PMLR
article_processing_charge: No
arxiv: 1
author:
- first_name: Simone
  full_name: Bombari, Simone
  id: ca726dda-de17-11ea-bc14-f9da834f63aa
  last_name: Bombari
- first_name: Marco
  full_name: Mondelli, Marco
  id: 27EB676C-8706-11E9-9510-7717E6697425
  last_name: Mondelli
  orcid: 0000-0002-3242-7020
citation:
  ama: 'Bombari S, Mondelli M. Towards understanding the word sensitivity of attention
    layers: A study via random features. In: <i>41st International Conference on Machine
    Learning</i>. Vol 235. ML Research Press; 2024:4300-4328.'
  apa: 'Bombari, S., &#38; Mondelli, M. (2024). Towards understanding the word sensitivity
    of attention layers: A study via random features. In <i>41st International Conference
    on Machine Learning</i> (Vol. 235, pp. 4300–4328). Vienna, Austria: ML Research
    Press.'
  chicago: 'Bombari, Simone, and Marco Mondelli. “Towards Understanding the Word Sensitivity
    of Attention Layers: A Study via Random Features.” In <i>41st International Conference
    on Machine Learning</i>, 235:4300–4328. ML Research Press, 2024.'
  ieee: 'S. Bombari and M. Mondelli, “Towards understanding the word sensitivity of
    attention layers: A study via random features,” in <i>41st International Conference
    on Machine Learning</i>, Vienna, Austria, 2024, vol. 235, pp. 4300–4328.'
  ista: 'Bombari S, Mondelli M. 2024. Towards understanding the word sensitivity of
    attention layers: A study via random features. 41st International Conference on
    Machine Learning. ICML: International Conference on Machine Learning, PMLR, vol.
    235, 4300–4328.'
  mla: 'Bombari, Simone, and Marco Mondelli. “Towards Understanding the Word Sensitivity
    of Attention Layers: A Study via Random Features.” <i>41st International Conference
    on Machine Learning</i>, vol. 235, ML Research Press, 2024, pp. 4300–28.'
  short: S. Bombari, M. Mondelli, in:, 41st International Conference on Machine Learning,
    ML Research Press, 2024, pp. 4300–4328.
conference:
  end_date: 2024-07-27
  location: Vienna, Austria
  name: 'ICML: International Conference on Machine Learning'
  start_date: 2024-07-21
corr_author: '1'
date_created: 2025-01-30T07:35:49Z
date_published: 2024-07-30T00:00:00Z
date_updated: 2025-04-15T07:50:12Z
day: '30'
department:
- _id: MaMo
external_id:
  arxiv:
  - '2402.02969'
intvolume: '       235'
language:
- iso: eng
main_file_link:
- open_access: '1'
  url: https://doi.org/10.48550/arXiv.2402.02969
month: '07'
oa: 1
oa_version: Preprint
page: 4300-4328
project:
- _id: 059876FA-7A3F-11EA-A408-12923DDC885E
  name: Prix Lopez-Loretta 2019 - Marco Mondelli
publication: 41st International Conference on Machine Learning
publication_identifier:
  eissn:
  - 2640-3498
publication_status: published
publisher: ML Research Press
quality_controlled: '1'
scopus_import: '1'
status: public
title: 'Towards understanding the word sensitivity of attention layers: A study via
  random features'
type: conference
user_id: 2DF688A6-F248-11E8-B48F-1D18A9856A87
volume: 235
year: '2024'
...
---
_id: '15172'
abstract:
- lang: eng
  text: 'We propose a novel approach to concentration for non-independent random variables.
    The main idea is to “pretend” that the random variables are independent and pay
    a multiplicative price measuring how far they are from actually being independent.
    This price is encapsulated in the Hellinger integral between the joint and the
    product of the marginals, which is then upper bounded leveraging tensorisation
    properties. Our bounds represent a natural generalisation of concentration inequalities
    in the presence of dependence: we recover exactly the classical bounds (McDiarmid’s
    inequality) when the random variables are independent. Furthermore, in a “large
    deviations” regime, we obtain the same decay in the probability as for the independent
    case, even when the random variables display non-trivial dependencies. To show
    this, we consider a number of applications of interest. First, we provide a bound
    for Markov chains with finite state space. Then, we consider the Simple Symmetric
    Random Walk, which is a non-contracting Markov chain, and a non-Markovian setting
    in which the stochastic process depends on its entire past. To conclude, we propose
    an application to Markov Chain Monte Carlo methods, where our approach leads to
    an improved lower bound on the minimum burn-in period required to reach a certain
    accuracy. In all of these settings, we provide a regime of parameters in which
    our bound fares better than what the state of the art can provide.'
article_processing_charge: No
article_type: original
arxiv: 1
author:
- first_name: Amedeo Roberto
  full_name: Esposito, Amedeo Roberto
  id: 9583e921-e1ad-11ec-9862-cef099626dc9
  last_name: Esposito
- first_name: Marco
  full_name: Mondelli, Marco
  id: 27EB676C-8706-11E9-9510-7717E6697425
  last_name: Mondelli
  orcid: 0000-0002-3242-7020
citation:
  ama: Esposito AR, Mondelli M. Concentration without independence via information
    measures. <i>IEEE Transactions on Information Theory</i>. 2024;70(6):3823-3839.
    doi:<a href="https://doi.org/10.1109/TIT.2024.3367767">10.1109/TIT.2024.3367767</a>
  apa: Esposito, A. R., &#38; Mondelli, M. (2024). Concentration without independence
    via information measures. <i>IEEE Transactions on Information Theory</i>. IEEE.
    <a href="https://doi.org/10.1109/TIT.2024.3367767">https://doi.org/10.1109/TIT.2024.3367767</a>
  chicago: Esposito, Amedeo Roberto, and Marco Mondelli. “Concentration without Independence
    via Information Measures.” <i>IEEE Transactions on Information Theory</i>. IEEE,
    2024. <a href="https://doi.org/10.1109/TIT.2024.3367767">https://doi.org/10.1109/TIT.2024.3367767</a>.
  ieee: A. R. Esposito and M. Mondelli, “Concentration without independence via information
    measures,” <i>IEEE Transactions on Information Theory</i>, vol. 70, no. 6. IEEE,
    pp. 3823–3839, 2024.
  ista: Esposito AR, Mondelli M. 2024. Concentration without independence via information
    measures. IEEE Transactions on Information Theory. 70(6), 3823–3839.
  mla: Esposito, Amedeo Roberto, and Marco Mondelli. “Concentration without Independence
    via Information Measures.” <i>IEEE Transactions on Information Theory</i>, vol.
    70, no. 6, IEEE, 2024, pp. 3823–39, doi:<a href="https://doi.org/10.1109/TIT.2024.3367767">10.1109/TIT.2024.3367767</a>.
  short: A.R. Esposito, M. Mondelli, IEEE Transactions on Information Theory 70 (2024)
    3823–3839.
corr_author: '1'
date_created: 2024-03-24T23:01:00Z
date_published: 2024-06-01T00:00:00Z
date_updated: 2025-09-04T13:06:53Z
day: '01'
department:
- _id: MaMo
doi: 10.1109/TIT.2024.3367767
external_id:
  arxiv:
  - '2303.07245'
  isi:
  - '001230181100001'
intvolume: '        70'
isi: 1
issue: '6'
language:
- iso: eng
main_file_link:
- open_access: '1'
  url: https://doi.org/10.48550/arXiv.2303.07245
month: '06'
oa: 1
oa_version: Preprint
page: 3823-3839
project:
- _id: 059876FA-7A3F-11EA-A408-12923DDC885E
  name: Prix Lopez-Loretta 2019 - Marco Mondelli
publication: IEEE Transactions on Information Theory
publication_identifier:
  eissn:
  - 1557-9654
  issn:
  - 0018-9448
publication_status: published
publisher: IEEE
quality_controlled: '1'
related_material:
  record:
  - id: '14922'
    relation: earlier_version
    status: public
scopus_import: '1'
status: public
title: Concentration without independence via information measures
type: journal_article
user_id: 317138e5-6ab7-11ef-aa6d-ffef3953e345
volume: 70
year: '2024'
...
---
OA_place: repository
OA_type: green
_id: '17147'
abstract:
- lang: eng
  text: Efficient utilization of large-scale biobank data is crucial for inferring
    the genetic basis of disease and predicting health outcomes from the DNA. Yet
    we lack efficient, accurate methods that scale to data where electronic health
    records are linked to whole genome sequence information. To address this issue,
    our paper develops a new algorithmic paradigm based on Approximate Message Passing
    (AMP), which is specifically tailored for genomic prediction and association testing.
    Our method yields comparable out-of-sample prediction accuracy to the state of
    the art on UK Biobank traits, whilst dramatically improving computational complexity,
    with an 8x speed-up in run time. In addition, AMP theory provides a joint association
    testing framework, which outperforms the currently used REGENIE method, in roughly
    a third of the compute time. This first, truly large-scale application of the
    AMP framework lays the foundations for a far wider range of statistical analyses
    for hundreds of millions of variables measured on millions of people.
acknowledged_ssus:
- _id: ScienComp
acknowledgement: "This work was supported by a Lopez-Loreta Prize to MM, an SNSF Eccellenza
  Grant to MRR (PCEGP3-181181), and core funding from ISTA. The authors thank Philip
  Schniter, Matthew Stephens and Pragya Sur for valuable suggestions on an early version
  of the work. The authors acknowledge the participants and investigators of the UK
  Biobank study. High-performance computing was supported by the Scientific Service
  Units (SSU) of IST Austria through resources provided by Scientific Computing (SciComp)."
article_processing_charge: No
author:
- first_name: Al
  full_name: Depope, Al
  id: 0b77531d-dbcd-11ea-9d1d-a8eee0bf3830
  last_name: Depope
- first_name: Marco
  full_name: Mondelli, Marco
  id: 27EB676C-8706-11E9-9510-7717E6697425
  last_name: Mondelli
  orcid: 0000-0002-3242-7020
- first_name: Matthew Richard
  full_name: Robinson, Matthew Richard
  id: E5D42276-F5DA-11E9-8E24-6303E6697425
  last_name: Robinson
  orcid: 0000-0001-8982-8813
citation:
  ama: 'Depope A, Mondelli M, Robinson MR. Inference of genetic effects via approximate
    message passing. In: <i>2024 IEEE International Conference on Acoustics, Speech,
    and Signal Processing</i>. IEEE; 2024:13151-13155. doi:<a href="https://doi.org/10.1109/ICASSP48485.2024.10447198">10.1109/ICASSP48485.2024.10447198</a>'
  apa: 'Depope, A., Mondelli, M., &#38; Robinson, M. R. (2024). Inference of genetic
    effects via approximate message passing. In <i>2024 IEEE International Conference
    on Acoustics, Speech, and Signal Processing</i> (pp. 13151–13155). Seoul, Korea:
    IEEE. <a href="https://doi.org/10.1109/ICASSP48485.2024.10447198">https://doi.org/10.1109/ICASSP48485.2024.10447198</a>'
  chicago: Depope, Al, Marco Mondelli, and Matthew Richard Robinson. “Inference of
    Genetic Effects via Approximate Message Passing.” In <i>2024 IEEE International
    Conference on Acoustics, Speech, and Signal Processing</i>, 13151–55. IEEE, 2024.
    <a href="https://doi.org/10.1109/ICASSP48485.2024.10447198">https://doi.org/10.1109/ICASSP48485.2024.10447198</a>.
  ieee: A. Depope, M. Mondelli, and M. R. Robinson, “Inference of genetic effects
    via approximate message passing,” in <i>2024 IEEE International Conference on
    Acoustics, Speech, and Signal Processing</i>, Seoul, Korea, 2024, pp. 13151–13155.
  ista: 'Depope A, Mondelli M, Robinson MR. 2024. Inference of genetic effects via
    approximate message passing. 2024 IEEE International Conference on Acoustics,
    Speech, and Signal Processing. ICASSP: International Conference on Acoustics,
    Speech and Signal Processing, 13151–13155.'
  mla: Depope, Al, et al. “Inference of Genetic Effects via Approximate Message Passing.”
    <i>2024 IEEE International Conference on Acoustics, Speech, and Signal Processing</i>,
    IEEE, 2024, pp. 13151–55, doi:<a href="https://doi.org/10.1109/ICASSP48485.2024.10447198">10.1109/ICASSP48485.2024.10447198</a>.
  short: A. Depope, M. Mondelli, M.R. Robinson, in:, 2024 IEEE International Conference
    on Acoustics, Speech, and Signal Processing, IEEE, 2024, pp. 13151–13155.
conference:
  end_date: 2024-04-19
  location: Seoul, Korea
  name: 'ICASSP: International Conference on Acoustics, Speech and Signal Processing'
  start_date: 2024-04-14
corr_author: '1'
date_created: 2024-06-16T22:01:07Z
date_published: 2024-04-19T00:00:00Z
date_updated: 2025-11-05T07:21:31Z
day: '19'
department:
- _id: MaMo
- _id: MaRo
doi: 10.1109/ICASSP48485.2024.10447198
external_id:
  isi:
  - '001396233806078'
isi: 1
language:
- iso: eng
main_file_link:
- open_access: '1'
  url: https://openreview.net/forum?id=aQYCDxfZV0
month: '04'
oa: 1
oa_version: Submitted Version
page: 13151-13155
project:
- _id: 059876FA-7A3F-11EA-A408-12923DDC885E
  name: Prix Lopez-Loretta 2019 - Marco Mondelli
- _id: 9B8D11D6-BA93-11EA-9121-9846C619BF3A
  grant_number: PCEGP3_181181
  name: Improving estimation and prediction of common complex disease risk
publication: 2024 IEEE International Conference on Acoustics, Speech, and Signal Processing
publication_identifier:
  isbn:
  - '9798350344851'
  issn:
  - 1520-6149
publication_status: published
publisher: IEEE
quality_controlled: '1'
scopus_import: '1'
status: public
title: Inference of genetic effects via approximate message passing
type: conference
user_id: 2DF688A6-F248-11E8-B48F-1D18A9856A87
year: '2024'
...
---
OA_place: repository
_id: '17350'
abstract:
- lang: eng
  text: "Score-based generative models (SGMs) are powerful tools to sample from complex
    data distributions. Their underlying idea is to (i) run a forward process for
    time $T_1$ by adding noise to the data, (ii) estimate its score function, and
    (iii) use this estimate to run a reverse process. As the reverse process is
    initialized with the stationary distribution of the forward one, the existing
    analysis paradigm requires $T_1\\to\\infty$. This is, however, problematic: from
    a theoretical viewpoint, for a given precision of the score approximation,
    the convergence guarantee fails as $T_1$ diverges; from a practical viewpoint,
    a large $T_1$ increases computational costs and leads to error propagation.
    This paper addresses the issue by considering a version of the popular predictor-corrector
    scheme: after running the forward process, we first estimate the final distribution
    via an inexact Langevin dynamics and then revert the process. Our key technical
    contribution is to provide convergence guarantees which require running the
    forward process only for a fixed finite time $T_1$. Our bounds exhibit a mild
    logarithmic dependence on the input dimension and the subgaussian norm of the
    target distribution, have minimal assumptions on the data, and require only
    control of the $L^2$ loss on the score approximation, which is the quantity
    minimized in practice."
article_processing_charge: No
arxiv: 1
author:
- first_name: Francesco
  full_name: Pedrotti, Francesco
  id: d3ac8ac6-dc8d-11ea-abe3-e2a9628c4c3c
  last_name: Pedrotti
- first_name: Jan
  full_name: Maas, Jan
  id: 4C5696CE-F248-11E8-B48F-1D18A9856A87
  last_name: Maas
  orcid: 0000-0002-0845-1338
- first_name: Marco
  full_name: Mondelli, Marco
  id: 27EB676C-8706-11E9-9510-7717E6697425
  last_name: Mondelli
  orcid: 0000-0002-3242-7020
citation:
  ama: Pedrotti F, Maas J, Mondelli M. Improved convergence of score-based diffusion
    models via prediction-correction. <i>arXiv</i>. doi:<a href="https://doi.org/10.48550/arXiv.2305.14164">10.48550/arXiv.2305.14164</a>
  apa: Pedrotti, F., Maas, J., &#38; Mondelli, M. (n.d.). Improved convergence of
    score-based diffusion models via prediction-correction. <i>arXiv</i>. <a href="https://doi.org/10.48550/arXiv.2305.14164">https://doi.org/10.48550/arXiv.2305.14164</a>
  chicago: Pedrotti, Francesco, Jan Maas, and Marco Mondelli. “Improved Convergence
    of Score-Based Diffusion Models via Prediction-Correction.” <i>ArXiv</i>, n.d.
    <a href="https://doi.org/10.48550/arXiv.2305.14164">https://doi.org/10.48550/arXiv.2305.14164</a>.
  ieee: F. Pedrotti, J. Maas, and M. Mondelli, “Improved convergence of score-based
    diffusion models via prediction-correction,” <i>arXiv</i>.
  ista: Pedrotti F, Maas J, Mondelli M. Improved convergence of score-based diffusion
    models via prediction-correction. arXiv, <a href="https://doi.org/10.48550/arXiv.2305.14164">10.48550/arXiv.2305.14164</a>.
  mla: Pedrotti, Francesco, et al. “Improved Convergence of Score-Based Diffusion
    Models via Prediction-Correction.” <i>ArXiv</i>, doi:<a href="https://doi.org/10.48550/arXiv.2305.14164">10.48550/arXiv.2305.14164</a>.
  short: F. Pedrotti, J. Maas, M. Mondelli, ArXiv (n.d.).
corr_author: '1'
date_created: 2024-07-31T07:56:40Z
date_published: 2024-06-06T00:00:00Z
date_updated: 2026-04-07T13:00:02Z
day: '06'
department:
- _id: JaMa
- _id: MaMo
doi: 10.48550/arXiv.2305.14164
external_id:
  arxiv:
  - '2305.14164'
language:
- iso: eng
main_file_link:
- open_access: '1'
  url: https://doi.org/10.48550/arXiv.2305.14164
month: '06'
oa: 1
oa_version: Preprint
project:
- _id: fc31cba2-9c52-11eb-aca3-ff467d239cd2
  grant_number: F6504
  name: Taming Complexity in Partial Differential Systems
- _id: 059876FA-7A3F-11EA-A408-12923DDC885E
  name: Prix Lopez-Loretta 2019 - Marco Mondelli
publication: arXiv
publication_status: draft
related_material:
  record:
  - id: '18897'
    relation: later_version
    status: public
  - id: '17336'
    relation: dissertation_contains
    status: public
status: public
title: Improved convergence of score-based diffusion models via prediction-correction
type: preprint
user_id: 2DF688A6-F248-11E8-B48F-1D18A9856A87
year: '2024'
...
---
_id: '17469'
abstract:
- lang: eng
  text: 'Autoencoders are a prominent model in many empirical branches of machine
    learning and lossy data compression. However, basic theoretical questions remain
    unanswered even in a shallow two-layer setting. In particular, to what degree
    does a shallow autoencoder capture the structure of the underlying data distribution?
    For the prototypical case of the 1-bit compression of sparse Gaussian data, we
    prove that gradient descent converges to a solution that completely disregards
    the sparse structure of the input. Namely, the performance of the algorithm is
    the same as if it was compressing a Gaussian source - with no sparsity. For general
    data distributions, we give evidence of a phase transition phenomenon in the shape
    of the gradient descent minimizer, as a function of the data sparsity: below the
    critical sparsity level, the minimizer is a rotation taken uniformly at random
    (just like in the compression of non-sparse data); above the critical sparsity,
    the minimizer is the identity (up to a permutation). Finally, by exploiting a
    connection with approximate message passing algorithms, we show how to improve
    upon Gaussian performance for the compression of sparse data: adding a denoising
    function to a shallow architecture already reduces the loss provably, and a suitable
    multi-layer decoder leads to a further improvement. We validate our findings on
    image datasets, such as CIFAR-10 and MNIST.'
acknowledgement: "Kevin Kogler, Alexander Shevchenko and Marco Mondelli are supported
  by the 2019 Lopez-Loreta Prize. Hamed Hassani acknowledges the support by the
  NSF CIF award (1910056) and the NSF Institute for CORE Emerging Methods in Data
  Science (EnCORE)."
alternative_title:
- PMLR
article_processing_charge: No
arxiv: 1
author:
- first_name: Kevin
  full_name: Kögler, Kevin
  id: 94ec913c-dc85-11ea-9058-e5051ab2428b
  last_name: Kögler
- first_name: Aleksandr
  full_name: Shevchenko, Aleksandr
  id: F2B06EC2-C99E-11E9-89F0-752EE6697425
  last_name: Shevchenko
- first_name: Hamed
  full_name: Hassani, Hamed
  last_name: Hassani
- first_name: Marco
  full_name: Mondelli, Marco
  id: 27EB676C-8706-11E9-9510-7717E6697425
  last_name: Mondelli
  orcid: 0000-0002-3242-7020
citation:
  ama: 'Kögler K, Shevchenko A, Hassani H, Mondelli M. Compression of structured data
    with autoencoders: Provable benefit of nonlinearities and depth. In: <i>Proceedings
    of the 41st International Conference on Machine Learning</i>. Vol 235. ML Research
    Press; 2024:24964-25015.'
  apa: 'Kögler, K., Shevchenko, A., Hassani, H., &#38; Mondelli, M. (2024). Compression
    of structured data with autoencoders: Provable benefit of nonlinearities and depth.
    In <i>Proceedings of the 41st International Conference on Machine Learning</i>
    (Vol. 235, pp. 24964–25015). Vienna, Austria: ML Research Press.'
  chicago: 'Kögler, Kevin, Aleksandr Shevchenko, Hamed Hassani, and Marco Mondelli.
    “Compression of Structured Data with Autoencoders: Provable Benefit of Nonlinearities
    and Depth.” In <i>Proceedings of the 41st International Conference on Machine
    Learning</i>, 235:24964–25015. ML Research Press, 2024.'
  ieee: 'K. Kögler, A. Shevchenko, H. Hassani, and M. Mondelli, “Compression of structured
    data with autoencoders: Provable benefit of nonlinearities and depth,” in <i>Proceedings
    of the 41st International Conference on Machine Learning</i>, Vienna, Austria,
    2024, vol. 235, pp. 24964–25015.'
  ista: 'Kögler K, Shevchenko A, Hassani H, Mondelli M. 2024. Compression of structured
    data with autoencoders: Provable benefit of nonlinearities and depth. Proceedings
    of the 41st International Conference on Machine Learning. ICML: International
    Conference on Machine Learning, PMLR, vol. 235, 24964–25015.'
  mla: 'Kögler, Kevin, et al. “Compression of Structured Data with Autoencoders: Provable
    Benefit of Nonlinearities and Depth.” <i>Proceedings of the 41st International
    Conference on Machine Learning</i>, vol. 235, ML Research Press, 2024, pp. 24964–25015.'
  short: K. Kögler, A. Shevchenko, H. Hassani, M. Mondelli, in:, Proceedings of the
    41st International Conference on Machine Learning, ML Research Press, 2024, pp.
    24964–25015.
conference:
  end_date: 2024-07-27
  location: Vienna, Austria
  name: 'ICML: International Conference on Machine Learning'
  start_date: 2024-07-21
corr_author: '1'
date_created: 2024-08-29T11:47:57Z
date_published: 2024-07-01T00:00:00Z
date_updated: 2026-06-07T22:30:05Z
day: '01'
department:
- _id: DaAl
- _id: MaMo
external_id:
  arxiv:
  - '2402.05013'
intvolume: '       235'
language:
- iso: eng
main_file_link:
- open_access: '1'
  url: https://proceedings.mlr.press/v235/kogler24a.html
month: '07'
oa: 1
oa_version: Published Version
page: 24964-25015
project:
- _id: 059876FA-7A3F-11EA-A408-12923DDC885E
  name: Prix Lopez-Loretta 2019 - Marco Mondelli
publication: Proceedings of the 41st International Conference on Machine Learning
publication_status: published
publisher: ML Research Press
quality_controlled: '1'
related_material:
  record:
  - id: '17465'
    relation: dissertation_contains
    status: public
scopus_import: '1'
status: public
title: 'Compression of structured data with autoencoders: Provable benefit of nonlinearities
  and depth'
type: conference
user_id: 2DF688A6-F248-11E8-B48F-1D18A9856A87
volume: 235
year: '2024'
...