---
OA_place: publisher
OA_type: gold
_id: '21326'
abstract:
- lang: eng
  text: 'Neural Collapse is a phenomenon where the last-layer representations of a
    well-trained neural network converge to a highly structured geometry. In this
    paper, we focus on its first (and most basic) property, known as NC1: the within-class
    variability vanishes. While prior theoretical studies establish the occurrence
    of NC1 via the data-agnostic unconstrained features model, our work adopts a data-specific
    perspective, analyzing NC1 in a three-layer neural network, with the first two
    layers operating in the mean-field regime and followed by a linear layer. In particular,
    we establish a fundamental connection between NC1 and the loss landscape: we prove
    that points with small empirical loss and gradient norm (thus, close to being
    stationary) approximately satisfy NC1, and the closeness to NC1 is controlled
    by the residual loss and gradient norm. We then show that (i) gradient flow on
    the mean squared error converges to NC1 solutions with small empirical loss, and
    (ii) for well-separated data distributions, both NC1 and vanishing test loss are
    achieved simultaneously. This aligns with the empirical observation that NC1 emerges
    during training while models attain near-zero test error. Overall, our results
    demonstrate that NC1 arises from gradient training due to the properties of the
    loss landscape, and they show the co-occurrence of NC1 and small test error for
    certain data distributions.'
acknowledgement: "This research was funded in whole or in part by the Austrian Science
  Fund (FWF) 10.55776/COE12. For the purpose of open access, the authors have applied
  a CC BY public\r\ncopyright license to any Author Accepted Manuscript version arising
  from this submission. The authors would like to thank Peter Sukenık for general
  helpful discussions and for pointing out that all the stationary points are approximately
  proportional in the case without entropic regularization. "
alternative_title:
- PMLR
article_processing_charge: No
arxiv: 1
author:
- first_name: Diyuan
  full_name: Wu, Diyuan
  id: 1a5914c2-896a-11ed-bdf8-fb80621a0635
  last_name: Wu
- first_name: Marco
  full_name: Mondelli, Marco
  id: 27EB676C-8706-11E9-9510-7717E6697425
  last_name: Mondelli
  orcid: 0000-0002-3242-7020
citation:
  ama: 'Wu D, Mondelli M. Neural collapse beyond the unconstrained features model:
    Landscape, dynamics, and generalization in the mean-field regime. In: <i>Proceedings
    of the 42nd International Conference on Machine Learning</i>. Vol 267. ML Research
    Press; 2025:67499-67536.'
  apa: 'Wu, D., &#38; Mondelli, M. (2025). Neural collapse beyond the unconstrained
    features model: Landscape, dynamics, and generalization in the mean-field regime.
    In <i>Proceedings of the 42nd International Conference on Machine Learning</i>
    (Vol. 267, pp. 67499–67536). Vancouver, Canada: ML Research Press.'
  chicago: 'Wu, Diyuan, and Marco Mondelli. “Neural Collapse beyond the Unconstrained
    Features Model: Landscape, Dynamics, and Generalization in the Mean-Field Regime.”
    In <i>Proceedings of the 42nd International Conference on Machine Learning</i>,
    267:67499–536. ML Research Press, 2025.'
  ieee: 'D. Wu and M. Mondelli, “Neural collapse beyond the unconstrained features
    model: Landscape, dynamics, and generalization in the mean-field regime,” in <i>Proceedings
    of the 42nd International Conference on Machine Learning</i>, Vancouver, Canada,
    2025, vol. 267, pp. 67499–67536.'
  ista: 'Wu D, Mondelli M. 2025. Neural collapse beyond the unconstrained features
    model: Landscape, dynamics, and generalization in the mean-field regime. Proceedings
    of the 42nd International Conference on Machine Learning. ICML: International
    Conference on Machine Learning, PMLR, vol. 267, 67499–67536.'
  mla: 'Wu, Diyuan, and Marco Mondelli. “Neural Collapse beyond the Unconstrained
    Features Model: Landscape, Dynamics, and Generalization in the Mean-Field Regime.”
    <i>Proceedings of the 42nd International Conference on Machine Learning</i>, vol.
    267, ML Research Press, 2025, pp. 67499–536.'
  short: D. Wu, M. Mondelli, in:, Proceedings of the 42nd International Conference
    on Machine Learning, ML Research Press, 2025, pp. 67499–67536.
conference:
  end_date: 2025-07-19
  location: Vancouver, Canada
  name: 'ICML: International Conference on Machine Learning'
  start_date: 2025-07-13
corr_author: '1'
date_created: 2026-02-18T12:02:45Z
date_published: 2025-07-30T00:00:00Z
date_updated: 2026-02-19T08:30:42Z
day: '30'
ddc:
- '000'
department:
- _id: MaMo
external_id:
  arxiv:
  - '2501.19104'
file:
- access_level: open_access
  checksum: c5ce8b1c83e33dc3a11122f4910deb67
  content_type: application/pdf
  creator: dernst
  date_created: 2026-02-19T08:28:22Z
  date_updated: 2026-02-19T08:28:22Z
  file_id: '21337'
  file_name: 2025_ICML_Wu.pdf
  file_size: 3994385
  relation: main_file
  success: 1
file_date_updated: 2026-02-19T08:28:22Z
has_accepted_license: '1'
intvolume: '267'
language:
- iso: eng
license: https://creativecommons.org/licenses/by/4.0/
month: '07'
oa: 1
oa_version: Published Version
page: 67499-67536
publication: Proceedings of the 42nd International Conference on Machine Learning
publication_identifier:
  eissn:
  - 2640-3498
publication_status: published
publisher: ML Research Press
quality_controlled: '1'
status: public
title: 'Neural collapse beyond the unconstrained features model: Landscape, dynamics,
  and generalization in the mean-field regime'
tmp:
  image: /images/cc_by.png
  legal_code_url: https://creativecommons.org/licenses/by/4.0/legalcode
  name: Creative Commons Attribution 4.0 International Public License (CC-BY 4.0)
  short: CC BY (4.0)
type: conference
user_id: 2DF688A6-F248-11E8-B48F-1D18A9856A87
volume: 267
year: '2025'
...
---
OA_place: repository
OA_type: green
_id: '19518'
abstract:
- lang: eng
  text: "The rising footprint of machine learning has led to a focus on imposing model\r\nsparsity
    as a means of reducing computational and memory costs. For deep neural\r\nnetworks
    (DNNs), the state-of-the-art accuracy-vs-sparsity is achieved by heuristics\r\ninspired
    by the classical Optimal Brain Surgeon (OBS) framework [LeCun et al.,\r\n1989,
    Hassibi and Stork, 1992, Hassibi et al., 1993], which leverages loss curvature\r\ninformation
    to make better pruning decisions. Yet, these results still lack a solid\r\ntheoretical
    understanding, and it is unclear whether they can be improved by\r\nleveraging
    connections to the wealth of work on sparse recovery algorithms. In this\r\npaper,
    we draw new connections between these two areas and present new sparse\r\nrecovery
    algorithms inspired by the OBS framework that comes with theoretical\r\nguarantees
    under reasonable assumptions and have strong practical performance.\r\nSpecifically,
    our work starts from the observation that we can leverage curvature\r\ninformation
    in OBS-like fashion upon the projection step of classic iterative sparse\r\nrecovery
    algorithms such as IHT. We show for the first time that this leads both\r\nto
    improved convergence bounds under standard assumptions. Furthermore, we\r\npresent
    extensions of this approach to the practical task of obtaining accurate sparse\r\nDNNs,
    and validate it experimentally at scale for Transformer-based models on\r\nvision
    and language tasks."
acknowledged_ssus:
- _id: CampIT
acknowledgement: The authors thank the anonymous NeurIPS reviewers for their useful
  comments and feedback, the IT department from the Institute of Science and Technology
  Austria for the hardware support, and Weights and Biases for the infrastructure
  to track all our experiments. Mher Safaryan has received funding from the European
  Union’s Horizon 2020 research and innovation program under the Marie Skłodowska-Curie
  grant agreement No 101034413.
alternative_title:
- Advances in Neural Information Processing Systems
article_processing_charge: No
arxiv: 1
author:
- first_name: Diyuan
  full_name: Wu, Diyuan
  id: 1a5914c2-896a-11ed-bdf8-fb80621a0635
  last_name: Wu
- first_name: Ionut-Vlad
  full_name: Modoranu, Ionut-Vlad
  id: 449f7a18-f128-11eb-9611-9b430c0c6333
  last_name: Modoranu
- first_name: Mher
  full_name: Safaryan, Mher
  id: dd546b39-0804-11ed-9c55-ef075c39778d
  last_name: Safaryan
- first_name: Denis
  full_name: Kuznedelev, Denis
  last_name: Kuznedelev
- first_name: Dan-Adrian
  full_name: Alistarh, Dan-Adrian
  id: 4A899BFC-F248-11E8-B48F-1D18A9856A87
  last_name: Alistarh
  orcid: 0000-0003-3650-940X
citation:
  ama: 'Wu D, Modoranu I-V, Safaryan M, Kuznedelev D, Alistarh D-A. The iterative
    optimal brain surgeon: Faster sparse recovery by leveraging second-order information.
    In: <i>38th Conference on Neural Information Processing Systems</i>. Vol 37. Neural
    Information Processing Systems Foundation; 2024.'
  apa: 'Wu, D., Modoranu, I.-V., Safaryan, M., Kuznedelev, D., &#38; Alistarh, D.-A.
    (2024). The iterative optimal brain surgeon: Faster sparse recovery by leveraging
    second-order information. In <i>38th Conference on Neural Information Processing
    Systems</i> (Vol. 37). Vancouver, Canada: Neural Information Processing Systems
    Foundation.'
  chicago: 'Wu, Diyuan, Ionut-Vlad Modoranu, Mher Safaryan, Denis Kuznedelev, and
    Dan-Adrian Alistarh. “The Iterative Optimal Brain Surgeon: Faster Sparse Recovery
    by Leveraging Second-Order Information.” In <i>38th Conference on Neural Information
    Processing Systems</i>, Vol. 37. Neural Information Processing Systems Foundation,
    2024.'
  ieee: 'D. Wu, I.-V. Modoranu, M. Safaryan, D. Kuznedelev, and D.-A. Alistarh, “The
    iterative optimal brain surgeon: Faster sparse recovery by leveraging second-order
    information,” in <i>38th Conference on Neural Information Processing Systems</i>,
    Vancouver, Canada, 2024, vol. 37.'
  ista: 'Wu D, Modoranu I-V, Safaryan M, Kuznedelev D, Alistarh D-A. 2024. The iterative
    optimal brain surgeon: Faster sparse recovery by leveraging second-order information.
    38th Conference on Neural Information Processing Systems. NeurIPS: Neural Information
    Processing Systems, Advances in Neural Information Processing Systems, vol. 37.'
  mla: 'Wu, Diyuan, et al. “The Iterative Optimal Brain Surgeon: Faster Sparse Recovery
    by Leveraging Second-Order Information.” <i>38th Conference on Neural Information
    Processing Systems</i>, vol. 37, Neural Information Processing Systems Foundation,
    2024.'
  short: D. Wu, I.-V. Modoranu, M. Safaryan, D. Kuznedelev, D.-A. Alistarh, in:, 38th
    Conference on Neural Information Processing Systems, Neural Information Processing
    Systems Foundation, 2024.
conference:
  end_date: 2024-12-15
  location: Vancouver, Canada
  name: 'NeurIPS: Neural Information Processing Systems'
  start_date: 2024-12-09
corr_author: '1'
date_created: 2025-04-06T22:01:32Z
date_published: 2024-12-20T00:00:00Z
date_updated: 2025-05-14T11:37:10Z
day: '20'
department:
- _id: DaAl
- _id: MaMo
ec_funded: 1
external_id:
  arxiv:
  - '2408.17163'
intvolume: '37'
language:
- iso: eng
main_file_link:
- open_access: '1'
  url: https://doi.org/10.48550/arXiv.2408.17163
month: '12'
oa: 1
oa_version: Preprint
project:
- _id: fc2ed2f7-9c52-11eb-aca3-c01059dda49c
  call_identifier: H2020
  grant_number: '101034413'
  name: 'IST-BRIDGE: International postdoctoral program'
publication: 38th Conference on Neural Information Processing Systems
publication_identifier:
  issn:
  - 1049-5258
publication_status: published
publisher: Neural Information Processing Systems Foundation
quality_controlled: '1'
scopus_import: '1'
status: public
title: 'The iterative optimal brain surgeon: Faster sparse recovery by leveraging
  second-order information'
type: conference
user_id: 2DF688A6-F248-11E8-B48F-1D18A9856A87
volume: 37
year: '2024'
...
---
_id: '14924'
abstract:
- lang: eng
  text: "The stochastic heavy ball method (SHB), also known as stochastic gradient
    descent (SGD) with Polyak's momentum, is widely used in training neural networks.
    However, despite the remarkable success of such algorithm in practice, its theoretical
    characterization remains limited. In this paper, we focus on neural networks with
    two and three layers and provide a rigorous understanding of the properties of
    the solutions found by SHB: \\emph{(i)} stability after dropping out part of the
    neurons, \\emph{(ii)} connectivity along a low-loss path, and \\emph{(iii)} convergence
    to the global optimum.\r\nTo achieve this goal, we take a mean-field view and
    relate the SHB dynamics to a certain partial differential equation in the limit
    of large network widths. This mean-field perspective has inspired a recent line
    of work focusing on SGD while, in contrast, our paper considers an algorithm with
    momentum. More specifically, after proving existence and uniqueness of the limit
    differential equations, we show convergence to the global optimum and give a quantitative
    bound between the mean-field limit and the SHB dynamics of a finite-width network.
    Armed with this last bound, we are able to establish the dropout-stability and
    connectivity of SHB solutions."
acknowledgement: D. Wu and M. Mondelli are partially supported by the 2019 Lopez-Loreta
  Prize. V. Kungurtsev was supported by the OP VVV project CZ.02.1.01/0.0/0.0/16_019/0000765
  "Research Center for Informatics".
alternative_title:
- TMLR
article_processing_charge: No
arxiv: 1
author:
- first_name: Diyuan
  full_name: Wu, Diyuan
  id: 1a5914c2-896a-11ed-bdf8-fb80621a0635
  last_name: Wu
- first_name: Vyacheslav
  full_name: Kungurtsev, Vyacheslav
  last_name: Kungurtsev
- first_name: Marco
  full_name: Mondelli, Marco
  id: 27EB676C-8706-11E9-9510-7717E6697425
  last_name: Mondelli
  orcid: 0000-0002-3242-7020
citation:
  ama: 'Wu D, Kungurtsev V, Mondelli M. Mean-field analysis for heavy ball methods:
    Dropout-stability, connectivity, and global convergence. In: <i>Transactions on
    Machine Learning Research</i>. ML Research Press; 2023.'
  apa: 'Wu, D., Kungurtsev, V., &#38; Mondelli, M. (2023). Mean-field analysis for
    heavy ball methods: Dropout-stability, connectivity, and global convergence. In
    <i>Transactions on Machine Learning Research</i>. ML Research Press.'
  chicago: 'Wu, Diyuan, Vyacheslav Kungurtsev, and Marco Mondelli. “Mean-Field Analysis
    for Heavy Ball Methods: Dropout-Stability, Connectivity, and Global Convergence.”
    In <i>Transactions on Machine Learning Research</i>. ML Research Press, 2023.'
  ieee: 'D. Wu, V. Kungurtsev, and M. Mondelli, “Mean-field analysis for heavy ball
    methods: Dropout-stability, connectivity, and global convergence,” in <i>Transactions
    on Machine Learning Research</i>, 2023.'
  ista: 'Wu D, Kungurtsev V, Mondelli M. 2023. Mean-field analysis for heavy ball
    methods: Dropout-stability, connectivity, and global convergence. Transactions
    on Machine Learning Research, TMLR.'
  mla: 'Wu, Diyuan, et al. “Mean-Field Analysis for Heavy Ball Methods: Dropout-Stability,
    Connectivity, and Global Convergence.” <i>Transactions on Machine Learning Research</i>,
    ML Research Press, 2023.'
  short: D. Wu, V. Kungurtsev, M. Mondelli, in:, Transactions on Machine Learning
    Research, ML Research Press, 2023.
corr_author: '1'
date_created: 2024-02-02T11:21:56Z
date_published: 2023-02-28T00:00:00Z
date_updated: 2025-04-15T07:50:17Z
day: '28'
department:
- _id: MaMo
external_id:
  arxiv:
  - '2210.06819'
has_accepted_license: '1'
language:
- iso: eng
main_file_link:
- open_access: '1'
  url: https://doi.org/10.48550/arXiv.2210.06819
month: '02'
oa: 1
oa_version: Published Version
project:
- _id: 059876FA-7A3F-11EA-A408-12923DDC885E
  name: Prix Lopez-Loretta 2019 - Marco Mondelli
publication: Transactions on Machine Learning Research
publication_status: published
publisher: ML Research Press
quality_controlled: '1'
status: public
title: 'Mean-field analysis for heavy ball methods: Dropout-stability, connectivity,
  and global convergence'
tmp:
  image: /images/cc_by.png
  legal_code_url: https://creativecommons.org/licenses/by/4.0/legalcode
  name: Creative Commons Attribution 4.0 International Public License (CC-BY 4.0)
  short: CC BY (4.0)
type: conference
user_id: 2DF688A6-F248-11E8-B48F-1D18A9856A87
year: '2023'
...
