---
OA_place: publisher
OA_type: hybrid
PlanS_conform: '1'
_id: '12662'
abstract:
- lang: eng
  text: 'Modern machine learning tasks often require considering not just one but
    multiple objectives. For example, besides the prediction quality, this could be
    the efficiency, robustness or fairness of the learned models, or any of their
    combinations. Multi-objective learning offers a natural framework for handling
    such problems without having to commit to early trade-offs. Surprisingly, statistical
    learning theory so far offers almost no insight into the generalization properties
    of multi-objective learning. In this work, we make first steps to fill this gap:
    We establish foundational generalization bounds for the multi-objective setting
    as well as generalization and excess bounds for learning with scalarizations.
    We also provide the first theoretical analysis of the relation between the Pareto-optimal
    sets of the true objectives and the Pareto-optimal sets of their empirical approximations
    from training data. In particular, we show a surprising asymmetry: All Pareto-optimal
    solutions can be approximated by empirically Pareto-optimal ones, but not vice
    versa.'
acknowledgement: Open access funding provided by Institute of Science and Technology
  (IST Austria).
article_processing_charge: Yes (via OA deal)
article_type: original
arxiv: 1
author:
- first_name: Peter
  full_name: Súkeník, Peter
  id: d64d6a8d-eb8e-11eb-b029-96fd216dec3c
  last_name: Súkeník
- first_name: Christoph
  full_name: Lampert, Christoph
  id: 40C20FD2-F248-11E8-B48F-1D18A9856A87
  last_name: Lampert
  orcid: 0000-0001-8622-7887
citation:
  ama: Súkeník P, Lampert C. Generalization in multi-objective machine learning. <i>Neural
    Computing and Applications</i>. 2025;37:24669–24683. doi:<a href="https://doi.org/10.1007/s00521-024-10616-1">10.1007/s00521-024-10616-1</a>
  apa: Súkeník, P., &#38; Lampert, C. (2025). Generalization in multi-objective machine
    learning. <i>Neural Computing and Applications</i>. Springer Nature. <a href="https://doi.org/10.1007/s00521-024-10616-1">https://doi.org/10.1007/s00521-024-10616-1</a>
  chicago: Súkeník, Peter, and Christoph Lampert. “Generalization in Multi-Objective
    Machine Learning.” <i>Neural Computing and Applications</i>. Springer Nature,
    2025. <a href="https://doi.org/10.1007/s00521-024-10616-1">https://doi.org/10.1007/s00521-024-10616-1</a>.
  ieee: P. Súkeník and C. Lampert, “Generalization in multi-objective machine learning,”
    <i>Neural Computing and Applications</i>, vol. 37. Springer Nature, pp. 24669–24683,
    2025.
  ista: Súkeník P, Lampert C. 2025. Generalization in multi-objective machine learning.
    Neural Computing and Applications. 37, 24669–24683.
  mla: Súkeník, Peter, and Christoph Lampert. “Generalization in Multi-Objective Machine
    Learning.” <i>Neural Computing and Applications</i>, vol. 37, Springer Nature,
    2025, pp. 24669–24683, doi:<a href="https://doi.org/10.1007/s00521-024-10616-1">10.1007/s00521-024-10616-1</a>.
  short: P. Súkeník, C. Lampert, Neural Computing and Applications 37 (2025) 24669–24683.
corr_author: '1'
date_created: 2023-02-20T08:23:06Z
date_published: 2025-10-01T00:00:00Z
date_updated: 2025-12-30T06:39:56Z
day: '01'
ddc:
- '004'
department:
- _id: ChLa
doi: 10.1007/s00521-024-10616-1
external_id:
  arxiv:
  - '2208.13499'
file:
- access_level: open_access
  checksum: 61ad4591aee16b1e02daf6c164321a42
  content_type: application/pdf
  creator: dernst
  date_created: 2025-12-30T06:39:11Z
  date_updated: 2025-12-30T06:39:11Z
  file_id: '20877'
  file_name: 2025_NeuralCompApplic_Sukenik.pdf
  file_size: 500213
  relation: main_file
  success: 1
file_date_updated: 2025-12-30T06:39:11Z
has_accepted_license: '1'
intvolume: '        37'
language:
- iso: eng
month: '10'
oa: 1
oa_version: Published Version
page: 24669–24683
publication: Neural Computing and Applications
publication_identifier:
  eissn:
  - 1433-3058
  issn:
  - 0941-0643
publication_status: published
publisher: Springer Nature
quality_controlled: '1'
scopus_import: '1'
status: public
title: Generalization in multi-objective machine learning
tmp:
  image: /images/cc_by.png
  legal_code_url: https://creativecommons.org/licenses/by/4.0/legalcode
  name: Creative Commons Attribution 4.0 International Public License (CC-BY 4.0)
  short: CC BY (4.0)
type: journal_article
user_id: 2DF688A6-F248-11E8-B48F-1D18A9856A87
volume: 37
year: '2025'
...
---
OA_place: publisher
OA_type: diamond
_id: '20035'
abstract:
- lang: eng
  text: "Deep neural networks (DNNs) at convergence consistently represent the training
    data in the last layer via a geometric structure referred to as neural collapse.
    This empirical evidence has spurred a line of theoretical research aimed at proving
    the emergence of neural collapse, mostly focusing on the unconstrained features
    model. Here, the features of the penultimate layer are free variables, which makes
    the model data-agnostic and puts into question its ability to capture DNN training.
    Our work addresses the issue, moving away from unconstrained features and studying
    DNNs that end with at least two linear layers. We first prove generic guarantees
    on neural collapse that assume (i) low training error and balancedness of linear
    layers (for within-class variability collapse), and (ii) bounded conditioning
    of the features before the linear part (for orthogonality of class-means, and
    their alignment with weight matrices). The balancedness refers to the fact that
    W_{ℓ+1}^⊤ W_{ℓ+1} ≈ W_ℓ W_ℓ^⊤ for any pair of consecutive weight matrices of the linear part,
    and the bounded conditioning requires a well-behaved ratio between largest and
    smallest non-zero singular values of the features. We then show that such assumptions
    hold for gradient descent training with weight decay: (i) for networks with a
    wide first layer, we prove low training error and balancedness, and (ii) for solutions
    that are either nearly optimal or stable under large learning rates, we additionally
    prove the bounded conditioning. Taken together, our results are the first to show
    neural collapse in the end-to-end training of DNNs."
acknowledgement: M. M. and P. S. are funded by the European Union (ERC, INF2, project
  number 101161364). Views and opinions expressed are however those of the author(s)
  only and do not necessarily reflect those of the European Union or the European
  Research Council Executive Agency. Neither the European Union nor the granting authority
  can be held responsible for them.
article_processing_charge: No
arxiv: 1
author:
- first_name: Arthur
  full_name: Jacot, Arthur
  last_name: Jacot
- first_name: Peter
  full_name: Súkeník, Peter
  id: d64d6a8d-eb8e-11eb-b029-96fd216dec3c
  last_name: Súkeník
- first_name: Zihan
  full_name: Wang, Zihan
  last_name: Wang
- first_name: Marco
  full_name: Mondelli, Marco
  id: 27EB676C-8706-11E9-9510-7717E6697425
  last_name: Mondelli
  orcid: 0000-0002-3242-7020
citation:
  ama: 'Jacot A, Súkeník P, Wang Z, Mondelli M. Wide neural networks trained with
    weight decay provably exhibit neural collapse. In: <i>13th International Conference
    on Learning Representations</i>. ICLR; 2025:1905-1931.'
  apa: 'Jacot, A., Súkeník, P., Wang, Z., &#38; Mondelli, M. (2025). Wide neural networks
    trained with weight decay provably exhibit neural collapse. In <i>13th International
    Conference on Learning Representations</i> (pp. 1905–1931). Singapore, Singapore:
    ICLR.'
  chicago: Jacot, Arthur, Peter Súkeník, Zihan Wang, and Marco Mondelli. “Wide Neural
    Networks Trained with Weight Decay Provably Exhibit Neural Collapse.” In <i>13th
    International Conference on Learning Representations</i>, 1905–31. ICLR, 2025.
  ieee: A. Jacot, P. Súkeník, Z. Wang, and M. Mondelli, “Wide neural networks trained
    with weight decay provably exhibit neural collapse,” in <i>13th International
    Conference on Learning Representations</i>, Singapore, Singapore, 2025, pp. 1905–1931.
  ista: 'Jacot A, Súkeník P, Wang Z, Mondelli M. 2025. Wide neural networks trained
    with weight decay provably exhibit neural collapse. 13th International Conference
    on Learning Representations. ICLR: International Conference on Learning Representations,
    1905–1931.'
  mla: Jacot, Arthur, et al. “Wide Neural Networks Trained with Weight Decay Provably
    Exhibit Neural Collapse.” <i>13th International Conference on Learning Representations</i>,
    ICLR, 2025, pp. 1905–31.
  short: A. Jacot, P. Súkeník, Z. Wang, M. Mondelli, in:, 13th International Conference
    on Learning Representations, ICLR, 2025, pp. 1905–1931.
conference:
  end_date: 2025-04-28
  location: Singapore, Singapore
  name: 'ICLR: International Conference on Learning Representations'
  start_date: 2025-04-24
corr_author: '1'
date_created: 2025-07-20T22:02:02Z
date_published: 2025-04-01T00:00:00Z
date_updated: 2025-08-04T08:47:00Z
day: '01'
ddc:
- '000'
department:
- _id: MaMo
external_id:
  arxiv:
  - '2410.04887'
file:
- access_level: open_access
  checksum: 59c48c173887139647cc9839c0801136
  content_type: application/pdf
  creator: dernst
  date_created: 2025-08-04T08:45:43Z
  date_updated: 2025-08-04T08:45:43Z
  file_id: '20114'
  file_name: 2025_ICLR_Jacot.pdf
  file_size: 1337236
  relation: main_file
  success: 1
file_date_updated: 2025-08-04T08:45:43Z
has_accepted_license: '1'
language:
- iso: eng
month: '04'
oa: 1
oa_version: Published Version
page: 1905-1931
project:
- _id: 911e6d1f-16d5-11f0-9cad-c5c68c6a1cdf
  grant_number: '101161364'
  name: 'Inference in High Dimensions: Light-speed Algorithms and Information Limits'
publication: 13th International Conference on Learning Representations
publication_identifier:
  isbn:
  - '9798331320850'
publication_status: published
publisher: ICLR
quality_controlled: '1'
scopus_import: '1'
status: public
title: Wide neural networks trained with weight decay provably exhibit neural collapse
tmp:
  image: /images/cc_by.png
  legal_code_url: https://creativecommons.org/licenses/by/4.0/legalcode
  name: Creative Commons Attribution 4.0 International Public License (CC-BY 4.0)
  short: CC BY (4.0)
type: conference
user_id: 2DF688A6-F248-11E8-B48F-1D18A9856A87
year: '2025'
...
---
OA_place: repository
OA_type: green
_id: '18890'
abstract:
- lang: eng
  text: Deep Neural Collapse (DNC) refers to the surprisingly rigid structure of the
    data representations in the final layers of Deep Neural Networks (DNNs). Though
    the phenomenon has been measured in a variety of settings, its emergence is typically
    explained via data-agnostic approaches, such as the unconstrained features model.
    In this work, we introduce a data-dependent setting where DNC forms due to feature
    learning through the average gradient outer product (AGOP). The AGOP is defined
    with respect to a learned predictor and is equal to the uncentered covariance
    matrix of its input-output gradients averaged over the training dataset. The Deep
    Recursive Feature Machine (Deep RFM) is a method that constructs a neural network
    by iteratively mapping the data with the AGOP and applying an untrained random
    feature map. We demonstrate empirically that DNC occurs in Deep RFM across standard
    settings as a consequence of the projection with the AGOP matrix computed at each
    layer. Further, we theoretically explain DNC in Deep RFM in an asymptotic setting
    and as a result of kernel learning. We then provide evidence that this mechanism
    holds for neural networks more generally. In particular, we show that the right
    singular vectors and values of the weights can be responsible for the majority
    of within-class variability collapse for DNNs trained in the feature learning
    regime. As observed in recent work, this singular structure is highly correlated
    with that of the AGOP.
acknowledgement: 'We acknowledge support from the National Science Foundation (NSF)
  and the Simons Foundation for the Collaboration on the Theoretical Foundations of
  Deep Learning through awards DMS-2031883 and #814639 as well as the TILOS institute
  (NSF CCF-2112665). This work used the programs (1) XSEDE (Extreme science and engineering
  discovery environment) which is supported by NSF grant numbers ACI-1548562, and
  (2) ACCESS (Advanced cyberinfrastructure coordination ecosystem: services & support)
  which is supported by NSF grant numbers #2138259, #2138286, #2138307, #2137603,
  and #2138296. Specifically, we used the resources from SDSC Expanse GPU compute
  nodes, and NCSA Delta system, via allocations TG-CIS220009. Marco Mondelli is supported
  by the 2019 Lopez-Loreta prize. We also acknowledge useful feedback from anonymous
  reviewers.'
alternative_title:
- Advances in Neural Information Processing Systems
article_processing_charge: No
arxiv: 1
author:
- first_name: Daniel
  full_name: Beaglehole, Daniel
  last_name: Beaglehole
- first_name: Peter
  full_name: Súkeník, Peter
  id: d64d6a8d-eb8e-11eb-b029-96fd216dec3c
  last_name: Súkeník
- first_name: Marco
  full_name: Mondelli, Marco
  id: 27EB676C-8706-11E9-9510-7717E6697425
  last_name: Mondelli
  orcid: 0000-0002-3242-7020
- first_name: Mikhail
  full_name: Belkin, Mikhail
  last_name: Belkin
citation:
  ama: 'Beaglehole D, Súkeník P, Mondelli M, Belkin M. Average gradient outer product
    as a mechanism for deep neural collapse. In: <i>38th Annual Conference on Neural
    Information Processing Systems</i>. Vol 37. Neural Information Processing Systems
    Foundation; 2024.'
  apa: 'Beaglehole, D., Súkeník, P., Mondelli, M., &#38; Belkin, M. (2024). Average
    gradient outer product as a mechanism for deep neural collapse. In <i>38th Annual
    Conference on Neural Information Processing Systems</i> (Vol. 37). Vancouver,
    Canada: Neural Information Processing Systems Foundation.'
  chicago: Beaglehole, Daniel, Peter Súkeník, Marco Mondelli, and Mikhail Belkin.
    “Average Gradient Outer Product as a Mechanism for Deep Neural Collapse.” In <i>38th
    Annual Conference on Neural Information Processing Systems</i>, Vol. 37. Neural
    Information Processing Systems Foundation, 2024.
  ieee: D. Beaglehole, P. Súkeník, M. Mondelli, and M. Belkin, “Average gradient outer
    product as a mechanism for deep neural collapse,” in <i>38th Annual Conference
    on Neural Information Processing Systems</i>, Vancouver, Canada, 2024, vol. 37.
  ista: 'Beaglehole D, Súkeník P, Mondelli M, Belkin M. 2024. Average gradient outer
    product as a mechanism for deep neural collapse. 38th Annual Conference on Neural
    Information Processing Systems. NeurIPS: Neural Information Processing Systems,
    Advances in Neural Information Processing Systems, vol. 37.'
  mla: Beaglehole, Daniel, et al. “Average Gradient Outer Product as a Mechanism for
    Deep Neural Collapse.” <i>38th Annual Conference on Neural Information Processing
    Systems</i>, vol. 37, Neural Information Processing Systems Foundation, 2024.
  short: D. Beaglehole, P. Súkeník, M. Mondelli, M. Belkin, in:, 38th Annual Conference
    on Neural Information Processing Systems, Neural Information Processing Systems
    Foundation, 2024.
conference:
  end_date: 2024-12-16
  location: Vancouver, Canada
  name: 'NeurIPS: Neural Information Processing Systems'
  start_date: 2024-12-16
corr_author: '1'
date_created: 2025-01-27T11:11:40Z
date_published: 2024-12-01T00:00:00Z
date_updated: 2025-05-14T11:29:45Z
day: '01'
department:
- _id: GradSch
- _id: MaMo
external_id:
  arxiv:
  - '2402.13728'
intvolume: '        37'
language:
- iso: eng
main_file_link:
- open_access: '1'
  url: https://openreview.net/forum?id=lJ1jdl2K9k
month: '12'
oa: 1
oa_version: Preprint
project:
- _id: 059876FA-7A3F-11EA-A408-12923DDC885E
  name: Prix Lopez-Loreta 2019 - Marco Mondelli
publication: 38th Annual Conference on Neural Information Processing Systems
publication_identifier:
  eissn:
  - 1049-5258
publication_status: published
publisher: Neural Information Processing Systems Foundation
quality_controlled: '1'
scopus_import: '1'
status: public
title: Average gradient outer product as a mechanism for deep neural collapse
type: conference
user_id: 2DF688A6-F248-11E8-B48F-1D18A9856A87
volume: 37
year: '2024'
...
---
OA_place: publisher
OA_type: gold
_id: '18891'
abstract:
- lang: eng
  text: "Deep neural networks (DNNs) exhibit a surprising structure in their final
    layer known as neural collapse (NC), and a growing body of works has currently
    investigated the propagation of neural collapse to earlier layers of DNNs – a
    phenomenon called deep neural collapse (DNC). However, existing theoretical
    results are restricted to special cases: linear models, only two layers or binary
    classification. In contrast, we focus on non-linear models of arbitrary depth
    in multi-class classification and reveal a surprising qualitative shift. As soon
    as we go beyond two layers or two classes, DNC stops being optimal for the
    deep unconstrained features model (DUFM) – the standard theoretical framework
    for the analysis of collapse. The main culprit is a low-rank bias of multi-layer
    regularization schemes: this bias leads to optimal solutions of even lower
    rank than the neural collapse. We support our theoretical findings with experiments
    on both DUFM and real data, which show the emergence of the low-rank structure
    in the solution found by gradient descent."
acknowledged_ssus:
- _id: ScienComp
acknowledgement: Marco Mondelli is partially supported by the 2019 Lopez-Loreta prize.
  This research was supported by the Scientific Service Units (SSU) of ISTA through
  resources provided by Scientific Computing (SciComp).
alternative_title:
- Advances in Neural Information Processing Systems
article_processing_charge: No
arxiv: 1
author:
- first_name: Peter
  full_name: Súkeník, Peter
  id: d64d6a8d-eb8e-11eb-b029-96fd216dec3c
  last_name: Súkeník
- first_name: Christoph
  full_name: Lampert, Christoph
  id: 40C20FD2-F248-11E8-B48F-1D18A9856A87
  last_name: Lampert
  orcid: 0000-0001-8622-7887
- first_name: Marco
  full_name: Mondelli, Marco
  id: 27EB676C-8706-11E9-9510-7717E6697425
  last_name: Mondelli
  orcid: 0000-0002-3242-7020
citation:
  ama: 'Súkeník P, Lampert C, Mondelli M. Neural collapse versus low-rank bias: Is
    deep neural collapse really optimal? In: <i>38th Annual Conference on Neural Information
    Processing Systems</i>. Vol 37. Neural Information Processing Systems Foundation;
    2024.'
  apa: 'Súkeník, P., Lampert, C., &#38; Mondelli, M. (2024). Neural collapse versus
    low-rank bias: Is deep neural collapse really optimal? In <i>38th Annual Conference
    on Neural Information Processing Systems</i> (Vol. 37). Vancouver, Canada: Neural
    Information Processing Systems Foundation.'
  chicago: 'Súkeník, Peter, Christoph Lampert, and Marco Mondelli. “Neural Collapse
    versus Low-Rank Bias: Is Deep Neural Collapse Really Optimal?” In <i>38th Annual
    Conference on Neural Information Processing Systems</i>, Vol. 37. Neural Information
    Processing Systems Foundation, 2024.'
  ieee: 'P. Súkeník, C. Lampert, and M. Mondelli, “Neural collapse versus low-rank
    bias: Is deep neural collapse really optimal?,” in <i>38th Annual Conference on
    Neural Information Processing Systems</i>, Vancouver, Canada, 2024, vol. 37.'
  ista: 'Súkeník P, Lampert C, Mondelli M. 2024. Neural collapse versus low-rank bias:
    Is deep neural collapse really optimal? 38th Annual Conference on Neural Information
    Processing Systems. NeurIPS: Neural Information Processing Systems, Advances in
    Neural Information Processing Systems, vol. 37.'
  mla: 'Súkeník, Peter, et al. “Neural Collapse versus Low-Rank Bias: Is Deep Neural
    Collapse Really Optimal?” <i>38th Annual Conference on Neural Information Processing
    Systems</i>, vol. 37, Neural Information Processing Systems Foundation, 2024.'
  short: P. Súkeník, C. Lampert, M. Mondelli, in:, 38th Annual Conference on Neural
    Information Processing Systems, Neural Information Processing Systems Foundation,
    2024.
conference:
  end_date: 2024-12-16
  location: Vancouver, Canada
  name: 'NeurIPS: Neural Information Processing Systems'
  start_date: 2024-12-16
corr_author: '1'
date_created: 2025-01-27T11:15:18Z
date_published: 2024-12-01T00:00:00Z
date_updated: 2025-06-04T07:19:21Z
day: '01'
ddc:
- '000'
department:
- _id: GradSch
- _id: MaMo
- _id: ChLa
external_id:
  arxiv:
  - '2405.14468'
file:
- access_level: open_access
  checksum: b7b79f1ea3ac1e9e11b3d91faaeb0780
  content_type: application/pdf
  creator: dernst
  date_created: 2025-02-04T08:11:25Z
  date_updated: 2025-02-04T08:11:25Z
  file_id: '18989'
  file_name: 2024_NeurIPS_Sukenik.pdf
  file_size: 1784118
  relation: main_file
  success: 1
file_date_updated: 2025-02-04T08:11:25Z
has_accepted_license: '1'
intvolume: '        37'
language:
- iso: eng
month: '12'
oa: 1
oa_version: Published Version
project:
- _id: 059876FA-7A3F-11EA-A408-12923DDC885E
  name: Prix Lopez-Loreta 2019 - Marco Mondelli
publication: 38th Annual Conference on Neural Information Processing Systems
publication_status: published
publisher: Neural Information Processing Systems Foundation
quality_controlled: '1'
status: public
title: 'Neural collapse versus low-rank bias: Is deep neural collapse really optimal?'
tmp:
  image: /images/cc_by.png
  legal_code_url: https://creativecommons.org/licenses/by/4.0/legalcode
  name: Creative Commons Attribution 4.0 International Public License (CC-BY 4.0)
  short: CC BY (4.0)
type: conference
user_id: 2DF688A6-F248-11E8-B48F-1D18A9856A87
volume: 37
year: '2024'
...
---
_id: '14921'
abstract:
- lang: eng
  text: Neural collapse (NC) refers to the surprising structure of the last layer
    of deep neural networks in the terminal phase of gradient descent training. Recently,
    an increasing amount of experimental evidence has pointed to the propagation of
    NC to earlier layers of neural networks. However, while the NC in the last layer
    is well studied theoretically, much less is known about its multi-layered counterpart
    - deep neural collapse (DNC). In particular, existing work focuses either on linear
    layers or only on the last two layers at the price of an extra assumption. Our
    paper fills this gap by generalizing the established analytical framework for
    NC - the unconstrained features model - to multiple non-linear layers. Our key
    technical contribution is to show that, in a deep unconstrained features model,
    the unique global optimum for binary classification exhibits all the properties
    typical of DNC. This explains the existing experimental evidence of DNC. We also
    empirically show that (i) by optimizing deep unconstrained features models via
    gradient descent, the resulting solution agrees well with our theory, and (ii)
    trained networks recover the unconstrained features suitable for the occurrence
    of DNC, thus supporting the validity of this modeling principle.
acknowledgement: M. M. is partially supported by the 2019 Lopez-Loreta Prize. The
  authors would like to thank Eugenia Iofinova, Bernd Prach and Simone Bombari for
  valuable feedback on the manuscript.
alternative_title:
- NeurIPS
article_processing_charge: No
arxiv: 1
author:
- first_name: Peter
  full_name: Súkeník, Peter
  id: d64d6a8d-eb8e-11eb-b029-96fd216dec3c
  last_name: Súkeník
- first_name: Marco
  full_name: Mondelli, Marco
  id: 27EB676C-8706-11E9-9510-7717E6697425
  last_name: Mondelli
  orcid: 0000-0002-3242-7020
- first_name: Christoph
  full_name: Lampert, Christoph
  id: 40C20FD2-F248-11E8-B48F-1D18A9856A87
  last_name: Lampert
  orcid: 0000-0001-8622-7887
citation:
  ama: 'Súkeník P, Mondelli M, Lampert C. Deep neural collapse is provably optimal
    for the deep unconstrained features model. In: <i>37th Annual Conference on Neural
    Information Processing Systems</i>. 2023.'
  apa: Súkeník, P., Mondelli, M., &#38; Lampert, C. (2023). Deep neural collapse is
    provably optimal for the deep unconstrained features model. In <i>37th Annual
    Conference on Neural Information Processing Systems</i>. New Orleans, LA, United
    States.
  chicago: Súkeník, Peter, Marco Mondelli, and Christoph Lampert. “Deep Neural Collapse
    Is Provably Optimal for the Deep Unconstrained Features Model.” In <i>37th Annual
    Conference on Neural Information Processing Systems</i>, 2023.
  ieee: P. Súkeník, M. Mondelli, and C. Lampert, “Deep neural collapse is provably
    optimal for the deep unconstrained features model,” in <i>37th Annual Conference
    on Neural Information Processing Systems</i>, New Orleans, LA, United States,
    2023.
  ista: 'Súkeník P, Mondelli M, Lampert C. 2023. Deep neural collapse is provably
    optimal for the deep unconstrained features model. 37th Annual Conference on Neural
    Information Processing Systems. NeurIPS: Neural Information Processing Systems,
    NeurIPS.'
  mla: Súkeník, Peter, et al. “Deep Neural Collapse Is Provably Optimal for the Deep
    Unconstrained Features Model.” <i>37th Annual Conference on Neural Information
    Processing Systems</i>, 2023.
  short: P. Súkeník, M. Mondelli, C. Lampert, in:, 37th Annual Conference on Neural
    Information Processing Systems, 2023.
conference:
  end_date: 2023-12-16
  location: New Orleans, LA, United States
  name: 'NeurIPS: Neural Information Processing Systems'
  start_date: 2023-12-10
corr_author: '1'
date_created: 2024-02-02T11:17:41Z
date_published: 2023-12-15T00:00:00Z
date_updated: 2025-04-15T07:50:16Z
day: '15'
department:
- _id: MaMo
- _id: ChLa
external_id:
  arxiv:
  - '2305.13165'
language:
- iso: eng
main_file_link:
- open_access: '1'
  url: https://doi.org/10.48550/arXiv.2305.13165
month: '12'
oa: 1
oa_version: Preprint
project:
- _id: 059876FA-7A3F-11EA-A408-12923DDC885E
  name: Prix Lopez-Loreta 2019 - Marco Mondelli
publication: 37th Annual Conference on Neural Information Processing Systems
publication_status: published
quality_controlled: '1'
status: public
title: Deep neural collapse is provably optimal for the deep unconstrained features
  model
type: conference
user_id: 2DF688A6-F248-11E8-B48F-1D18A9856A87
year: '2023'
...
---
_id: '12664'
abstract:
- lang: eng
  text: Randomized smoothing is currently considered the state-of-the-art method to
    obtain certifiably robust classifiers. Despite its remarkable performance, the
    method is associated with various serious problems such as “certified accuracy
    waterfalls”, certification vs. accuracy trade-off, or even fairness issues. Input-dependent
    smoothing approaches have been proposed with the intention of overcoming these flaws.
    However, we demonstrate that these methods lack formal guarantees and so the resulting
    certificates are not justified. We show that in general, the input-dependent smoothing
    suffers from the curse of dimensionality, forcing the variance function to have
    low semi-elasticity. On the other hand, we provide a theoretical and practical
    framework that enables the usage of input-dependent smoothing even in the presence
    of the curse of dimensionality, under strict restrictions. We present one concrete
    design of the smoothing variance function and test it on CIFAR10 and MNIST. Our
    design mitigates some of the problems of classical smoothing and is formally underlined,
    yet further improvement of the design is still necessary.
article_processing_charge: No
arxiv: 1
author:
- first_name: Peter
  full_name: Súkeník, Peter
  id: d64d6a8d-eb8e-11eb-b029-96fd216dec3c
  last_name: Súkeník
- first_name: Aleksei
  full_name: Kuvshinov, Aleksei
  last_name: Kuvshinov
- first_name: Stephan
  full_name: Günnemann, Stephan
  last_name: Günnemann
citation:
  ama: 'Súkeník P, Kuvshinov A, Günnemann S. Intriguing properties of input-dependent
    randomized smoothing. In: <i>Proceedings of the 39th International Conference
    on Machine Learning</i>. Vol 162. ML Research Press; 2022:20697-20743.'
  apa: 'Súkeník, P., Kuvshinov, A., &#38; Günnemann, S. (2022). Intriguing properties
    of input-dependent randomized smoothing. In <i>Proceedings of the 39th International
    Conference on Machine Learning</i> (Vol. 162, pp. 20697–20743). Baltimore, MD,
    United States: ML Research Press.'
  chicago: Súkeník, Peter, Aleksei Kuvshinov, and Stephan Günnemann. “Intriguing Properties
    of Input-Dependent Randomized Smoothing.” In <i>Proceedings of the 39th International
    Conference on Machine Learning</i>, 162:20697–743. ML Research Press, 2022.
  ieee: P. Súkeník, A. Kuvshinov, and S. Günnemann, “Intriguing properties of input-dependent
    randomized smoothing,” in <i>Proceedings of the 39th International Conference
    on Machine Learning</i>, Baltimore, MD, United States, 2022, vol. 162, pp. 20697–20743.
  ista: 'Súkeník P, Kuvshinov A, Günnemann S. 2022. Intriguing properties of input-dependent
    randomized smoothing. Proceedings of the 39th International Conference on Machine
    Learning. ICML: International Conference on Machine Learning vol. 162, 20697–20743.'
  mla: Súkeník, Peter, et al. “Intriguing Properties of Input-Dependent Randomized
    Smoothing.” <i>Proceedings of the 39th International Conference on Machine Learning</i>,
    vol. 162, ML Research Press, 2022, pp. 20697–743.
  short: P. Súkeník, A. Kuvshinov, S. Günnemann, in:, Proceedings of the 39th International
    Conference on Machine Learning, ML Research Press, 2022, pp. 20697–20743.
conference:
  end_date: 2022-07-23
  location: Baltimore, MD, United States
  name: 'ICML: International Conference on Machine Learning'
  start_date: 2022-07-17
corr_author: '1'
date_created: 2023-02-20T08:30:21Z
date_published: 2022-07-19T00:00:00Z
date_updated: 2025-07-10T11:50:28Z
day: '19'
ddc:
- '004'
external_id:
  arxiv:
  - '2110.05365'
file:
- access_level: open_access
  checksum: ab8695b1e24fb4fef4f1f9cd63ca8238
  content_type: application/pdf
  creator: chl
  date_created: 2023-02-20T08:30:10Z
  date_updated: 2023-02-20T08:30:10Z
  file_id: '12665'
  file_name: sukeni-k22a.pdf
  file_size: 8470811
  relation: main_file
  success: 1
file_date_updated: 2023-02-20T08:30:10Z
has_accepted_license: '1'
intvolume: '       162'
language:
- iso: eng
month: '07'
oa: 1
oa_version: Published Version
page: 20697-20743
publication: Proceedings of the 39th International Conference on Machine Learning
publication_status: published
publisher: ML Research Press
quality_controlled: '1'
scopus_import: '1'
status: public
title: Intriguing properties of input-dependent randomized smoothing
type: conference
user_id: 2DF688A6-F248-11E8-B48F-1D18A9856A87
volume: 162
year: '2022'
...
---
OA_place: publisher
OA_type: gold
_id: '18876'
abstract:
- lang: eng
  text: Convolutional neural networks were the standard for solving many computer
    vision tasks until recently, when Transformers or MLP-based architectures have
    started to show competitive performance. These architectures typically have a
    vast number of weights and need to be trained on massive datasets; hence, they
    are not suitable for use in low-data regimes. In this work, we propose a
    simple yet effective framework to improve generalization from small amounts of
    data. We augment modern CNNs with fully-connected (FC) layers and show the massive
    impact this architectural change has in low-data regimes. We further present an
    online joint knowledge-distillation method to utilize the extra FC layers at train
    time but avoid them during test time. This allows us to improve the generalization
    of a CNN-based model without any increase in the number of weights at test time.
    We perform classification experiments for a large range of network backbones and
    several standard datasets on supervised learning and active learning. Our experiments
    significantly outperform the networks without fully-connected layers, reaching
    a relative improvement of up to 16% validation accuracy in the supervised setting
    without adding any extra parameters during inference.
acknowledgement: "This work was supported by a Sofja Kovalevskaja Award, a postdoc
  fellowship from the Humboldt Foundation, the ERC Starting Grant Scan2CAD (804724),
  and the German Research Foundation (DFG) Research Unit \"Learning and Simulation
  in Visual Computing\"."
alternative_title:
- Advances in Neural Information Processing Systems
article_processing_charge: No
arxiv: 1
author:
- first_name: Peter
  full_name: Kocsis, Peter
  last_name: Kocsis
- first_name: Peter
  full_name: Súkeník, Peter
  id: d64d6a8d-eb8e-11eb-b029-96fd216dec3c
  last_name: Súkeník
- first_name: Guillem
  full_name: Brasó, Guillem
  last_name: Brasó
- first_name: Matthias
  full_name: Niessner, Matthias
  last_name: Niessner
- first_name: Laura
  full_name: Leal-Taixé, Laura
  last_name: Leal-Taixé
- first_name: Ismail
  full_name: Elezi, Ismail
  last_name: Elezi
citation:
  ama: 'Kocsis P, Súkeník P, Brasó G, Niessner M, Leal-Taixé L, Elezi I. The unreasonable
    effectiveness of fully-connected layers for low-data regimes. In: <i>36th Conference
    on Neural Information Processing Systems</i>. Vol 35. Neural Information Processing
    Systems Foundation; 2022:1896-1908.'
  apa: 'Kocsis, P., Súkeník, P., Brasó, G., Niessner, M., Leal-Taixé, L., &#38; Elezi,
    I. (2022). The unreasonable effectiveness of fully-connected layers for low-data
    regimes. In <i>36th Conference on Neural Information Processing Systems</i> (Vol.
    35, pp. 1896–1908). New Orleans, LA, United States: Neural Information Processing
    Systems Foundation.'
  chicago: Kocsis, Peter, Peter Súkeník, Guillem Brasó, Matthias Niessner, Laura Leal-Taixé,
    and Ismail Elezi. “The Unreasonable Effectiveness of Fully-Connected Layers for
    Low-Data Regimes.” In <i>36th Conference on Neural Information Processing Systems</i>,
    35:1896–1908. Neural Information Processing Systems Foundation, 2022.
  ieee: P. Kocsis, P. Súkeník, G. Brasó, M. Niessner, L. Leal-Taixé, and I. Elezi,
    “The unreasonable effectiveness of fully-connected layers for low-data regimes,”
    in <i>36th Conference on Neural Information Processing Systems</i>, New Orleans,
    LA, United States, 2022, vol. 35, pp. 1896–1908.
  ista: 'Kocsis P, Súkeník P, Brasó G, Niessner M, Leal-Taixé L, Elezi I. 2022. The
    unreasonable effectiveness of fully-connected layers for low-data regimes. 36th
    Conference on Neural Information Processing Systems. NeurIPS: Neural Information
    Processing Systems, Advances in Neural Information Processing Systems, vol. 35,
    1896–1908.'
  mla: Kocsis, Peter, et al. “The Unreasonable Effectiveness of Fully-Connected Layers
    for Low-Data Regimes.” <i>36th Conference on Neural Information Processing Systems</i>,
    vol. 35, Neural Information Processing Systems Foundation, 2022, pp. 1896–908.
  short: P. Kocsis, P. Súkeník, G. Brasó, M. Niessner, L. Leal-Taixé, I. Elezi, in:,
    36th Conference on Neural Information Processing Systems, Neural Information Processing
    Systems Foundation, 2022, pp. 1896–1908.
conference:
  end_date: 2022-12-09
  location: New Orleans, LA, United States
  name: 'NeurIPS: Neural Information Processing Systems'
  start_date: 2022-11-28
date_created: 2025-01-24T19:16:01Z
date_published: 2022-12-01T00:00:00Z
date_updated: 2025-07-10T11:51:28Z
day: '01'
ddc:
- '000'
extern: '1'
external_id:
  arxiv:
  - '2210.05657'
file:
- access_level: open_access
  checksum: 2a14e59ef8b34d9a1a27a7adbc6f83ff
  content_type: application/pdf
  creator: psukenik
  date_created: 2025-01-24T19:13:32Z
  date_updated: 2025-01-24T19:13:32Z
  file_id: '18877'
  file_name: NeurIPS-2022-the-unreasonable-effectiveness-of-fully-connected-layers-for-low-data-regimes-Paper-Conference.pdf
  file_size: 444819
  relation: main_file
  success: 1
file_date_updated: 2025-01-24T19:13:32Z
has_accepted_license: '1'
intvolume: '        35'
language:
- iso: eng
month: '12'
oa: 1
oa_version: Published Version
page: 1896-1908
publication: 36th Conference on Neural Information Processing Systems
publication_identifier:
  issn:
  - 1049-5258
publication_status: published
publisher: Neural Information Processing Systems Foundation
quality_controlled: '1'
scopus_import: '1'
status: public
title: The unreasonable effectiveness of fully-connected layers for low-data regimes
type: conference
user_id: 2DF688A6-F248-11E8-B48F-1D18A9856A87
volume: 35
year: '2022'
...
