---
OA_place: publisher
OA_type: gold
_id: '18891'
abstract:
- lang: eng
  text: "Deep neural networks (DNNs) exhibit a surprising structure in their final
    layer\r\nknown as neural collapse (NC), and a growing body of works has currently
    investigated the propagation of neural collapse to earlier layers of DNNs – a
    phenomenon\r\ncalled deep neural collapse (DNC). However, existing theoretical
    results are restricted to special cases: linear models, only two layers or binary
    classification.\r\nIn contrast, we focus on non-linear models of arbitrary depth
    in multi-class classification and reveal a surprising qualitative shift. As soon
    as we go beyond two\r\nlayers or two classes, DNC stops being optimal for the
    deep unconstrained features\r\nmodel (DUFM) – the standard theoretical framework
    for the analysis of collapse.\r\nThe main culprit is a low-rank bias of multi-layer
    regularization schemes: this bias\r\nleads to optimal solutions of even lower
    rank than the neural collapse. We support\r\nour theoretical findings with experiments
    on both DUFM and real data, which show\r\nthe emergence of the low-rank structure
    in the solution found by gradient descent."
acknowledged_ssus:
- _id: ScienComp
acknowledgement: Marco Mondelli is partially supported by the 2019 Lopez-Loreta prize.
  This research was supported by the Scientific Service Units (SSU) of ISTA through
  resources provided by Scientific Computing (SciComp).
alternative_title:
- Advances in Neural Information Processing Systems
article_processing_charge: No
arxiv: 1
author:
- first_name: Peter
  full_name: Súkeník, Peter
  id: d64d6a8d-eb8e-11eb-b029-96fd216dec3c
  last_name: Súkeník
- first_name: Christoph
  full_name: Lampert, Christoph
  id: 40C20FD2-F248-11E8-B48F-1D18A9856A87
  last_name: Lampert
  orcid: 0000-0001-8622-7887
- first_name: Marco
  full_name: Mondelli, Marco
  id: 27EB676C-8706-11E9-9510-7717E6697425
  last_name: Mondelli
  orcid: 0000-0002-3242-7020
citation:
  ama: 'Súkeník P, Lampert C, Mondelli M. Neural collapse versus low-rank bias: Is
    deep neural collapse really optimal? In: <i>38th Annual Conference on Neural Information
    Processing Systems</i>. Vol 37. Neural Information Processing Systems Foundation;
    2024.'
  apa: 'Súkeník, P., Lampert, C., &#38; Mondelli, M. (2024). Neural collapse versus
    low-rank bias: Is deep neural collapse really optimal? In <i>38th Annual Conference
    on Neural Information Processing Systems</i> (Vol. 37). Vancouver, Canada: Neural
    Information Processing Systems Foundation.'
  chicago: 'Súkeník, Peter, Christoph Lampert, and Marco Mondelli. “Neural Collapse
    versus Low-Rank Bias: Is Deep Neural Collapse Really Optimal?” In <i>38th Annual
    Conference on Neural Information Processing Systems</i>, Vol. 37. Neural Information
    Processing Systems Foundation, 2024.'
  ieee: 'P. Súkeník, C. Lampert, and M. Mondelli, “Neural collapse versus low-rank
    bias: Is deep neural collapse really optimal?,” in <i>38th Annual Conference on
    Neural Information Processing Systems</i>, Vancouver, Canada, 2024, vol. 37.'
  ista: 'Súkeník P, Lampert C, Mondelli M. 2024. Neural collapse versus low-rank bias:
    Is deep neural collapse really optimal? 38th Annual Conference on Neural Information
    Processing Systems. NeurIPS: Neural Information Processing Systems, Advances in
    Neural Information Processing Systems, vol. 37.'
  mla: 'Súkeník, Peter, et al. “Neural Collapse versus Low-Rank Bias: Is Deep Neural
    Collapse Really Optimal?” <i>38th Annual Conference on Neural Information Processing
    Systems</i>, vol. 37, Neural Information Processing Systems Foundation, 2024.'
  short: P. Súkeník, C. Lampert, M. Mondelli, in:, 38th Annual Conference on Neural
    Information Processing Systems, Neural Information Processing Systems Foundation,
    2024.
conference:
  end_date: 2024-12-16
  location: Vancouver, Canada
  name: 'NeurIPS: Neural Information Processing Systems'
  start_date: 2024-12-16
corr_author: '1'
date_created: 2025-01-27T11:15:18Z
date_published: 2024-12-01T00:00:00Z
date_updated: 2025-06-04T07:19:21Z
day: '01'
ddc:
- '000'
department:
- _id: GradSch
- _id: MaMo
- _id: ChLa
external_id:
  arxiv:
  - '2405.14468'
file:
- access_level: open_access
  checksum: b7b79f1ea3ac1e9e11b3d91faaeb0780
  content_type: application/pdf
  creator: dernst
  date_created: 2025-02-04T08:11:25Z
  date_updated: 2025-02-04T08:11:25Z
  file_id: '18989'
  file_name: 2024_NeurIPS_Sukenik.pdf
  file_size: 1784118
  relation: main_file
  success: 1
file_date_updated: 2025-02-04T08:11:25Z
has_accepted_license: '1'
intvolume: '        37'
language:
- iso: eng
month: '12'
oa: 1
oa_version: Published Version
project:
- _id: 059876FA-7A3F-11EA-A408-12923DDC885E
  name: Prix Lopez-Loretta 2019 - Marco Mondelli
publication: 38th Annual Conference on Neural Information Processing Systems
publication_status: published
publisher: Neural Information Processing Systems Foundation
quality_controlled: '1'
status: public
title: 'Neural collapse versus low-rank bias: Is deep neural collapse really optimal?'
tmp:
  image: /images/cc_by.png
  legal_code_url: https://creativecommons.org/licenses/by/4.0/legalcode
  name: Creative Commons Attribution 4.0 International Public License (CC-BY 4.0)
  short: CC BY (4.0)
type: conference
user_id: 2DF688A6-F248-11E8-B48F-1D18A9856A87
volume: 37
year: '2024'
...
---
OA_place: publisher
OA_type: gold
_id: '18897'
abstract:
- lang: eng
  text: 'Score-based generative models (SGMs) are powerful tools to sample from complex
    data distributions. Their underlying idea is to (i) run a forward process for
    time T1 by adding noise to the data, (ii) estimate its score function, and (iii)
    use such estimate to run a reverse process. As the reverse process is initialized
    with the stationary distribution of the forward one, the existing analysis paradigm
    requires T1→∞. This is however problematic: from a theoretical viewpoint, for
    a given precision of the score approximation, the convergence guarantee fails
    as T1 diverges; from a practical viewpoint, a large T1 increases computational
    costs and leads to error propagation. This paper addresses the issue by considering
    a version of the popular predictor-corrector scheme: after running the forward
    process, we first estimate the final distribution via an inexact Langevin dynamics
    and then revert the process. Our key technical contribution is to provide convergence
    guarantees which require to run the forward process only for a fixed finite time
    T1. Our bounds exhibit a mild logarithmic dependence on the input dimension and
    the subgaussian norm of the target distribution, have minimal assumptions on the
    data, and require only to control the L2 loss on the score approximation, which
    is the quantity minimized in practice.'
acknowledgement: "Francesco Pedrotti and Jan Maas acknowledge support by the Austrian
  Science Fund (FWF) project 10.55776/F65. Marco Mondelli acknowledges support by
  the 2019 Lopez-Loreta prize.\r\n"
alternative_title:
- TMLR
article_processing_charge: No
arxiv: 1
author:
- first_name: Francesco
  full_name: Pedrotti, Francesco
  id: d3ac8ac6-dc8d-11ea-abe3-e2a9628c4c3c
  last_name: Pedrotti
- first_name: Jan
  full_name: Maas, Jan
  id: 4C5696CE-F248-11E8-B48F-1D18A9856A87
  last_name: Maas
  orcid: 0000-0002-0845-1338
- first_name: Marco
  full_name: Mondelli, Marco
  id: 27EB676C-8706-11E9-9510-7717E6697425
  last_name: Mondelli
  orcid: 0000-0002-3242-7020
citation:
  ama: 'Pedrotti F, Maas J, Mondelli M. Improved convergence of score-based diffusion
    models via prediction-correction. In: <i>Transactions on Machine Learning Research</i>.
    ; 2024.'
  apa: Pedrotti, F., Maas, J., &#38; Mondelli, M. (2024). Improved convergence of
    score-based diffusion models via prediction-correction. In <i>Transactions on
    Machine Learning Research</i>.
  chicago: Pedrotti, Francesco, Jan Maas, and Marco Mondelli. “Improved Convergence
    of Score-Based Diffusion Models via Prediction-Correction.” In <i>Transactions
    on Machine Learning Research</i>, 2024.
  ieee: F. Pedrotti, J. Maas, and M. Mondelli, “Improved convergence of score-based
    diffusion models via prediction-correction,” in <i>Transactions on Machine Learning
    Research</i>, 2024.
  ista: Pedrotti F, Maas J, Mondelli M. 2024. Improved convergence of score-based
    diffusion models via prediction-correction. Transactions on Machine Learning Research.
    , TMLR, .
  mla: Pedrotti, Francesco, et al. “Improved Convergence of Score-Based Diffusion
    Models via Prediction-Correction.” <i>Transactions on Machine Learning Research</i>,
    2024.
  short: F. Pedrotti, J. Maas, M. Mondelli, in:, Transactions on Machine Learning
    Research, 2024.
corr_author: '1'
date_created: 2025-01-27T12:18:05Z
date_published: 2024-06-01T00:00:00Z
date_updated: 2025-04-15T08:31:35Z
day: '01'
ddc:
- '000'
department:
- _id: JaMa
- _id: MaMo
external_id:
  arxiv:
  - '2305.14164'
file:
- access_level: open_access
  checksum: 76a1fd5afd8ee6f7ae0e5912d7dbf6b4
  content_type: application/pdf
  creator: dernst
  date_created: 2025-01-27T12:19:44Z
  date_updated: 2025-01-27T12:19:44Z
  file_id: '18898'
  file_name: 2024_TMLR_Pedrotti.pdf
  file_size: 780315
  relation: main_file
  success: 1
file_date_updated: 2025-01-27T12:19:44Z
has_accepted_license: '1'
language:
- iso: eng
month: '06'
oa: 1
oa_version: Published Version
project:
- _id: fc31cba2-9c52-11eb-aca3-ff467d239cd2
  grant_number: F6504
  name: Taming Complexity in Partial Differential Systems
- _id: 059876FA-7A3F-11EA-A408-12923DDC885E
  name: Prix Lopez-Loretta 2019 - Marco Mondelli
publication: Transactions on Machine Learning Research
publication_identifier:
  issn:
  - 2835-8856
publication_status: published
quality_controlled: '1'
related_material:
  record:
  - id: '17350'
    relation: earlier_version
    status: public
scopus_import: '1'
status: public
title: Improved convergence of score-based diffusion models via prediction-correction
tmp:
  image: /images/cc_by.png
  legal_code_url: https://creativecommons.org/licenses/by/4.0/legalcode
  name: Creative Commons Attribution 4.0 International Public License (CC-BY 4.0)
  short: CC BY (4.0)
type: conference
user_id: 2DF688A6-F248-11E8-B48F-1D18A9856A87
year: '2024'
...
---
OA_place: repository
OA_type: green
_id: '18972'
abstract:
- lang: eng
  text: 'Deep learning models are known to overfit and memorize spurious features
    in the training dataset. While numerous empirical studies have aimed at understanding
    this phenomenon, a rigorous theoretical framework to quantify it is still missing.
    In this paper, we consider spurious features that are uncorrelated with the learning
    task, and we provide a precise characterization of how they are memorized via
    two separate terms: (i) the stability of the model with respect to individual
    training samples, and (ii) the feature alignment between the spurious pattern
    and the full sample. While the first term is well established in learning theory
    and it is connected to the generalization error in classical work, the second
    one is, to the best of our knowledge, novel. Our key technical result gives a
    precise characterization of the feature alignment for the two prototypical settings
    of random features (RF) and neural tangent kernel (NTK) regression. We prove that
    the memorization of spurious features weakens as the generalization capability
    increases and, through the analysis of the feature alignment, we unveil the role
    of the model and of its activation function. Numerical experiments show the predictive
    power of our theory on standard datasets (MNIST, CIFAR-10).'
acknowledgement: "The authors were partially supported by the 2019 LopezLoreta prize,
  and they would like to thank (in alphabetical order) Grigorios Chrysos, Simone Maria
  Giancola, Mahyar\r\nJafari Nodeh, Christoph Lampert, Marco Miani, GuanWen Qiu, and
  Peter Sukenık for helpful discussions."
alternative_title:
- PMLR
article_processing_charge: No
arxiv: 1
author:
- first_name: Simone
  full_name: Bombari, Simone
  id: ca726dda-de17-11ea-bc14-f9da834f63aa
  last_name: Bombari
- first_name: Marco
  full_name: Mondelli, Marco
  id: 27EB676C-8706-11E9-9510-7717E6697425
  last_name: Mondelli
  orcid: 0000-0002-3242-7020
citation:
  ama: 'Bombari S, Mondelli M. How spurious features are memorized: Precise analysis
    for random and NTK features. In: <i>41st International Conference on Machine Learning</i>.
    Vol 235. ML Research Press; 2024:4267-4299.'
  apa: 'Bombari, S., &#38; Mondelli, M. (2024). How spurious features are memorized:
    Precise analysis for random and NTK features. In <i>41st International Conference
    on Machine Learning</i> (Vol. 235, pp. 4267–4299). Vienna, Austria: ML Research
    Press.'
  chicago: 'Bombari, Simone, and Marco Mondelli. “How Spurious Features Are Memorized:
    Precise Analysis for Random and NTK Features.” In <i>41st International Conference
    on Machine Learning</i>, 235:4267–99. ML Research Press, 2024.'
  ieee: 'S. Bombari and M. Mondelli, “How spurious features are memorized: Precise
    analysis for random and NTK features,” in <i>41st International Conference on
    Machine Learning</i>, Vienna, Austria, 2024, vol. 235, pp. 4267–4299.'
  ista: 'Bombari S, Mondelli M. 2024. How spurious features are memorized: Precise
    analysis for random and NTK features. 41st International Conference on Machine
    Learning. ICML: International Conference on Machine Learning, PMLR, vol. 235,
    4267–4299.'
  mla: 'Bombari, Simone, and Marco Mondelli. “How Spurious Features Are Memorized:
    Precise Analysis for Random and NTK Features.” <i>41st International Conference
    on Machine Learning</i>, vol. 235, ML Research Press, 2024, pp. 4267–99.'
  short: S. Bombari, M. Mondelli, in:, 41st International Conference on Machine Learning,
    ML Research Press, 2024, pp. 4267–4299.
conference:
  end_date: 2024-07-27
  location: Vienna, Austria
  name: 'ICML: International Conference on Machine Learning'
  start_date: 2024-07-21
corr_author: '1'
date_created: 2025-01-30T07:29:47Z
date_published: 2024-07-30T00:00:00Z
date_updated: 2025-04-15T07:50:12Z
day: '30'
department:
- _id: MaMo
external_id:
  arxiv:
  - '2305.12100'
intvolume: '       235'
language:
- iso: eng
main_file_link:
- open_access: '1'
  url: https://doi.org/10.48550/arXiv.2305.12100
month: '07'
oa: 1
oa_version: Preprint
page: 4267-4299
project:
- _id: 059876FA-7A3F-11EA-A408-12923DDC885E
  name: Prix Lopez-Loretta 2019 - Marco Mondelli
publication: 41st International Conference on Machine Learning
publication_identifier:
  eissn:
  - 2640-3498
publication_status: published
publisher: ML Research Press
quality_controlled: '1'
scopus_import: '1'
status: public
title: 'How spurious features are memorized: Precise analysis for random and NTK features'
type: conference
user_id: 2DF688A6-F248-11E8-B48F-1D18A9856A87
volume: 235
year: '2024'
...
---
OA_place: repository
OA_type: green
_id: '18973'
abstract:
- lang: eng
  text: 'Understanding the reasons behind the exceptional success of transformers
    requires a better analysis of why attention layers are suitable for NLP tasks.
    In particular, such tasks require predictive models to capture contextual meaning
    which often depends on one or few words, even if the sentence is long. Our work
    studies this key property, dubbed word sensitivity (WS), in the prototypical setting
    of random features. We show that attention layers enjoy high WS, namely, there
    exists a vector in the space of embeddings that largely perturbs the random attention
    features map. The argument critically exploits the role of the softmax in the
    attention layer, highlighting its benefit compared to other activations (e.g.,
    ReLU). In contrast, the WS of standard random features is of order 1/n−−√, n being
    the number of words in the textual sample, and thus it decays with the length
    of the context. We then translate these results on the word sensitivity into generalization
    bounds: due to their low WS, random features provably cannot learn to distinguish
    between two sentences that differ only in a single word; in contrast, due to their
    high WS, random attention features have higher generalization capabilities. We
    validate our theoretical results with experimental evidence over the BERT-Base
    word embeddings of the imdb review dataset.'
acknowledgement: The authors were partially supported by the 2019 LopezLoreta prize,
  and they would like to thank Mohammad Hossein Amani, Lorenzo Beretta, and Clement
  Rebuffel for helpful discussions.
alternative_title:
- PMLR
article_processing_charge: No
arxiv: 1
author:
- first_name: Simone
  full_name: Bombari, Simone
  id: ca726dda-de17-11ea-bc14-f9da834f63aa
  last_name: Bombari
- first_name: Marco
  full_name: Mondelli, Marco
  id: 27EB676C-8706-11E9-9510-7717E6697425
  last_name: Mondelli
  orcid: 0000-0002-3242-7020
citation:
  ama: 'Bombari S, Mondelli M. Towards understanding the word sensitivity of attention
    layers: A study via random features. In: <i>41st International Conference on Machine
    Learning</i>. Vol 235. ML Research Press; 2024:4300-4328.'
  apa: 'Bombari, S., &#38; Mondelli, M. (2024). Towards understanding the word sensitivity
    of attention layers: A study via random features. In <i>41st International Conference
    on Machine Learning</i> (Vol. 235, pp. 4300–4328). Vienna, Austria: ML Research
    Press.'
  chicago: 'Bombari, Simone, and Marco Mondelli. “Towards Understanding the Word Sensitivity
    of Attention Layers: A Study via Random Features.” In <i>41st International Conference
    on Machine Learning</i>, 235:4300–4328. ML Research Press, 2024.'
  ieee: 'S. Bombari and M. Mondelli, “Towards understanding the word sensitivity of
    attention layers: A study via random features,” in <i>41st International Conference
    on Machine Learning</i>, Vienna, Austria, 2024, vol. 235, pp. 4300–4328.'
  ista: 'Bombari S, Mondelli M. 2024. Towards understanding the word sensitivity of
    attention layers: A study via random features. 41st International Conference on
    Machine Learning. ICML: International Conference on Machine Learning, PMLR, vol.
    235, 4300–4328.'
  mla: 'Bombari, Simone, and Marco Mondelli. “Towards Understanding the Word Sensitivity
    of Attention Layers: A Study via Random Features.” <i>41st International Conference
    on Machine Learning</i>, vol. 235, ML Research Press, 2024, pp. 4300–28.'
  short: S. Bombari, M. Mondelli, in:, 41st International Conference on Machine Learning,
    ML Research Press, 2024, pp. 4300–4328.
conference:
  end_date: 2024-07-27
  location: Vienna, Austria
  name: 'ICML: International Conference on Machine Learning'
  start_date: 2024-07-21
corr_author: '1'
date_created: 2025-01-30T07:35:49Z
date_published: 2024-07-30T00:00:00Z
date_updated: 2025-04-15T07:50:12Z
day: '30'
department:
- _id: MaMo
external_id:
  arxiv:
  - '2402.02969'
intvolume: '       235'
language:
- iso: eng
main_file_link:
- open_access: '1'
  url: https://doi.org/10.48550/arXiv.2402.02969
month: '07'
oa: 1
oa_version: Preprint
page: 4300-4328
project:
- _id: 059876FA-7A3F-11EA-A408-12923DDC885E
  name: Prix Lopez-Loretta 2019 - Marco Mondelli
publication: 41st International Conference on Machine Learning
publication_identifier:
  eissn:
  - 2640-3498
publication_status: published
publisher: ML Research Press
quality_controlled: '1'
scopus_import: '1'
status: public
title: 'Towards understanding the word sensitivity of attention layers: A study via
  random features'
type: conference
user_id: 2DF688A6-F248-11E8-B48F-1D18A9856A87
volume: 235
year: '2024'
...
---
OA_place: repository
OA_type: green
_id: '19518'
abstract:
- lang: eng
  text: "The rising footprint of machine learning has led to a focus on imposing model\r\nsparsity
    as a means of reducing computational and memory costs. For deep neural\r\nnetworks
    (DNNs), the state-of-the-art accuracy-vs-sparsity is achieved by heuristics\r\ninspired
    by the classical Optimal Brain Surgeon (OBS) framework [LeCun et al.,\r\n1989,
    Hassibi and Stork, 1992, Hassibi et al., 1993], which leverages loss curvature\r\ninformation
    to make better pruning decisions. Yet, these results still lack a solid\r\ntheoretical
    understanding, and it is unclear whether they can be improved by\r\nleveraging
    connections to the wealth of work on sparse recovery algorithms. In this\r\npaper,
    we draw new connections between these two areas and present new sparse\r\nrecovery
    algorithms inspired by the OBS framework that comes with theoretical\r\nguarantees
    under reasonable assumptions and have strong practical performance.\r\nSpecifically,
    our work starts from the observation that we can leverage curvature\r\ninformation
    in OBS-like fashion upon the projection step of classic iterative sparse\r\nrecovery
    algorithms such as IHT. We show for the first time that this leads both\r\nto
    improved convergence bounds under standard assumptions. Furthermore, we\r\npresent
    extensions of this approach to the practical task of obtaining accurate sparse\r\nDNNs,
    and validate it experimentally at scale for Transformer-based models on\r\nvision
    and language tasks."
acknowledged_ssus:
- _id: CampIT
acknowledgement: The authors thank the anonymous NeurIPS reviewers for their useful
  comments and feedback, the IT department from the Institute of Science and Technology
  Austria for the hardware support, and Weights and Biases for the infrastructure
  to track all our experiments. Mher Safaryan has received funding from the European
  Union’s Horizon 2020 research and innovation program under the Maria Skłodowska-Curie
  grant agreement No 101034413.
alternative_title:
- Advances in Neural Information Processing Systems
article_processing_charge: No
arxiv: 1
author:
- first_name: Diyuan
  full_name: Wu, Diyuan
  id: 1a5914c2-896a-11ed-bdf8-fb80621a0635
  last_name: Wu
- first_name: Ionut-Vlad
  full_name: Modoranu, Ionut-Vlad
  id: 449f7a18-f128-11eb-9611-9b430c0c6333
  last_name: Modoranu
- first_name: Mher
  full_name: Safaryan, Mher
  id: dd546b39-0804-11ed-9c55-ef075c39778d
  last_name: Safaryan
- first_name: Denis
  full_name: Kuznedelev, Denis
  last_name: Kuznedelev
- first_name: Dan-Adrian
  full_name: Alistarh, Dan-Adrian
  id: 4A899BFC-F248-11E8-B48F-1D18A9856A87
  last_name: Alistarh
  orcid: 0000-0003-3650-940X
citation:
  ama: 'Wu D, Modoranu I-V, Safaryan M, Kuznedelev D, Alistarh D-A. The iterative
    optimal brain surgeon: Faster sparse recovery by leveraging second-order information.
    In: <i>38th Conference on Neural Information Processing Systems</i>. Vol 37. Neural
    Information Processing Systems Foundation; 2024.'
  apa: 'Wu, D., Modoranu, I.-V., Safaryan, M., Kuznedelev, D., &#38; Alistarh, D.-A.
    (2024). The iterative optimal brain surgeon: Faster sparse recovery by leveraging
    second-order information. In <i>38th Conference on Neural Information Processing
    Systems</i> (Vol. 37). Vancouver, Canada: Neural Information Processing Systems
    Foundation.'
  chicago: 'Wu, Diyuan, Ionut-Vlad Modoranu, Mher Safaryan, Denis Kuznedelev, and
    Dan-Adrian Alistarh. “The Iterative Optimal Brain Surgeon: Faster Sparse Recovery
    by Leveraging Second-Order Information.” In <i>38th Conference on Neural Information
    Processing Systems</i>, Vol. 37. Neural Information Processing Systems Foundation,
    2024.'
  ieee: 'D. Wu, I.-V. Modoranu, M. Safaryan, D. Kuznedelev, and D.-A. Alistarh, “The
    iterative optimal brain surgeon: Faster sparse recovery by leveraging second-order
    information,” in <i>38th Conference on Neural Information Processing Systems</i>,
    Vancouver, Canada, 2024, vol. 37.'
  ista: 'Wu D, Modoranu I-V, Safaryan M, Kuznedelev D, Alistarh D-A. 2024. The iterative
    optimal brain surgeon: Faster sparse recovery by leveraging second-order information.
    38th Conference on Neural Information Processing Systems. NeurIPS: Neural Information
    Processing Systems, Advances in Neural Information Processing Systems, vol. 37.'
  mla: 'Wu, Diyuan, et al. “The Iterative Optimal Brain Surgeon: Faster Sparse Recovery
    by Leveraging Second-Order Information.” <i>38th Conference on Neural Information
    Processing Systems</i>, vol. 37, Neural Information Processing Systems Foundation,
    2024.'
  short: D. Wu, I.-V. Modoranu, M. Safaryan, D. Kuznedelev, D.-A. Alistarh, in:, 38th
    Conference on Neural Information Processing Systems, Neural Information Processing
    Systems Foundation, 2024.
conference:
  end_date: 2024-12-15
  location: Vancouver, Canada
  name: 'NeurIPS: Neural Information Processing Systems'
  start_date: 2024-12-09
corr_author: '1'
date_created: 2025-04-06T22:01:32Z
date_published: 2024-12-20T00:00:00Z
date_updated: 2025-05-14T11:37:10Z
day: '20'
department:
- _id: DaAl
- _id: MaMo
ec_funded: 1
external_id:
  arxiv:
  - '2408.17163'
intvolume: '        37'
language:
- iso: eng
main_file_link:
- open_access: '1'
  url: https://doi.org/10.48550/arXiv.2408.17163
month: '12'
oa: 1
oa_version: Preprint
project:
- _id: fc2ed2f7-9c52-11eb-aca3-c01059dda49c
  call_identifier: H2020
  grant_number: '101034413'
  name: 'IST-BRIDGE: International postdoctoral program'
publication: 38th Conference on Neural Information Processing Systems
publication_identifier:
  issn:
  - 1049-5258
publication_status: published
publisher: Neural Information Processing Systems Foundation
quality_controlled: '1'
scopus_import: '1'
status: public
title: 'The iterative optimal brain surgeon: Faster sparse recovery by leveraging
  second-order information'
type: conference
user_id: 2DF688A6-F248-11E8-B48F-1D18A9856A87
volume: 37
year: '2024'
...
---
OA_place: repository
OA_type: green
_id: '17330'
abstract:
- lang: eng
  text: In this work we consider the list-decodability and list-recoverability of
    arbitrary q -ary codes, for all integer values of q ⩾ 2. A code is called ( p
    , L ) q -list-decodable if every radius pn Hamming ball contains less than L codewords;
    ( p , ℓ, L ) q -recoverability is a generalization where we place radius pn Hamming
    balls on every point of a combinatorial rectangle with side length ℓ and again
    stipulate that there be less than L codewords. Our main contribution is to precisely
    calculate the maximum value of p for which there exist infinite families of positive
    rate ( p , ℓ, L ) q -list-recoverable codes, the quantity we call the zero-rate
    threshold . Denoting this value by p *, we in fact show that codes correcting
    a p * + ε fraction of errors must have size O ε (1), i.e., independent of n .
    Such a result is typically referred to as a “Plotkin bound.” To complement this,
    a standard random code with expurgation construction shows that there exist positive
    rate codes correcting a p * − ε fraction of errors. We also follow a classical
    proof template (typically attributed to Elias and Bassalygo) to derive from the
    zero-rate threshold other tradeoffs between rate and decoding radius for list-decoding
    and list-recovery. Technically, proving the Plotkin bound boils down to demon-strating
    the Schur convexity of a certain function defined on the q -simplex as well as
    the convexity of a univariate function derived from it. We remark that an earlier
    argument claimed similar results for q -ary list-decoding; however, we point out
    that this earlier proof is flawed.
acknowledgement: Part of this work was done while Nicolas Resch was affiliated with
  the Centrum Wiskunde & Informatica and supported in part by ERC H2020 grant No.74079
  (ALGSTRONGCRYPTO). Chen Yuan is supported in part by the National Key Research and
  Development Projects under Grant 2022YFA1004900 and Grant 2021YFE0109900, the National
  Natural Science Foundation of China under Grant 12101403 and Grant 12031011.
article_processing_charge: No
article_type: original
arxiv: 1
author:
- first_name: Nicolas
  full_name: Resch, Nicolas
  last_name: Resch
- first_name: Chen
  full_name: Yuan, Chen
  last_name: Yuan
- first_name: Yihan
  full_name: Zhang, Yihan
  id: 2ce5da42-b2ea-11eb-bba5-9f264e9d002c
  last_name: Zhang
  orcid: 0000-0002-6465-6258
citation:
  ama: Resch N, Yuan C, Zhang Y. Zero-rate thresholds and new capacity bounds for
    list-decoding and list-recovery. <i>IEEE Transactions on Information Theory</i>.
    2024;70(9):6211-6238. doi:<a href="https://doi.org/10.1109/TIT.2024.3430842">10.1109/TIT.2024.3430842</a>
  apa: Resch, N., Yuan, C., &#38; Zhang, Y. (2024). Zero-rate thresholds and new capacity
    bounds for list-decoding and list-recovery. <i>IEEE Transactions on Information
    Theory</i>. IEEE. <a href="https://doi.org/10.1109/TIT.2024.3430842">https://doi.org/10.1109/TIT.2024.3430842</a>
  chicago: Resch, Nicolas, Chen Yuan, and Yihan Zhang. “Zero-Rate Thresholds and New
    Capacity Bounds for List-Decoding and List-Recovery.” <i>IEEE Transactions on
    Information Theory</i>. IEEE, 2024. <a href="https://doi.org/10.1109/TIT.2024.3430842">https://doi.org/10.1109/TIT.2024.3430842</a>.
  ieee: N. Resch, C. Yuan, and Y. Zhang, “Zero-rate thresholds and new capacity bounds
    for list-decoding and list-recovery,” <i>IEEE Transactions on Information Theory</i>,
    vol. 70, no. 9. IEEE, pp. 6211–6238, 2024.
  ista: Resch N, Yuan C, Zhang Y. 2024. Zero-rate thresholds and new capacity bounds
    for list-decoding and list-recovery. IEEE Transactions on Information Theory.
    70(9), 6211–6238.
  mla: Resch, Nicolas, et al. “Zero-Rate Thresholds and New Capacity Bounds for List-Decoding
    and List-Recovery.” <i>IEEE Transactions on Information Theory</i>, vol. 70, no.
    9, IEEE, 2024, pp. 6211–38, doi:<a href="https://doi.org/10.1109/TIT.2024.3430842">10.1109/TIT.2024.3430842</a>.
  short: N. Resch, C. Yuan, Y. Zhang, IEEE Transactions on Information Theory 70 (2024)
    6211–6238.
corr_author: '1'
date_created: 2024-07-28T22:01:10Z
date_published: 2024-09-01T00:00:00Z
date_updated: 2025-09-08T08:31:53Z
day: '01'
department:
- _id: MaMo
doi: 10.1109/TIT.2024.3430842
external_id:
  arxiv:
  - '2210.07754'
  isi:
  - '001299623600019'
intvolume: '        70'
isi: 1
issue: '9'
language:
- iso: eng
main_file_link:
- open_access: '1'
  url: https://arxiv.org/abs/2210.07754
month: '09'
oa: 1
oa_version: Preprint
page: 6211-6238
publication: IEEE Transactions on Information Theory
publication_identifier:
  eissn:
  - 1557-9654
  issn:
  - 0018-9448
publication_status: published
publisher: IEEE
quality_controlled: '1'
related_material:
  record:
  - id: '14083'
    relation: earlier_version
    status: public
scopus_import: '1'
status: public
title: Zero-rate thresholds and new capacity bounds for list-decoding and list-recovery
type: journal_article
user_id: 317138e5-6ab7-11ef-aa6d-ffef3953e345
volume: 70
year: '2024'
...
---
OA_place: repository
_id: '17350'
abstract:
- lang: eng
  text: "Score-based generative models (SGMs) are powerful tools to sample from\r\ncomplex
    data distributions. Their underlying idea is to (i) run a forward\r\nprocess for
    time $T_1$ by adding noise to the data, (ii) estimate its score\r\nfunction, and
    (iii) use such estimate to run a reverse process. As the reverse\r\nprocess is
    initialized with the stationary distribution of the forward one, the\r\nexisting
    analysis paradigm requires $T_1\\to\\infty$. This is however\r\nproblematic: from
    a theoretical viewpoint, for a given precision of the score\r\napproximation,
    the convergence guarantee fails as $T_1$ diverges; from a\r\npractical viewpoint,
    a large $T_1$ increases computational costs and leads to\r\nerror propagation.
    This paper addresses the issue by considering a version of\r\nthe popular predictor-corrector
    scheme: after running the forward process, we\r\nfirst estimate the final distribution
    via an inexact Langevin dynamics and then\r\nrevert the process. Our key technical
    contribution is to provide convergence\r\nguarantees which require to run the
    forward process only for a fixed finite\r\ntime $T_1$. Our bounds exhibit a mild
    logarithmic dependence on the input\r\ndimension and the subgaussian norm of the
    target distribution, have minimal\r\nassumptions on the data, and require only
    to control the $L^2$ loss on the\r\nscore approximation, which is the quantity
    minimized in practice."
article_processing_charge: No
arxiv: 1
author:
- first_name: Francesco
  full_name: Pedrotti, Francesco
  id: d3ac8ac6-dc8d-11ea-abe3-e2a9628c4c3c
  last_name: Pedrotti
- first_name: Jan
  full_name: Maas, Jan
  id: 4C5696CE-F248-11E8-B48F-1D18A9856A87
  last_name: Maas
  orcid: 0000-0002-0845-1338
- first_name: Marco
  full_name: Mondelli, Marco
  id: 27EB676C-8706-11E9-9510-7717E6697425
  last_name: Mondelli
  orcid: 0000-0002-3242-7020
citation:
  ama: Pedrotti F, Maas J, Mondelli M. Improved convergence of score-based diffusion
    models via prediction-correction. <i>arXiv</i>. doi:<a href="https://doi.org/10.48550/arXiv.2305.14164">10.48550/arXiv.2305.14164</a>
  apa: Pedrotti, F., Maas, J., &#38; Mondelli, M. (n.d.). Improved convergence of
    score-based diffusion models via prediction-correction. <i>arXiv</i>. <a href="https://doi.org/10.48550/arXiv.2305.14164">https://doi.org/10.48550/arXiv.2305.14164</a>
  chicago: Pedrotti, Francesco, Jan Maas, and Marco Mondelli. “Improved Convergence
    of Score-Based Diffusion Models via Prediction-Correction.” <i>ArXiv</i>, n.d.
    <a href="https://doi.org/10.48550/arXiv.2305.14164">https://doi.org/10.48550/arXiv.2305.14164</a>.
  ieee: F. Pedrotti, J. Maas, and M. Mondelli, “Improved convergence of score-based
    diffusion models via prediction-correction,” <i>arXiv</i>. .
  ista: Pedrotti F, Maas J, Mondelli M. Improved convergence of score-based diffusion
    models via prediction-correction. arXiv, <a href="https://doi.org/10.48550/arXiv.2305.14164">10.48550/arXiv.2305.14164</a>.
  mla: Pedrotti, Francesco, et al. “Improved Convergence of Score-Based Diffusion
    Models via Prediction-Correction.” <i>ArXiv</i>, doi:<a href="https://doi.org/10.48550/arXiv.2305.14164">10.48550/arXiv.2305.14164</a>.
  short: F. Pedrotti, J. Maas, M. Mondelli, ArXiv (n.d.).
corr_author: '1'
date_created: 2024-07-31T07:56:40Z
date_published: 2024-06-06T00:00:00Z
date_updated: 2026-04-07T13:00:02Z
day: '06'
department:
- _id: JaMa
- _id: MaMo
doi: 10.48550/arXiv.2305.14164
external_id:
  arxiv:
  - '2305.14164'
language:
- iso: eng
main_file_link:
- open_access: '1'
  url: https://doi.org/10.48550/arXiv.2305.14164
month: '06'
oa: 1
oa_version: Preprint
project:
- _id: fc31cba2-9c52-11eb-aca3-ff467d239cd2
  grant_number: F6504
  name: Taming Complexity in Partial Differential Systems
- _id: 059876FA-7A3F-11EA-A408-12923DDC885E
  name: Prix Lopez-Loretta 2019 - Marco Mondelli
publication: arXiv
publication_status: draft
related_material:
  record:
  - id: '18897'
    relation: later_version
    status: public
  - id: '17336'
    relation: dissertation_contains
    status: public
status: public
title: Improved convergence of score-based diffusion models via prediction-correction
type: preprint
user_id: 2DF688A6-F248-11E8-B48F-1D18A9856A87
year: '2024'
...
---
_id: '17893'
abstract:
- lang: eng
  text: Strong data processing inequalities (SDPI) are an important object of study
    in Information Theory and have been well studied for f -divergences. Universal
    upper and lower bounds have been provided along with several applications, connecting
    them to impossibility (converse) results, concentration of measure, hypercontractivity,
    and so on. In this paper, we study Renyi divergence and the corresponding SDPI
    constant whose behavior seems to deviate from that of ordinary <1>-divergences.
    In particular, one can find examples showing that the universal upper bound relating
    its SDPI constant to the one of Total Variation does not hold in general. In this
    work, we prove, however, that the universal lower bound involving the SDPI constant
    of the Chi-square divergence does indeed hold. Furthermore, we also provide a
    characterization of the distribution that achieves the supremum when is equal
    to 2 and consequently compute the SDPI constant for Renyi divergence of the general
    binary channel.
acknowledgement: "The work in this paper was supported in part by the Swiss National
  Science Foundation under Grant 200364.\r\n"
article_processing_charge: No
arxiv: 1
author:
- first_name: Lifu
  full_name: Jin, Lifu
  last_name: Jin
- first_name: Amedeo Roberto
  full_name: Esposito, Amedeo Roberto
  id: 9583e921-e1ad-11ec-9862-cef099626dc9
  last_name: Esposito
- first_name: Michael
  full_name: Gastpar, Michael
  last_name: Gastpar
citation:
  ama: 'Jin L, Esposito AR, Gastpar M. Properties of the strong data processing constant
    for Rényi divergence. In: <i>Proceedings of the 2024 IEEE International Symposium
    on Information Theory</i>. Institute of Electrical and Electronics Engineers;
    2024:3178-3183. doi:<a href="https://doi.org/10.1109/ISIT57864.2024.10619367">10.1109/ISIT57864.2024.10619367</a>'
  apa: 'Jin, L., Esposito, A. R., &#38; Gastpar, M. (2024). Properties of the strong
    data processing constant for Rényi divergence. In <i>Proceedings of the 2024 IEEE
    International Symposium on Information Theory</i> (pp. 3178–3183). Athens, Greece:
    Institute of Electrical and Electronics Engineers. <a href="https://doi.org/10.1109/ISIT57864.2024.10619367">https://doi.org/10.1109/ISIT57864.2024.10619367</a>'
  chicago: Jin, Lifu, Amedeo Roberto Esposito, and Michael Gastpar. “Properties of
    the Strong Data Processing Constant for Rényi Divergence.” In <i>Proceedings of
    the 2024 IEEE International Symposium on Information Theory</i>, 3178–83. Institute
    of Electrical and Electronics Engineers, 2024. <a href="https://doi.org/10.1109/ISIT57864.2024.10619367">https://doi.org/10.1109/ISIT57864.2024.10619367</a>.
  ieee: L. Jin, A. R. Esposito, and M. Gastpar, “Properties of the strong data processing
    constant for Rényi divergence,” in <i>Proceedings of the 2024 IEEE International
    Symposium on Information Theory</i>, Athens, Greece, 2024, pp. 3178–3183.
  ista: 'Jin L, Esposito AR, Gastpar M. 2024. Properties of the strong data processing
    constant for Rényi divergence. Proceedings of the 2024 IEEE International Symposium
    on Information Theory. ISIT: International Symposium on Information Theory, 3178–3183.'
  mla: Jin, Lifu, et al. “Properties of the Strong Data Processing Constant for Rényi
    Divergence.” <i>Proceedings of the 2024 IEEE International Symposium on Information
    Theory</i>, Institute of Electrical and Electronics Engineers, 2024, pp. 3178–83,
    doi:<a href="https://doi.org/10.1109/ISIT57864.2024.10619367">10.1109/ISIT57864.2024.10619367</a>.
  short: L. Jin, A.R. Esposito, M. Gastpar, in:, Proceedings of the 2024 IEEE International
    Symposium on Information Theory, Institute of Electrical and Electronics Engineers,
    2024, pp. 3178–3183.
conference:
  end_date: 2024-07-12
  location: Athens, Greece
  name: 'ISIT: International Symposium on Information Theory'
  start_date: 2024-07-07
corr_author: '1'
date_created: 2024-09-08T22:01:12Z
date_published: 2024-08-19T00:00:00Z
date_updated: 2025-09-08T09:18:00Z
day: '19'
department:
- _id: MaMo
doi: 10.1109/ISIT57864.2024.10619367
external_id:
  arxiv:
  - '2403.10656'
  isi:
  - '001304426903055'
isi: 1
language:
- iso: eng
main_file_link:
- open_access: '1'
  url: 'https://doi.org/10.48550/arXiv.2403.10656 '
month: '08'
oa: 1
oa_version: Preprint
page: 3178-3183
publication: Proceedings of the 2024 IEEE International Symposium on Information Theory
publication_identifier:
  isbn:
  - '9798350382846'
  issn:
  - 2157-8095
publication_status: published
publisher: Institute of Electrical and Electronics Engineers
quality_controlled: '1'
scopus_import: '1'
status: public
title: Properties of the strong data processing constant for Rényi divergence
type: conference
user_id: 317138e5-6ab7-11ef-aa6d-ffef3953e345
year: '2024'
...
---
_id: '17894'
abstract:
- lang: eng
  text: 'Sibson''s α -mutual information has received renewed attention recently in
    several contexts: concentration of measure under dependence, statistical learning,
    hypothesis testing, and estimation theory. In this work, we introduce several
    variational representations of Sibson''s α -mutual information: 1) as a supremum
    over joint distributions of (a combination of) KL divergences; and 2) as a supremum
    over functions of opportune expected values. Leveraging them, we produce a variety
    of novel and known results, including a generalization of transportation-cost
    inequalities and Fano''s inequality.'
acknowledgement: The work in this paper was supported in part by the Swiss National
  Science Foundation under Grant 200364.
article_processing_charge: No
author:
- first_name: Amedeo Roberto
  full_name: Esposito, Amedeo Roberto
  id: 9583e921-e1ad-11ec-9862-cef099626dc9
  last_name: Esposito
- first_name: Michael
  full_name: Gastpar, Michael
  last_name: Gastpar
- first_name: Ibrahim
  full_name: Issa, Ibrahim
  last_name: Issa
citation:
  ama: 'Esposito AR, Gastpar M, Issa I. Variational characterizations of Sibson’s
    α-mutual information. In: <i>Proceedings of the 2024 IEEE International Symposium
    on Information Theory </i>. Institute of Electrical and Electronics Engineers;
    2024:2110-2115. doi:<a href="https://doi.org/10.1109/ISIT57864.2024.10619378">10.1109/ISIT57864.2024.10619378</a>'
  apa: 'Esposito, A. R., Gastpar, M., &#38; Issa, I. (2024). Variational characterizations
    of Sibson’s α-mutual information. In <i>Proceedings of the 2024 IEEE International
    Symposium on Information Theory </i> (pp. 2110–2115). Athens, Greece: Institute
    of Electrical and Electronics Engineers. <a href="https://doi.org/10.1109/ISIT57864.2024.10619378">https://doi.org/10.1109/ISIT57864.2024.10619378</a>'
  chicago: Esposito, Amedeo Roberto, Michael Gastpar, and Ibrahim Issa. “Variational
    Characterizations of Sibson’s α-Mutual Information.” In <i>Proceedings of the
    2024 IEEE International Symposium on Information Theory </i>, 2110–15. Institute
    of Electrical and Electronics Engineers, 2024. <a href="https://doi.org/10.1109/ISIT57864.2024.10619378">https://doi.org/10.1109/ISIT57864.2024.10619378</a>.
  ieee: A. R. Esposito, M. Gastpar, and I. Issa, “Variational characterizations of
    Sibson’s α-mutual information,” in <i>Proceedings of the 2024 IEEE International
    Symposium on Information Theory </i>, Athens, Greece, 2024, pp. 2110–2115.
  ista: 'Esposito AR, Gastpar M, Issa I. 2024. Variational characterizations of Sibson’s
    α-mutual information. Proceedings of the 2024 IEEE International Symposium on
    Information Theory . ISIT: International Symposium on Information Theory, 2110–2115.'
  mla: Esposito, Amedeo Roberto, et al. “Variational Characterizations of Sibson’s
    α-Mutual Information.” <i>Proceedings of the 2024 IEEE International Symposium
    on Information Theory </i>, Institute of Electrical and Electronics Engineers,
    2024, pp. 2110–15, doi:<a href="https://doi.org/10.1109/ISIT57864.2024.10619378">10.1109/ISIT57864.2024.10619378</a>.
  short: A.R. Esposito, M. Gastpar, I. Issa, in:, Proceedings of the 2024 IEEE International
    Symposium on Information Theory , Institute of Electrical and Electronics Engineers,
    2024, pp. 2110–2115.
conference:
  end_date: 2024-07-12
  location: Athens, Greece
  name: 'ISIT: International Symposium on Information Theory'
  start_date: 2024-07-07
corr_author: '1'
date_created: 2024-09-08T22:01:12Z
date_published: 2024-08-19T00:00:00Z
date_updated: 2025-09-08T09:18:44Z
day: '19'
department:
- _id: MaMo
doi: 10.1109/ISIT57864.2024.10619378
external_id:
  isi:
  - '001304426902023'
isi: 1
language:
- iso: eng
month: '08'
oa_version: None
page: 2110-2115
publication: 'Proceedings of the 2024 IEEE International Symposium on Information
  Theory '
publication_identifier:
  isbn:
  - '9798350382846'
  issn:
  - 2157-8095
publication_status: published
publisher: Institute of Electrical and Electronics Engineers
quality_controlled: '1'
scopus_import: '1'
status: public
title: Variational characterizations of Sibson's α-mutual information
type: conference
user_id: 317138e5-6ab7-11ef-aa6d-ffef3953e345
year: '2024'
...
---
_id: '17895'
abstract:
- lang: eng
  text: We propose a concatenated code construction for a class of discrete-alphabet
    oblivious arbitrarily varying channels (AVCs) with cost constraints. The code
    has time and space complexity polynomial in the blocklength n . It uses a Reed-Solomon
    outer code, logarithmic blocklength random inner codes, and stochastic encoding
    by permuting the codeword before transmission. When the channel satisfies a condition
    called strong DS-nonsymmetrizability (a modified version of nonsymmetrizability
    originally due to Dobrushin and Stambler), we show that the code achieves a rate
    that for a variety of oblivious AVCs (such as classically studied error/erasure
    channels) match the known capacities.
acknowledgement: "The work of M. Langberg and A. D. Sarwate was supported in part
  by the US NSF under awards CCF-1909451 and CCF1909468. B. K. Dey was supported in
  part by the Bharti Centre\r\nfor Communication in IIT Bombay. "
article_processing_charge: No
author:
- first_name: B. K.
  full_name: Dey, B. K.
  last_name: Dey
- first_name: S.
  full_name: Jaggi, S.
  last_name: Jaggi
- first_name: M.
  full_name: Langberg, M.
  last_name: Langberg
- first_name: A. D.
  full_name: Sarwate, A. D.
  last_name: Sarwate
- first_name: Yihan
  full_name: Zhang, Yihan
  id: 2ce5da42-b2ea-11eb-bba5-9f264e9d002c
  last_name: Zhang
  orcid: 0000-0002-6465-6258
citation:
  ama: 'Dey BK, Jaggi S, Langberg M, Sarwate AD, Zhang Y. Computationally efficient
    codes for strongly Dobrushin-Stambler nonsymmetrizable oblivious AVCs. In: <i>Proceedings
    of the 2024 IEEE International Symposium on Information Theory </i>. Institute
    of Electrical and Electronics Engineers; 2024:1586-1591. doi:<a href="https://doi.org/10.1109/ISIT57864.2024.10619362">10.1109/ISIT57864.2024.10619362</a>'
  apa: 'Dey, B. K., Jaggi, S., Langberg, M., Sarwate, A. D., &#38; Zhang, Y. (2024).
    Computationally efficient codes for strongly Dobrushin-Stambler nonsymmetrizable
    oblivious AVCs. In <i>Proceedings of the 2024 IEEE International Symposium on
    Information Theory </i> (pp. 1586–1591). Athens, Greece: Institute of Electrical
    and Electronics Engineers. <a href="https://doi.org/10.1109/ISIT57864.2024.10619362">https://doi.org/10.1109/ISIT57864.2024.10619362</a>'
  chicago: Dey, B. K., S. Jaggi, M. Langberg, A. D. Sarwate, and Yihan Zhang. “Computationally
    Efficient Codes for Strongly Dobrushin-Stambler Nonsymmetrizable Oblivious AVCs.”
    In <i>Proceedings of the 2024 IEEE International Symposium on Information Theory
    </i>, 1586–91. Institute of Electrical and Electronics Engineers, 2024. <a href="https://doi.org/10.1109/ISIT57864.2024.10619362">https://doi.org/10.1109/ISIT57864.2024.10619362</a>.
  ieee: B. K. Dey, S. Jaggi, M. Langberg, A. D. Sarwate, and Y. Zhang, “Computationally
    efficient codes for strongly Dobrushin-Stambler nonsymmetrizable oblivious AVCs,”
    in <i>Proceedings of the 2024 IEEE International Symposium on Information Theory
    </i>, Athens, Greece, 2024, pp. 1586–1591.
  ista: 'Dey BK, Jaggi S, Langberg M, Sarwate AD, Zhang Y. 2024. Computationally efficient
    codes for strongly Dobrushin-Stambler nonsymmetrizable oblivious AVCs. Proceedings
    of the 2024 IEEE International Symposium on Information Theory . ISIT: International
    Symposium on Information Theory, 1586–1591.'
  mla: Dey, B. K., et al. “Computationally Efficient Codes for Strongly Dobrushin-Stambler
    Nonsymmetrizable Oblivious AVCs.” <i>Proceedings of the 2024 IEEE International
    Symposium on Information Theory </i>, Institute of Electrical and Electronics
    Engineers, 2024, pp. 1586–91, doi:<a href="https://doi.org/10.1109/ISIT57864.2024.10619362">10.1109/ISIT57864.2024.10619362</a>.
  short: B.K. Dey, S. Jaggi, M. Langberg, A.D. Sarwate, Y. Zhang, in:, Proceedings
    of the 2024 IEEE International Symposium on Information Theory , Institute of
    Electrical and Electronics Engineers, 2024, pp. 1586–1591.
conference:
  end_date: 2024-07-12
  location: Athens, Greece
  name: 'ISIT: International Symposium on Information Theory'
  start_date: 2024-07-07
date_created: 2024-09-08T22:01:12Z
date_published: 2024-08-19T00:00:00Z
date_updated: 2025-09-08T09:19:25Z
day: '19'
department:
- _id: MaMo
doi: 10.1109/ISIT57864.2024.10619362
external_id:
  isi:
  - '001304426901091'
isi: 1
language:
- iso: eng
month: '08'
oa_version: None
page: 1586-1591
publication: 'Proceedings of the 2024 IEEE International Symposium on Information
  Theory '
publication_identifier:
  isbn:
  - '9798350382846'
  issn:
  - 2157-8095
publication_status: published
publisher: Institute of Electrical and Electronics Engineers
quality_controlled: '1'
scopus_import: '1'
status: public
title: Computationally efficient codes for strongly Dobrushin-Stambler nonsymmetrizable
  oblivious AVCs
type: conference
user_id: 317138e5-6ab7-11ef-aa6d-ffef3953e345
year: '2024'
...
---
OA_place: repository
_id: '17465'
abstract:
- lang: eng
  text: "In the modern age of machine learning, artificial neural networks have become
    an integral part\r\nof many practical systems. One of the key ingredients of the
    success of the deep learning\r\napproach is recent computational advances which
    allowed the training of models with billions\r\nof parameters on large-scale data.
    Such over-parameterized and data-hungry regimes pose a\r\nchallenge for the theoretical
    analysis of modern models since “classical” statistical wisdom\r\nis no longer
    applicable. In this view, it is paramount to extend or develop new machinery\r\nthat
    will allow tackling the neural network analysis under new challenging asymptotic
    regimes,\r\nwhich is the focus of this thesis.\r\nLarge neural network systems
    are usually optimized via “local” search algorithms, such\r\nas stochastic gradient
    descent (SGD). However, given the high-dimensional nature of the\r\nparameter
    space, it is a priori not clear why such a crude “local” approach works so remarkably\r\nwell
    in practice. We take a step towards demystifying this phenomenon by showing that\r\nthe
    landscape of the SGD training dynamics exhibits a few beneficial properties for
    the\r\noptimization. First, we show that along the SGD trajectory an over-parameterized
    network\r\nis dropout stable. The emergence of dropout stability allows to conclude
    that the minima\r\nfound by SGD are connected via a continuous path of small loss.
    This in turn means that\r\nthe high-dimensional landscape of the neural network
    optimization problem is provably not so\r\nunfavourable to gradient-based training,
    due to mode connectivity. Next, we show that SGD\r\nfor an over-parameterized
    network tends to find solutions that are functionally more “simple”.\r\nThis in
    turn means that the SGD minima are more robust, since a less complicated solution\r\nwill
    less likely overfit the data. More formally, for a prototypical example of a wide
    two-layer\r\nReLU network on a 1d regression task we show that the SGD algorithm
    is implicitly selective in\r\nits choice of an interpolating solution. Namely,
    at convergence the neural network implements\r\na piece-wise linear function with
    the number of linear regions depending only on the amount\r\nof training data.
    This is in contrast to a “smooth”-like behaviour which one would expect\r\ngiven
    such a severe over-parameterization of the model.\r\nDiverging from the generic
    supervised setting of classification and regression problems, we\r\nanalyze an
    auto-encoder model that is commonly used for representation learning and data\r\ncompression.
    Despite the wide applicability of the auto-encoding paradigm, the theoretical\r\nunderstanding
    of their behaviour is limited even in the simplistic shallow case. The related\r\nwork
    is restricted to extreme asymptotic regimes in which the auto-encoder is either
    severely\r\nover-parameterized or under-parameterized. In contrast, we provide
    a tight characterization\r\nfor the 1-bit compression of Gaussian signals in the
    challenging proportional regime, i.e., the\r\ninput dimension and the size of
    the compressed representation obey the same asymptotics.\r\nWe also show that
    gradient-based methods are able to find a globally optimal solution and\r\nthat
    the predictions made for Gaussian data extrapolate beyond - to the case of compression\r\nof
    natural images. Next, we relax the Gaussian assumption and study more structured
    input\r\nsources. We show that the shallow model is sometimes agnostic to the
    structure of the data\r\nvii\r\nwhich results in a Gaussian-like behaviour. We
    prove that making the decoding component\r\nslightly less shallow is already enough
    to escape the “curse” of Gaussian performance.\r\n"
acknowledged_ssus:
- _id: ScienComp
alternative_title:
- ISTA Thesis
article_processing_charge: No
author:
- first_name: Aleksandr
  full_name: Shevchenko, Aleksandr
  id: F2B06EC2-C99E-11E9-89F0-752EE6697425
  last_name: Shevchenko
citation:
  ama: Shevchenko A. High-dimensional limits in artificial neural networks. 2024.
    doi:<a href="https://doi.org/10.15479/at:ista:17465">10.15479/at:ista:17465</a>
  apa: Shevchenko, A. (2024). <i>High-dimensional limits in artificial neural networks</i>.
    Institute of Science and Technology Austria. <a href="https://doi.org/10.15479/at:ista:17465">https://doi.org/10.15479/at:ista:17465</a>
  chicago: Shevchenko, Alexander. “High-Dimensional Limits in Artificial Neural Networks.”
    Institute of Science and Technology Austria, 2024. <a href="https://doi.org/10.15479/at:ista:17465">https://doi.org/10.15479/at:ista:17465</a>.
  ieee: A. Shevchenko, “High-dimensional limits in artificial neural networks,” Institute
    of Science and Technology Austria, 2024.
  ista: Shevchenko A. 2024. High-dimensional limits in artificial neural networks.
    Institute of Science and Technology Austria.
  mla: Shevchenko, Alexander. <i>High-Dimensional Limits in Artificial Neural Networks</i>.
    Institute of Science and Technology Austria, 2024, doi:<a href="https://doi.org/10.15479/at:ista:17465">10.15479/at:ista:17465</a>.
  short: A. Shevchenko, High-Dimensional Limits in Artificial Neural Networks, Institute
    of Science and Technology Austria, 2024.
corr_author: '1'
date_created: 2024-08-28T15:14:25Z
date_published: 2024-08-29T00:00:00Z
date_updated: 2026-06-18T17:55:53Z
day: '29'
ddc:
- '519'
degree_awarded: PhD
department:
- _id: GradSch
- _id: DaAl
- _id: MaMo
doi: 10.15479/at:ista:17465
file:
- access_level: open_access
  checksum: da6dd3166078934577f6af93d27000e2
  content_type: application/pdf
  creator: ashevche
  date_created: 2024-09-02T09:23:32Z
  date_updated: 2024-10-05T22:30:05Z
  embargo: 2024-10-04
  file_id: '17482'
  file_name: thesis_a2b.pdf
  file_size: 4468610
  relation: main_file
- access_level: closed
  checksum: 76a39ef252239560923cdda4ce0a31a4
  content_type: application/zip
  creator: ashevche
  date_created: 2024-09-02T09:23:46Z
  date_updated: 2024-10-05T22:30:05Z
  embargo_to: open_access
  file_id: '17483'
  file_name: Thesis Alex - ISTA.zip
  file_size: 15930999
  relation: source_file
file_date_updated: 2024-10-05T22:30:05Z
has_accepted_license: '1'
language:
- iso: eng
month: '08'
oa: 1
oa_version: Published Version
page: '232'
project:
- _id: 059876FA-7A3F-11EA-A408-12923DDC885E
  name: Prix Lopez-Loretta 2019 - Marco Mondelli
- _id: 9B9290DE-BA93-11EA-9121-9846C619BF3A
  grant_number: W1260-N35
  name: Vienna Graduate School on Computational Optimization
publication_identifier:
  issn:
  - 2663-337X
publication_status: published
publisher: Institute of Science and Technology Austria
related_material:
  record:
  - id: '11420'
    relation: part_of_dissertation
    status: public
  - id: '14459'
    relation: part_of_dissertation
    status: public
  - id: '9198'
    relation: part_of_dissertation
    status: public
  - id: '17469'
    relation: part_of_dissertation
    status: public
status: public
supervisor:
- first_name: Marco
  full_name: Mondelli, Marco
  id: 27EB676C-8706-11E9-9510-7717E6697425
  last_name: Mondelli
  orcid: 0000-0002-3242-7020
- first_name: Dan-Adrian
  full_name: Alistarh, Dan-Adrian
  id: 4A899BFC-F248-11E8-B48F-1D18A9856A87
  last_name: Alistarh
  orcid: 0000-0003-3650-940X
title: High-dimensional limits in artificial neural networks
type: dissertation
user_id: 8b945eb4-e2f2-11eb-945a-df72226e66a9
year: '2024'
...
---
_id: '17469'
abstract:
- lang: eng
  text: 'Autoencoders are a prominent model in many empirical branches of machine
    learning and lossy data compression. However, basic theoretical questions remain
    unanswered even in a shallow two-layer setting. In particular, to what degree
    does a shallow autoencoder capture the structure of the underlying data distribution?
    For the prototypical case of the 1-bit compression of sparse Gaussian data, we
    prove that gradient descent converges to a solution that completely disregards
    the sparse structure of the input. Namely, the performance of the algorithm is
    the same as if it was compressing a Gaussian source - with no sparsity. For general
    data distributions, we give evidence of a phase transition phenomenon in the shape
    of the gradient descent minimizer, as a function of the data sparsity: below the
    critical sparsity level, the minimizer is a rotation taken uniformly at random
    (just like in the compression of non-sparse data); above the critical sparsity,
    the minimizer is the identity (up to a permutation). Finally, by exploiting a
    connection with approximate message passing algorithms, we show how to improve
    upon Gaussian performance for the compression of sparse data: adding a denoising
    function to a shallow architecture already reduces the loss provably, and a suitable
    multi-layer decoder leads to a further improvement. We validate our findings on
    image datasets, such as CIFAR-10 and MNIST.'
acknowledgement: "Kevin Kogler, Alexander Shevchenko and Marco Mondelli are supported
  by the 2019 Lopez-Loreta Prize. Hamed\r\nHassani acknowledges the support by the
  NSF CIF award (1910056) and the NSF Institute for CORE Emerging Methods in Data
  Science (EnCORE)."
alternative_title:
- PMLR
article_processing_charge: No
arxiv: 1
author:
- first_name: Kevin
  full_name: Kögler, Kevin
  id: 94ec913c-dc85-11ea-9058-e5051ab2428b
  last_name: Kögler
- first_name: Aleksandr
  full_name: Shevchenko, Aleksandr
  id: F2B06EC2-C99E-11E9-89F0-752EE6697425
  last_name: Shevchenko
- first_name: Hamed
  full_name: Hassani, Hamed
  last_name: Hassani
- first_name: Marco
  full_name: Mondelli, Marco
  id: 27EB676C-8706-11E9-9510-7717E6697425
  last_name: Mondelli
  orcid: 0000-0002-3242-7020
citation:
  ama: 'Kögler K, Shevchenko A, Hassani H, Mondelli M. Compression of structured data
    with autoencoders: Provable benefit of nonlinearities and depth. In: <i>Proceedings
    of the 41st International Conference on Machine Learning</i>. Vol 235. ML Research
    Press; 2024:24964-25015.'
  apa: 'Kögler, K., Shevchenko, A., Hassani, H., &#38; Mondelli, M. (2024). Compression
    of structured data with autoencoders: Provable benefit of nonlinearities and depth.
    In <i>Proceedings of the 41st International Conference on Machine Learning</i>
    (Vol. 235, pp. 24964–25015). Vienna, Austria: ML Research Press.'
  chicago: 'Kögler, Kevin, Alexander Shevchenko, Hamed Hassani, and Marco Mondelli.
    “Compression of Structured Data with Autoencoders: Provable Benefit of Nonlinearities
    and Depth.” In <i>Proceedings of the 41st International Conference on Machine
    Learning</i>, 235:24964–15. ML Research Press, 2024.'
  ieee: 'K. Kögler, A. Shevchenko, H. Hassani, and M. Mondelli, “Compression of structured
    data with autoencoders: Provable benefit of nonlinearities and depth,” in <i>Proceedings
    of the 41st International Conference on Machine Learning</i>, Vienna, Austria,
    2024, vol. 235, pp. 24964–25015.'
  ista: 'Kögler K, Shevchenko A, Hassani H, Mondelli M. 2024. Compression of structured
    data with autoencoders: Provable benefit of nonlinearities and depth. Proceedings
    of the 41st International Conference on Machine Learning. ICML: International
    Conference on Machine Learning, PMLR, vol. 235, 24964–25015.'
  mla: 'Kögler, Kevin, et al. “Compression of Structured Data with Autoencoders: Provable
    Benefit of Nonlinearities and Depth.” <i>Proceedings of the 41st International
    Conference on Machine Learning</i>, vol. 235, ML Research Press, 2024, pp. 24964–5015.'
  short: K. Kögler, A. Shevchenko, H. Hassani, M. Mondelli, in:, Proceedings of the
    41st International Conference on Machine Learning, ML Research Press, 2024, pp.
    24964–25015.
conference:
  end_date: 2024-07-27
  location: Vienna, Austria
  name: 'ICML: International Conference on Machine Learning'
  start_date: 2024-07-21
corr_author: '1'
date_created: 2024-08-29T11:47:57Z
date_published: 2024-07-01T00:00:00Z
date_updated: 2026-06-26T22:33:09Z
day: '01'
ddc:
- '000'
department:
- _id: DaAl
- _id: MaMo
external_id:
  arxiv:
  - '2402.05013'
intvolume: '       235'
language:
- iso: eng
main_file_link:
- open_access: '1'
  url: https://proceedings.mlr.press/v235/kogler24a.html
month: '07'
oa: 1
oa_version: Published Version
page: 24964-25015
project:
- _id: 059876FA-7A3F-11EA-A408-12923DDC885E
  name: Prix Lopez-Loretta 2019 - Marco Mondelli
publication: Proceedings of the 41st International Conference on Machine Learning
publication_status: published
publisher: ML Research Press
quality_controlled: '1'
related_material:
  record:
  - id: '17465'
    relation: dissertation_contains
    status: public
scopus_import: '1'
status: public
title: 'Compression of structured data with autoencoders: Provable benefit of nonlinearities
  and depth'
type: conference
user_id: 2DF688A6-F248-11E8-B48F-1D18A9856A87
volume: 235
year: '2024'
...
---
_id: '12838'
abstract:
- lang: eng
  text: We study the problem of high-dimensional multiple packing in Euclidean space.
    Multiple packing is a natural generalization of sphere packing and is defined
    as follows. Let N > 0 and L ∈ Z ≽2 . A multiple packing is a set C of points in
    R n such that any point in R n lies in the intersection of at most L – 1 balls
    of radius √ nN around points in C . Given a well-known connection with coding
    theory, multiple packings can be viewed as the Euclidean analog of list-decodable
    codes, which are well-studied for finite fields. In this paper, we derive the
    best known lower bounds on the optimal density of list-decodable infinite constellations
    for constant L under a stronger notion called average-radius multiple packing.
    To this end, we apply tools from high-dimensional geometry and large deviation
    theory.
acknowledgement: "YZ thanks Jiajin Li for making the observation given by Equation
  (23). He also would like to thank Nir Ailon and Ely Porat for several helpful conversations
  throughout this project, and Alexander Barg for insightful comments on the manuscript.\r\nYZ
  has received funding from the European Union’s Horizon 2020 research and innovation
  programme under grant agreement No 682203-ERC-[Inf-Speed-Tradeoff]. The work of
  SV was supported by a seed grant from IIT Hyderabad and the start-up research grant
  from the Science and Engineering Research Board, India (SRG/2020/000910)."
article_processing_charge: No
article_type: original
arxiv: 1
author:
- first_name: Yihan
  full_name: Zhang, Yihan
  id: 2ce5da42-b2ea-11eb-bba5-9f264e9d002c
  last_name: Zhang
  orcid: 0000-0002-6465-6258
- first_name: Shashank
  full_name: Vatedka, Shashank
  last_name: Vatedka
citation:
  ama: 'Zhang Y, Vatedka S. Multiple packing: Lower bounds via infinite constellations.
    <i>IEEE Transactions on Information Theory</i>. 2023;69(7):4513-4527. doi:<a href="https://doi.org/10.1109/TIT.2023.3260950">10.1109/TIT.2023.3260950</a>'
  apa: 'Zhang, Y., &#38; Vatedka, S. (2023). Multiple packing: Lower bounds via infinite
    constellations. <i>IEEE Transactions on Information Theory</i>. IEEE. <a href="https://doi.org/10.1109/TIT.2023.3260950">https://doi.org/10.1109/TIT.2023.3260950</a>'
  chicago: 'Zhang, Yihan, and Shashank Vatedka. “Multiple Packing: Lower Bounds via
    Infinite Constellations.” <i>IEEE Transactions on Information Theory</i>. IEEE,
    2023. <a href="https://doi.org/10.1109/TIT.2023.3260950">https://doi.org/10.1109/TIT.2023.3260950</a>.'
  ieee: 'Y. Zhang and S. Vatedka, “Multiple packing: Lower bounds via infinite constellations,”
    <i>IEEE Transactions on Information Theory</i>, vol. 69, no. 7. IEEE, pp. 4513–4527,
    2023.'
  ista: 'Zhang Y, Vatedka S. 2023. Multiple packing: Lower bounds via infinite constellations.
    IEEE Transactions on Information Theory. 69(7), 4513–4527.'
  mla: 'Zhang, Yihan, and Shashank Vatedka. “Multiple Packing: Lower Bounds via Infinite
    Constellations.” <i>IEEE Transactions on Information Theory</i>, vol. 69, no.
    7, IEEE, 2023, pp. 4513–27, doi:<a href="https://doi.org/10.1109/TIT.2023.3260950">10.1109/TIT.2023.3260950</a>.'
  short: Y. Zhang, S. Vatedka, IEEE Transactions on Information Theory 69 (2023) 4513–4527.
date_created: 2023-04-16T22:01:09Z
date_published: 2023-07-01T00:00:00Z
date_updated: 2023-12-13T11:16:46Z
day: '01'
department:
- _id: MaMo
doi: 10.1109/TIT.2023.3260950
external_id:
  arxiv:
  - '2211.04407'
  isi:
  - '001017307000023'
intvolume: '        69'
isi: 1
issue: '7'
language:
- iso: eng
main_file_link:
- open_access: '1'
  url: https://doi.org/10.48550/arXiv.2211.04407
month: '07'
oa: 1
oa_version: Preprint
page: 4513-4527
publication: IEEE Transactions on Information Theory
publication_identifier:
  eissn:
  - 1557-9654
  issn:
  - 0018-9448
publication_status: published
publisher: IEEE
quality_controlled: '1'
scopus_import: '1'
status: public
title: 'Multiple packing: Lower bounds via infinite constellations'
type: journal_article
user_id: 2DF688A6-F248-11E8-B48F-1D18A9856A87
volume: 69
year: '2023'
...
---
_id: '12859'
abstract:
- lang: eng
  text: 'Machine learning models are vulnerable to adversarial perturbations, and
    a thought-provoking paper by Bubeck and Sellke has analyzed this phenomenon through
    the lens of over-parameterization: interpolating smoothly the data requires significantly
    more parameters than simply memorizing it. However, this "universal" law provides
    only a necessary condition for robustness, and it is unable to discriminate between
    models. In this paper, we address these gaps by focusing on empirical risk minimization
    in two prototypical settings, namely, random features and the neural tangent kernel
    (NTK). We prove that, for random features, the model is not robust for any degree
    of over-parameterization, even when the necessary condition coming from the universal
    law of robustness is satisfied. In contrast, for even activations, the NTK model
    meets the universal lower bound, and it is robust as soon as the necessary condition
    on over-parameterization is fulfilled. This also addresses a conjecture in prior
    work by Bubeck, Li and Nagaraj. Our analysis decouples the effect of the kernel
    of the model from an "interaction matrix", which describes the interaction with
    the test data and captures the effect of the activation. Our theoretical results
    are corroborated by numerical evidence on both synthetic and standard datasets
    (MNIST, CIFAR-10).'
acknowledgement: "Simone Bombari and Marco Mondelli were partially supported by the
  2019 Lopez-Loreta prize, and\r\nthe authors would like to thank Hamed Hassani for
  helpful discussions.\r\n"
alternative_title:
- PMLR
article_processing_charge: No
arxiv: 1
author:
- first_name: Simone
  full_name: Bombari, Simone
  id: ca726dda-de17-11ea-bc14-f9da834f63aa
  last_name: Bombari
- first_name: Shayan
  full_name: Kiyani, Shayan
  id: f5a2b424-e339-11ed-8435-ff3b4fe70cf8
  last_name: Kiyani
- first_name: Marco
  full_name: Mondelli, Marco
  id: 27EB676C-8706-11E9-9510-7717E6697425
  last_name: Mondelli
  orcid: 0000-0002-3242-7020
citation:
  ama: 'Bombari S, Kiyani S, Mondelli M. Beyond the universal law of robustness: Sharper
    laws for random features and neural tangent kernels. In: <i>Proceedings of the
    40th International Conference on Machine Learning</i>. Vol 202. ML Research Press;
    2023:2738-2776.'
  apa: 'Bombari, S., Kiyani, S., &#38; Mondelli, M. (2023). Beyond the universal law
    of robustness: Sharper laws for random features and neural tangent kernels. In
    <i>Proceedings of the 40th International Conference on Machine Learning</i> (Vol.
    202, pp. 2738–2776). Honolulu, HI, United States: ML Research Press.'
  chicago: 'Bombari, Simone, Shayan Kiyani, and Marco Mondelli. “Beyond the Universal
    Law of Robustness: Sharper Laws for Random Features and Neural Tangent Kernels.”
    In <i>Proceedings of the 40th International Conference on Machine Learning</i>,
    202:2738–76. ML Research Press, 2023.'
  ieee: 'S. Bombari, S. Kiyani, and M. Mondelli, “Beyond the universal law of robustness:
    Sharper laws for random features and neural tangent kernels,” in <i>Proceedings
    of the 40th International Conference on Machine Learning</i>, Honolulu, HI, United
    States, 2023, vol. 202, pp. 2738–2776.'
  ista: 'Bombari S, Kiyani S, Mondelli M. 2023. Beyond the universal law of robustness:
    Sharper laws for random features and neural tangent kernels. Proceedings of the
    40th International Conference on Machine Learning. ICML: International Conference
    on Machine Learning, PMLR, vol. 202, 2738–2776.'
  mla: 'Bombari, Simone, et al. “Beyond the Universal Law of Robustness: Sharper Laws
    for Random Features and Neural Tangent Kernels.” <i>Proceedings of the 40th International
    Conference on Machine Learning</i>, vol. 202, ML Research Press, 2023, pp. 2738–76.'
  short: S. Bombari, S. Kiyani, M. Mondelli, in:, Proceedings of the 40th International
    Conference on Machine Learning, ML Research Press, 2023, pp. 2738–2776.
conference:
  end_date: 2023-07-29
  location: Honolulu, HI, United States
  name: 'ICML: International Conference on Machine Learning'
  start_date: 2023-07-23
corr_author: '1'
date_created: 2023-04-23T16:11:03Z
date_published: 2023-10-27T00:00:00Z
date_updated: 2025-04-15T07:50:16Z
day: '27'
department:
- _id: GradSch
- _id: MaMo
external_id:
  arxiv:
  - '2302.01629'
intvolume: '       202'
language:
- iso: eng
main_file_link:
- open_access: '1'
  url: https://arxiv.org/abs/2302.01629
month: '10'
oa: 1
oa_version: Preprint
page: 2738-2776
project:
- _id: 059876FA-7A3F-11EA-A408-12923DDC885E
  name: Prix Lopez-Loretta 2019 - Marco Mondelli
publication: Proceedings of the 40th International Conference on Machine Learning
publication_status: published
publisher: ML Research Press
quality_controlled: '1'
related_material:
  link:
  - relation: software
    url: https://github.com/simone-bombari/beyond-universal-robustness
status: public
title: 'Beyond the universal law of robustness: Sharper laws for random features and
  neural tangent kernels'
type: conference
user_id: 2DF688A6-F248-11E8-B48F-1D18A9856A87
volume: 202
year: '2023'
...
---
_id: '13269'
abstract:
- lang: eng
  text: This paper is a collection of results on combinatorial properties of codes
    for the Z-channel . A Z-channel with error fraction τ takes as input a length-
    n binary codeword and injects in an adversarial manner up to n τ asymmetric errors,
    i.e., errors that only zero out bits but do not flip 0’s to 1’s. It is known that
    the largest ( L - 1)-list-decodable code for the Z-channel with error fraction
    τ has exponential size (in n ) if τ is less than a critical value that we call
    the ( L - 1)- list-decoding Plotkin point and has constant size if τ is larger
    than the threshold. The ( L -1)-list-decoding Plotkin point is known to be L -1/L-1
    – L -L/ L-1 , which equals 1/4 for unique-decoding with L -1 = 1. In this paper,
    we derive various results for the size of the largest codes above and below the
    list-decoding Plotkin point. In particular, we show that the largest ( L -1)-list-decodable
    code ε-above the Plotkin point, for any given sufficiently small positive constant
    ε > 0, has size Θ L (ε -3/2 ) for any L - 1 ≥ 1. We also devise upper and lower
    bounds on the exponential size of codes below the list-decoding Plotkin point.
acknowledgement: "Nikita Polyanskii’s research was conducted in part during October
  2020 - December 2021 with the Technical University of Munich and the Skolkovo Institute
  of Science and Technology. His work was supported by the German Research Foundation
  (Deutsche Forschungsgemeinschaft, DFG) under Grant No. WA3907/1-1 and the Russian
  Foundation for Basic Research (RFBR)\r\nunder Grant No. 20-01-00559.\r\nYihan Zhang
  is supported by funding from the European Union’s Horizon 2020 research and innovation
  programme under grant agreement No 682203-ERC-[Inf-Speed-Tradeoff]."
article_processing_charge: No
article_type: original
arxiv: 1
author:
- first_name: Nikita
  full_name: Polyanskii, Nikita
  last_name: Polyanskii
- first_name: Yihan
  full_name: Zhang, Yihan
  id: 2ce5da42-b2ea-11eb-bba5-9f264e9d002c
  last_name: Zhang
  orcid: 0000-0002-6465-6258
citation:
  ama: Polyanskii N, Zhang Y. Codes for the Z-channel. <i>IEEE Transactions on Information
    Theory</i>. 2023;69(10):6340-6357. doi:<a href="https://doi.org/10.1109/TIT.2023.3292219">10.1109/TIT.2023.3292219</a>
  apa: Polyanskii, N., &#38; Zhang, Y. (2023). Codes for the Z-channel. <i>IEEE Transactions
    on Information Theory</i>. Institute of Electrical and Electronics Engineers.
    <a href="https://doi.org/10.1109/TIT.2023.3292219">https://doi.org/10.1109/TIT.2023.3292219</a>
  chicago: Polyanskii, Nikita, and Yihan Zhang. “Codes for the Z-Channel.” <i>IEEE
    Transactions on Information Theory</i>. Institute of Electrical and Electronics
    Engineers, 2023. <a href="https://doi.org/10.1109/TIT.2023.3292219">https://doi.org/10.1109/TIT.2023.3292219</a>.
  ieee: N. Polyanskii and Y. Zhang, “Codes for the Z-channel,” <i>IEEE Transactions
    on Information Theory</i>, vol. 69, no. 10. Institute of Electrical and Electronics
    Engineers, pp. 6340–6357, 2023.
  ista: Polyanskii N, Zhang Y. 2023. Codes for the Z-channel. IEEE Transactions on
    Information Theory. 69(10), 6340–6357.
  mla: Polyanskii, Nikita, and Yihan Zhang. “Codes for the Z-Channel.” <i>IEEE Transactions
    on Information Theory</i>, vol. 69, no. 10, Institute of Electrical and Electronics
    Engineers, 2023, pp. 6340–57, doi:<a href="https://doi.org/10.1109/TIT.2023.3292219">10.1109/TIT.2023.3292219</a>.
  short: N. Polyanskii, Y. Zhang, IEEE Transactions on Information Theory 69 (2023)
    6340–6357.
corr_author: '1'
date_created: 2023-07-23T22:01:14Z
date_published: 2023-07-04T00:00:00Z
date_updated: 2024-10-09T21:06:01Z
day: '04'
department:
- _id: MaMo
doi: 10.1109/TIT.2023.3292219
external_id:
  arxiv:
  - '2105.01427'
  isi:
  - '001069680100011'
intvolume: '        69'
isi: 1
issue: '10'
language:
- iso: eng
main_file_link:
- open_access: '1'
  url: https://doi.org/10.48550/arXiv.2105.01427
month: '07'
oa: 1
oa_version: Preprint
page: 6340-6357
publication: IEEE Transactions on Information Theory
publication_identifier:
  eissn:
  - 1557-9654
  issn:
  - 0018-9448
publication_status: published
publisher: Institute of Electrical and Electronics Engineers
quality_controlled: '1'
scopus_import: '1'
status: public
title: Codes for the Z-channel
type: journal_article
user_id: 2DF688A6-F248-11E8-B48F-1D18A9856A87
volume: 69
year: '2023'
...
---
_id: '13315'
abstract:
- lang: eng
  text: How do statistical dependencies in measurement noise influence high-dimensional
    inference? To answer this, we study the paradigmatic spiked matrix model of principal
    components analysis (PCA), where a rank-one matrix is corrupted by additive noise.
    We go beyond the usual independence assumption on the noise entries, by drawing
    the noise from a low-order polynomial orthogonal matrix ensemble. The resulting
    noise correlations make the setting relevant for applications but analytically
    challenging. We provide characterization of the Bayes optimal limits of inference
    in this model. If the spike is rotation invariant, we show that standard spectral
    PCA is optimal. However, for more general priors, both PCA and the existing approximate
    message-passing algorithm (AMP) fall short of achieving the information-theoretic
    limits, which we compute using the replica method from statistical physics. We
    thus propose an AMP, inspired by the theory of adaptive Thouless–Anderson–Palmer
    equations, which is empirically observed to saturate the conjectured theoretical
    limit. This AMP comes with a rigorous state evolution analysis tracking its performance.
    Although we focus on specific noise distributions, our methodology can be generalized
    to a wide class of trace matrix ensembles at the cost of more involved expressions.
    Finally, despite the seemingly strong assumption of rotation-invariant noise,
    our theory empirically predicts algorithmic performance on real data, pointing
    at strong universality properties.
acknowledgement: J.B. was funded by the European Union (ERC, CHORAL, project number
  101039794). Views and opinions expressed are however those of the author(s) only
  and do not necessarily reflect those of the European Union or the European Research
  Council. Neither the European Union nor the granting authority can be held responsible
  for them. M.M. was supported by the 2019 Lopez-Loreta Prize. We would like to thank
  the reviewers for the insightful comments and, in particular, for suggesting the
  BAMP-inspired denoisers leading to AMP-AP.
article_number: e2302028120
article_processing_charge: Yes (in subscription journal)
article_type: original
author:
- first_name: Jean
  full_name: Barbier, Jean
  last_name: Barbier
- first_name: Francesco
  full_name: Camilli, Francesco
  last_name: Camilli
- first_name: Marco
  full_name: Mondelli, Marco
  id: 27EB676C-8706-11E9-9510-7717E6697425
  last_name: Mondelli
  orcid: 0000-0002-3242-7020
- first_name: Manuel
  full_name: Sáenz, Manuel
  last_name: Sáenz
citation:
  ama: Barbier J, Camilli F, Mondelli M, Sáenz M. Fundamental limits in structured
    principal component analysis and how to reach them. <i>Proceedings of the National
    Academy of Sciences of the United States of America</i>. 2023;120(30). doi:<a
    href="https://doi.org/10.1073/pnas.2302028120">10.1073/pnas.2302028120</a>
  apa: Barbier, J., Camilli, F., Mondelli, M., &#38; Sáenz, M. (2023). Fundamental
    limits in structured principal component analysis and how to reach them. <i>Proceedings
    of the National Academy of Sciences of the United States of America</i>. National
    Academy of Sciences. <a href="https://doi.org/10.1073/pnas.2302028120">https://doi.org/10.1073/pnas.2302028120</a>
  chicago: Barbier, Jean, Francesco Camilli, Marco Mondelli, and Manuel Sáenz. “Fundamental
    Limits in Structured Principal Component Analysis and How to Reach Them.” <i>Proceedings
    of the National Academy of Sciences of the United States of America</i>. National
    Academy of Sciences, 2023. <a href="https://doi.org/10.1073/pnas.2302028120">https://doi.org/10.1073/pnas.2302028120</a>.
  ieee: J. Barbier, F. Camilli, M. Mondelli, and M. Sáenz, “Fundamental limits in
    structured principal component analysis and how to reach them,” <i>Proceedings
    of the National Academy of Sciences of the United States of America</i>, vol.
    120, no. 30. National Academy of Sciences, 2023.
  ista: Barbier J, Camilli F, Mondelli M, Sáenz M. 2023. Fundamental limits in structured
    principal component analysis and how to reach them. Proceedings of the National
    Academy of Sciences of the United States of America. 120(30), e2302028120.
  mla: Barbier, Jean, et al. “Fundamental Limits in Structured Principal Component
    Analysis and How to Reach Them.” <i>Proceedings of the National Academy of Sciences
    of the United States of America</i>, vol. 120, no. 30, e2302028120, National Academy
    of Sciences, 2023, doi:<a href="https://doi.org/10.1073/pnas.2302028120">10.1073/pnas.2302028120</a>.
  short: J. Barbier, F. Camilli, M. Mondelli, M. Sáenz, Proceedings of the National
    Academy of Sciences of the United States of America 120 (2023).
date_created: 2023-07-30T22:01:02Z
date_published: 2023-07-25T00:00:00Z
date_updated: 2025-09-09T12:41:50Z
day: '25'
ddc:
- '000'
department:
- _id: MaMo
doi: 10.1073/pnas.2302028120
external_id:
  isi:
  - '001121663500001'
  pmid:
  - '37463204'
file:
- access_level: open_access
  checksum: 1fc06228afdb3aa80cf8e7766bcf9dc5
  content_type: application/pdf
  creator: dernst
  date_created: 2023-07-31T07:30:48Z
  date_updated: 2023-07-31T07:30:48Z
  file_id: '13323'
  file_name: 2023_PNAS_Barbier.pdf
  file_size: 995933
  relation: main_file
  success: 1
file_date_updated: 2023-07-31T07:30:48Z
has_accepted_license: '1'
intvolume: '       120'
isi: 1
issue: '30'
language:
- iso: eng
month: '07'
oa: 1
oa_version: Published Version
pmid: 1
project:
- _id: 059876FA-7A3F-11EA-A408-12923DDC885E
  name: Prix Lopez-Loretta 2019 - Marco Mondelli
publication: Proceedings of the National Academy of Sciences of the United States
  of America
publication_identifier:
  eissn:
  - 1091-6490
publication_status: published
publisher: National Academy of Sciences
quality_controlled: '1'
related_material:
  link:
  - relation: software
    url: https://github.com/fcamilli95/Structured-PCA-
scopus_import: '1'
status: public
title: Fundamental limits in structured principal component analysis and how to reach
  them
tmp:
  image: /images/cc_by.png
  legal_code_url: https://creativecommons.org/licenses/by/4.0/legalcode
  name: Creative Commons Attribution 4.0 International Public License (CC-BY 4.0)
  short: CC BY (4.0)
type: journal_article
user_id: 317138e5-6ab7-11ef-aa6d-ffef3953e345
volume: 120
year: '2023'
...
---
_id: '13321'
abstract:
- lang: eng
  text: We consider the problem of reconstructing the signal and the hidden variables
    from observations coming from a multi-layer network with rotationally invariant
    weight matrices. The multi-layer structure models inference from deep generative
    priors, and the rotational invariance imposed on the weights generalizes the i.i.d.
    Gaussian assumption by allowing for a complex correlation structure, which is
    typical in applications. In this work, we present a new class of approximate message
    passing (AMP) algorithms and give a state evolution recursion which precisely
    characterizes their performance in the large system limit. In contrast with the
    existing multi-layer VAMP (ML-VAMP) approach, our proposed AMP – dubbed multilayer
    rotationally invariant generalized AMP (ML-RI-GAMP) – provides a natural generalization
    beyond Gaussian designs, in the sense that it recovers the existing Gaussian AMP
    as a special case. Furthermore, ML-RI-GAMP exhibits a significantly lower complexity
    than ML-VAMP, as the computationally intensive singular value decomposition is
    replaced by an estimation of the moments of the design matrices. Finally, our
    numerical results show that this complexity gain comes at little to no cost in
    the performance of the algorithm.
acknowledgement: Marco Mondelli was partially supported by the 2019 Lopez-Loreta prize.
article_processing_charge: No
arxiv: 1
author:
- first_name: Yizhou
  full_name: Xu, Yizhou
  last_name: Xu
- first_name: Tian Qi
  full_name: Hou, Tian Qi
  last_name: Hou
- first_name: Shan Suo
  full_name: Liang, Shan Suo
  last_name: Liang
- first_name: Marco
  full_name: Mondelli, Marco
  id: 27EB676C-8706-11E9-9510-7717E6697425
  last_name: Mondelli
  orcid: 0000-0002-3242-7020
citation:
  ama: 'Xu Y, Hou TQ, Liang SS, Mondelli M. Approximate message passing for multi-layer
    estimation in rotationally invariant models. In: <i>2023 IEEE Information Theory
    Workshop</i>. Institute of Electrical and Electronics Engineers; 2023:294-298.
    doi:<a href="https://doi.org/10.1109/ITW55543.2023.10160238">10.1109/ITW55543.2023.10160238</a>'
  apa: 'Xu, Y., Hou, T. Q., Liang, S. S., &#38; Mondelli, M. (2023). Approximate message
    passing for multi-layer estimation in rotationally invariant models. In <i>2023
    IEEE Information Theory Workshop</i> (pp. 294–298). Saint-Malo, France: Institute
    of Electrical and Electronics Engineers. <a href="https://doi.org/10.1109/ITW55543.2023.10160238">https://doi.org/10.1109/ITW55543.2023.10160238</a>'
  chicago: Xu, Yizhou, Tian Qi Hou, Shan Suo Liang, and Marco Mondelli. “Approximate
    Message Passing for Multi-Layer Estimation in Rotationally Invariant Models.”
    In <i>2023 IEEE Information Theory Workshop</i>, 294–98. Institute of Electrical
    and Electronics Engineers, 2023. <a href="https://doi.org/10.1109/ITW55543.2023.10160238">https://doi.org/10.1109/ITW55543.2023.10160238</a>.
  ieee: Y. Xu, T. Q. Hou, S. S. Liang, and M. Mondelli, “Approximate message passing
    for multi-layer estimation in rotationally invariant models,” in <i>2023 IEEE
    Information Theory Workshop</i>, Saint-Malo, France, 2023, pp. 294–298.
  ista: 'Xu Y, Hou TQ, Liang SS, Mondelli M. 2023. Approximate message passing for
    multi-layer estimation in rotationally invariant models. 2023 IEEE Information
    Theory Workshop. ITW: Information Theory Workshop, 294–298.'
  mla: Xu, Yizhou, et al. “Approximate Message Passing for Multi-Layer Estimation
    in Rotationally Invariant Models.” <i>2023 IEEE Information Theory Workshop</i>,
    Institute of Electrical and Electronics Engineers, 2023, pp. 294–98, doi:<a href="https://doi.org/10.1109/ITW55543.2023.10160238">10.1109/ITW55543.2023.10160238</a>.
  short: Y. Xu, T.Q. Hou, S.S. Liang, M. Mondelli, in:, 2023 IEEE Information Theory
    Workshop, Institute of Electrical and Electronics Engineers, 2023, pp. 294–298.
conference:
  end_date: 2023-04-28
  location: Saint-Malo, France
  name: 'ITW: Information Theory Workshop'
  start_date: 2023-04-23
corr_author: '1'
date_created: 2023-07-30T22:01:04Z
date_published: 2023-05-01T00:00:00Z
date_updated: 2025-04-15T07:50:16Z
day: '01'
department:
- _id: MaMo
doi: 10.1109/ITW55543.2023.10160238
external_id:
  arxiv:
  - '2212.01572'
  isi:
  - '001031733100053'
isi: 1
language:
- iso: eng
main_file_link:
- open_access: '1'
  url: https://doi.org/10.48550/arXiv.2212.01572
month: '05'
oa: 1
oa_version: Preprint
page: 294-298
project:
- _id: 059876FA-7A3F-11EA-A408-12923DDC885E
  name: Prix Lopez-Loretta 2019 - Marco Mondelli
publication: 2023 IEEE Information Theory Workshop
publication_identifier:
  eissn:
  - 2475-4218
  isbn:
  - '9798350301496'
publication_status: published
publisher: Institute of Electrical and Electronics Engineers
quality_controlled: '1'
scopus_import: '1'
status: public
title: Approximate message passing for multi-layer estimation in rotationally invariant
  models
type: conference
user_id: 2DF688A6-F248-11E8-B48F-1D18A9856A87
year: '2023'
...
---
_id: '14083'
abstract:
- lang: eng
  text: "In this work we consider the list-decodability and list-recoverability of
    arbitrary q-ary codes, for all integer values of q ≥ 2. A code is called (p,L)_q-list-decodable
    if every radius pn Hamming ball contains less than L codewords; (p,\U0001D4C1,L)_q-list-recoverability
    is a generalization where we place radius pn Hamming balls on every point of a
    combinatorial rectangle with side length \U0001D4C1 and again stipulate that there
    be less than L codewords.\r\nOur main contribution is to precisely calculate the
    maximum value of p for which there exist infinite families of positive rate (p,\U0001D4C1,L)_q-list-recoverable
    codes, the quantity we call the zero-rate threshold. Denoting this value by p_*,
    we in fact show that codes correcting a p_*+ε fraction of errors must have size
    O_ε(1), i.e., independent of n. Such a result is typically referred to as a \"Plotkin
    bound.\" To complement this, a standard random code with expurgation construction
    shows that there exist positive rate codes correcting a p_*-ε fraction of errors.
    We also follow a classical proof template (typically attributed to Elias and Bassalygo)
    to derive from the zero-rate threshold other tradeoffs between rate and decoding
    radius for list-decoding and list-recovery.\r\nTechnically, proving the Plotkin
    bound boils down to demonstrating the Schur convexity of a certain function defined
    on the q-simplex as well as the convexity of a univariate function derived from
    it. We remark that an earlier argument claimed similar results for q-ary list-decoding;
    however, we point out that this earlier proof is flawed."
acknowledgement: "Nicolas Resch: Research supported in part by ERC H2020 grant No.74079
  (ALGSTRONGCRYPTO). Chen Yuan: Research supported in part by the National Key Research
  and Development Projects under Grant 2022YFA1004900 and Grant 2021YFE0109900, the
  National Natural Science Foundation of China under Grant 12101403 and Grant 12031011.\r\nAcknowledgements
  YZ is grateful to Shashank Vatedka, Diyuan Wu and Fengxing Zhu for inspiring discussions."
alternative_title:
- LIPIcs
article_number: '99'
article_processing_charge: Yes
arxiv: 1
author:
- first_name: Nicolas
  full_name: Resch, Nicolas
  last_name: Resch
- first_name: Chen
  full_name: Yuan, Chen
  last_name: Yuan
- first_name: Yihan
  full_name: Zhang, Yihan
  id: 2ce5da42-b2ea-11eb-bba5-9f264e9d002c
  last_name: Zhang
  orcid: 0000-0002-6465-6258
citation:
  ama: 'Resch N, Yuan C, Zhang Y. Zero-rate thresholds and new capacity bounds for
    list-decoding and list-recovery. In: <i>50th International Colloquium on Automata,
    Languages, and Programming</i>. Vol 261. Schloss Dagstuhl - Leibniz-Zentrum für
    Informatik; 2023. doi:<a href="https://doi.org/10.4230/LIPIcs.ICALP.2023.99">10.4230/LIPIcs.ICALP.2023.99</a>'
  apa: 'Resch, N., Yuan, C., &#38; Zhang, Y. (2023). Zero-rate thresholds and new
    capacity bounds for list-decoding and list-recovery. In <i>50th International
    Colloquium on Automata, Languages, and Programming</i> (Vol. 261). Paderborn,
    Germany: Schloss Dagstuhl - Leibniz-Zentrum für Informatik. <a href="https://doi.org/10.4230/LIPIcs.ICALP.2023.99">https://doi.org/10.4230/LIPIcs.ICALP.2023.99</a>'
  chicago: Resch, Nicolas, Chen Yuan, and Yihan Zhang. “Zero-Rate Thresholds and New
    Capacity Bounds for List-Decoding and List-Recovery.” In <i>50th International
    Colloquium on Automata, Languages, and Programming</i>, Vol. 261. Schloss Dagstuhl
    - Leibniz-Zentrum für Informatik, 2023. <a href="https://doi.org/10.4230/LIPIcs.ICALP.2023.99">https://doi.org/10.4230/LIPIcs.ICALP.2023.99</a>.
  ieee: N. Resch, C. Yuan, and Y. Zhang, “Zero-rate thresholds and new capacity bounds
    for list-decoding and list-recovery,” in <i>50th International Colloquium on Automata,
    Languages, and Programming</i>, Paderborn, Germany, 2023, vol. 261.
  ista: 'Resch N, Yuan C, Zhang Y. 2023. Zero-rate thresholds and new capacity bounds
    for list-decoding and list-recovery. 50th International Colloquium on Automata,
    Languages, and Programming. ICALP: Automata, Languages and Programming, LIPIcs,
    vol. 261, 99.'
  mla: Resch, Nicolas, et al. “Zero-Rate Thresholds and New Capacity Bounds for List-Decoding
    and List-Recovery.” <i>50th International Colloquium on Automata, Languages, and
    Programming</i>, vol. 261, 99, Schloss Dagstuhl - Leibniz-Zentrum für Informatik,
    2023, doi:<a href="https://doi.org/10.4230/LIPIcs.ICALP.2023.99">10.4230/LIPIcs.ICALP.2023.99</a>.
  short: N. Resch, C. Yuan, Y. Zhang, in:, 50th International Colloquium on Automata,
    Languages, and Programming, Schloss Dagstuhl - Leibniz-Zentrum für Informatik,
    2023.
conference:
  end_date: 2023-07-14
  location: Paderborn, Germany
  name: 'ICALP: Automata, Languages and Programming'
  start_date: 2023-07-10
corr_author: '1'
date_created: 2023-08-20T22:01:13Z
date_published: 2023-07-01T00:00:00Z
date_updated: 2025-09-08T08:31:53Z
day: '01'
ddc:
- '000'
department:
- _id: MaMo
doi: 10.4230/LIPIcs.ICALP.2023.99
external_id:
  arxiv:
  - '2210.07754'
file:
- access_level: open_access
  checksum: a449143fec3fbebb092cb8ef3b53c226
  content_type: application/pdf
  creator: dernst
  date_created: 2023-08-21T07:23:18Z
  date_updated: 2023-08-21T07:23:18Z
  file_id: '14091'
  file_name: 2023_LIPIcsICALP_Resch.pdf
  file_size: 1141497
  relation: main_file
  success: 1
file_date_updated: 2023-08-21T07:23:18Z
has_accepted_license: '1'
intvolume: '       261'
language:
- iso: eng
month: '07'
oa: 1
oa_version: Published Version
publication: 50th International Colloquium on Automata, Languages, and Programming
publication_identifier:
  isbn:
  - '9783959772785'
  issn:
  - 1868-8969
publication_status: published
publisher: Schloss Dagstuhl - Leibniz-Zentrum für Informatik
quality_controlled: '1'
related_material:
  record:
  - id: '17330'
    relation: later_version
    status: public
scopus_import: '1'
status: public
title: Zero-rate thresholds and new capacity bounds for list-decoding and list-recovery
tmp:
  image: /images/cc_by.png
  legal_code_url: https://creativecommons.org/licenses/by/4.0/legalcode
  name: Creative Commons Attribution 4.0 International Public License (CC-BY 4.0)
  short: CC BY (4.0)
type: conference
user_id: 2DF688A6-F248-11E8-B48F-1D18A9856A87
volume: 261
year: '2023'
...
---
_id: '14751'
abstract:
- lang: eng
  text: 'We consider zero-error communication over a two-transmitter deterministic
    adversarial multiple access channel (MAC) governed by an adversary who has access
    to the transmissions of both senders (hence called omniscient ) and aims to maliciously
    corrupt the communication. None of the encoders, jammer and decoder is allowed
    to randomize using private or public randomness. This enforces a combinatorial
    nature of the problem. Our model covers a large family of channels studied in
    the literature, including all deterministic discrete memoryless noisy or noiseless
    MACs. In this work, given an arbitrary two-transmitter deterministic omniscient
    adversarial MAC, we characterize when the capacity region: 1) has nonempty interior
    (in particular, is two-dimensional); 2) consists of two line segments (in particular,
    has empty interior); 3) consists of one line segment (in particular, is one-dimensional);
    4) or only contains (0,0) (in particular, is zero-dimensional). This extends a
    recent result by Wang et al. (201 9) from the point-to-point setting to the multiple
    access setting. Indeed, our converse arguments build upon their generalized Plotkin
    bound and involve delicate case analysis. One of the technical challenges is to
    take care of both “joint confusability” and “marginal confusability”. In particular,
    the treatment of marginal confusability does not follow from the point-to-point
    results by Wang et al. Our achievability results follow from random coding with
    expurgation.'
acknowledgement: "The author would like to thank Amitalok J. Budkuley and Sidharth
  Jaggi for many helpful discussions at the early stage of this work. He would also
  like to thank Nir Ailon, Qi Cao, and Chandra Nair for discussions on a related problem
  regarding zero-error binary adder MACs.\r\nThe work of Yihan Zhang was supported
  by the European Union’s Horizon 2020 Research and Innovation Programme under Grant
  682203-ERC-[Inf-Speed-Tradeoff]"
article_processing_charge: No
article_type: original
arxiv: 1
author:
- first_name: Yihan
  full_name: Zhang, Yihan
  id: 2ce5da42-b2ea-11eb-bba5-9f264e9d002c
  last_name: Zhang
  orcid: 0000-0002-6465-6258
citation:
  ama: Zhang Y. Zero-error communication over adversarial MACs. <i>IEEE Transactions
    on Information Theory</i>. 2023;69(7):4093-4127. doi:<a href="https://doi.org/10.1109/tit.2023.3257239">10.1109/tit.2023.3257239</a>
  apa: Zhang, Y. (2023). Zero-error communication over adversarial MACs. <i>IEEE Transactions
    on Information Theory</i>. Institute of Electrical and Electronics Engineers.
    <a href="https://doi.org/10.1109/tit.2023.3257239">https://doi.org/10.1109/tit.2023.3257239</a>
  chicago: Zhang, Yihan. “Zero-Error Communication over Adversarial MACs.” <i>IEEE
    Transactions on Information Theory</i>. Institute of Electrical and Electronics
    Engineers, 2023. <a href="https://doi.org/10.1109/tit.2023.3257239">https://doi.org/10.1109/tit.2023.3257239</a>.
  ieee: Y. Zhang, “Zero-error communication over adversarial MACs,” <i>IEEE Transactions
    on Information Theory</i>, vol. 69, no. 7. Institute of Electrical and Electronics
    Engineers, pp. 4093–4127, 2023.
  ista: Zhang Y. 2023. Zero-error communication over adversarial MACs. IEEE Transactions
    on Information Theory. 69(7), 4093–4127.
  mla: Zhang, Yihan. “Zero-Error Communication over Adversarial MACs.” <i>IEEE Transactions
    on Information Theory</i>, vol. 69, no. 7, Institute of Electrical and Electronics
    Engineers, 2023, pp. 4093–127, doi:<a href="https://doi.org/10.1109/tit.2023.3257239">10.1109/tit.2023.3257239</a>.
  short: Y. Zhang, IEEE Transactions on Information Theory 69 (2023) 4093–4127.
corr_author: '1'
date_created: 2024-01-08T13:04:54Z
date_published: 2023-07-01T00:00:00Z
date_updated: 2025-09-09T14:12:32Z
day: '01'
department:
- _id: MaMo
doi: 10.1109/tit.2023.3257239
external_id:
  arxiv:
  - '2101.12426'
  isi:
  - '001017307000001'
intvolume: '        69'
isi: 1
issue: '7'
keyword:
- Computer Science Applications
- Information Systems
language:
- iso: eng
main_file_link:
- open_access: '1'
  url: https://doi.org/10.48550/arXiv.2101.12426
month: '07'
oa: 1
oa_version: Preprint
page: 4093-4127
publication: IEEE Transactions on Information Theory
publication_identifier:
  eissn:
  - 1557-9654
  issn:
  - 0018-9448
publication_status: published
publisher: Institute of Electrical and Electronics Engineers
quality_controlled: '1'
scopus_import: '1'
status: public
title: Zero-error communication over adversarial MACs
type: journal_article
user_id: 317138e5-6ab7-11ef-aa6d-ffef3953e345
volume: 69
year: '2023'
...
---
_id: '14921'
abstract:
- lang: eng
  text: Neural collapse (NC) refers to the surprising structure of the last layer
    of deep neural networks in the terminal phase of gradient descent training. Recently,
    an increasing amount of experimental evidence has pointed to the propagation of
    NC to earlier layers of neural networks. However, while the NC in the last layer
    is well studied theoretically, much less is known about its multi-layered counterpart
    - deep neural collapse (DNC). In particular, existing work focuses either on linear
    layers or only on the last two layers at the price of an extra assumption. Our
    paper fills this gap by generalizing the established analytical framework for
    NC - the unconstrained features model - to multiple non-linear layers. Our key
    technical contribution is to show that, in a deep unconstrained features model,
    the unique global optimum for binary classification exhibits all the properties
    typical of DNC. This explains the existing experimental evidence of DNC. We also
    empirically show that (i) by optimizing deep unconstrained features models via
    gradient descent, the resulting solution agrees well with our theory, and (ii)
    trained networks recover the unconstrained features suitable for the occurrence
    of DNC, thus supporting the validity of this modeling principle.
acknowledgement: M. M. is partially supported by the 2019 Lopez-Loreta Prize. The
  authors would like to thank Eugenia Iofinova, Bernd Prach and Simone Bombari for
  valuable feedback on the manuscript.
alternative_title:
- NeurIPS
article_processing_charge: No
arxiv: 1
author:
- first_name: Peter
  full_name: Súkeník, Peter
  id: d64d6a8d-eb8e-11eb-b029-96fd216dec3c
  last_name: Súkeník
- first_name: Marco
  full_name: Mondelli, Marco
  id: 27EB676C-8706-11E9-9510-7717E6697425
  last_name: Mondelli
  orcid: 0000-0002-3242-7020
- first_name: Christoph
  full_name: Lampert, Christoph
  id: 40C20FD2-F248-11E8-B48F-1D18A9856A87
  last_name: Lampert
  orcid: 0000-0001-8622-7887
citation:
  ama: 'Súkeník P, Mondelli M, Lampert C. Deep neural collapse is provably optimal
    for the deep unconstrained features model. In: <i>37th Annual Conference on Neural
    Information Processing Systems</i>. ; 2023.'
  apa: Súkeník, P., Mondelli, M., &#38; Lampert, C. (2023). Deep neural collapse is
    provably optimal for the deep unconstrained features model. In <i>37th Annual
    Conference on Neural Information Processing Systems</i>. New Orleans, LA, United
    States.
  chicago: Súkeník, Peter, Marco Mondelli, and Christoph Lampert. “Deep Neural Collapse
    Is Provably Optimal for the Deep Unconstrained Features Model.” In <i>37th Annual
    Conference on Neural Information Processing Systems</i>, 2023.
  ieee: P. Súkeník, M. Mondelli, and C. Lampert, “Deep neural collapse is provably
    optimal for the deep unconstrained features model,” in <i>37th Annual Conference
    on Neural Information Processing Systems</i>, New Orleans, LA, United States,
    2023.
  ista: 'Súkeník P, Mondelli M, Lampert C. 2023. Deep neural collapse is provably
    optimal for the deep unconstrained features model. 37th Annual Conference on Neural
    Information Processing Systems. NeurIPS: Neural Information Processing Systems,
    NeurIPS, .'
  mla: Súkeník, Peter, et al. “Deep Neural Collapse Is Provably Optimal for the Deep
    Unconstrained Features Model.” <i>37th Annual Conference on Neural Information
    Processing Systems</i>, 2023.
  short: P. Súkeník, M. Mondelli, C. Lampert, in:, 37th Annual Conference on Neural
    Information Processing Systems, 2023.
conference:
  end_date: 2023-12-16
  location: New Orleans, LA, United States
  name: 'NeurIPS: Neural Information Processing Systems'
  start_date: 2023-12-10
corr_author: '1'
date_created: 2024-02-02T11:17:41Z
date_published: 2023-12-15T00:00:00Z
date_updated: 2025-04-15T07:50:16Z
day: '15'
department:
- _id: MaMo
- _id: ChLa
external_id:
  arxiv:
  - '2305.13165'
language:
- iso: eng
main_file_link:
- open_access: '1'
  url: ' https://doi.org/10.48550/arXiv.2305.13165'
month: '12'
oa: 1
oa_version: Preprint
project:
- _id: 059876FA-7A3F-11EA-A408-12923DDC885E
  name: Prix Lopez-Loretta 2019 - Marco Mondelli
publication: 37th Annual Conference on Neural Information Processing Systems
publication_status: published
quality_controlled: '1'
status: public
title: Deep neural collapse is provably optimal for the deep unconstrained features
  model
type: conference
user_id: 2DF688A6-F248-11E8-B48F-1D18A9856A87
year: '2023'
...