---
OA_place: repository
OA_type: green
_id: '18972'
abstract:
- lang: eng
  text: 'Deep learning models are known to overfit and memorize spurious features
    in the training dataset. While numerous empirical studies have aimed at understanding
    this phenomenon, a rigorous theoretical framework to quantify it is still missing.
    In this paper, we consider spurious features that are uncorrelated with the learning
    task, and we provide a precise characterization of how they are memorized via
    two separate terms: (i) the stability of the model with respect to individual
    training samples, and (ii) the feature alignment between the spurious pattern
    and the full sample. While the first term is well established in learning theory
    and it is connected to the generalization error in classical work, the second
    one is, to the best of our knowledge, novel. Our key technical result gives a
    precise characterization of the feature alignment for the two prototypical settings
    of random features (RF) and neural tangent kernel (NTK) regression. We prove that
    the memorization of spurious features weakens as the generalization capability
    increases and, through the analysis of the feature alignment, we unveil the role
    of the model and of its activation function. Numerical experiments show the predictive
    power of our theory on standard datasets (MNIST, CIFAR-10).'
acknowledgement: "The authors were partially supported by the 2019 Lopez-Loreta prize,
  and they would like to thank (in alphabetical order) Grigorios Chrysos, Simone Maria
  Giancola, Mahyar Jafari Nodeh, Christoph Lampert, Marco Miani, GuanWen Qiu, and
  Peter Súkeník for helpful discussions."
alternative_title:
- PMLR
article_processing_charge: No
arxiv: 1
author:
- first_name: Simone
  full_name: Bombari, Simone
  id: ca726dda-de17-11ea-bc14-f9da834f63aa
  last_name: Bombari
- first_name: Marco
  full_name: Mondelli, Marco
  id: 27EB676C-8706-11E9-9510-7717E6697425
  last_name: Mondelli
  orcid: 0000-0002-3242-7020
citation:
  ama: 'Bombari S, Mondelli M. How spurious features are memorized: Precise analysis
    for random and NTK features. In: <i>41st International Conference on Machine Learning</i>.
    Vol 235. ML Research Press; 2024:4267-4299.'
  apa: 'Bombari, S., &#38; Mondelli, M. (2024). How spurious features are memorized:
    Precise analysis for random and NTK features. In <i>41st International Conference
    on Machine Learning</i> (Vol. 235, pp. 4267–4299). Vienna, Austria: ML Research
    Press.'
  chicago: 'Bombari, Simone, and Marco Mondelli. “How Spurious Features Are Memorized:
    Precise Analysis for Random and NTK Features.” In <i>41st International Conference
    on Machine Learning</i>, 235:4267–99. ML Research Press, 2024.'
  ieee: 'S. Bombari and M. Mondelli, “How spurious features are memorized: Precise
    analysis for random and NTK features,” in <i>41st International Conference on
    Machine Learning</i>, Vienna, Austria, 2024, vol. 235, pp. 4267–4299.'
  ista: 'Bombari S, Mondelli M. 2024. How spurious features are memorized: Precise
    analysis for random and NTK features. 41st International Conference on Machine
    Learning. ICML: International Conference on Machine Learning, PMLR, vol. 235,
    4267–4299.'
  mla: 'Bombari, Simone, and Marco Mondelli. “How Spurious Features Are Memorized:
    Precise Analysis for Random and NTK Features.” <i>41st International Conference
    on Machine Learning</i>, vol. 235, ML Research Press, 2024, pp. 4267–99.'
  short: S. Bombari, M. Mondelli, in:, 41st International Conference on Machine Learning,
    ML Research Press, 2024, pp. 4267–4299.
conference:
  end_date: 2024-07-27
  location: Vienna, Austria
  name: 'ICML: International Conference on Machine Learning'
  start_date: 2024-07-21
corr_author: '1'
date_created: 2025-01-30T07:29:47Z
date_published: 2024-07-30T00:00:00Z
date_updated: 2025-04-15T07:50:12Z
day: '30'
department:
- _id: MaMo
external_id:
  arxiv:
  - '2305.12100'
intvolume: '235'
language:
- iso: eng
main_file_link:
- open_access: '1'
  url: https://doi.org/10.48550/arXiv.2305.12100
month: '07'
oa: 1
oa_version: Preprint
page: 4267-4299
project:
- _id: 059876FA-7A3F-11EA-A408-12923DDC885E
  name: Prix Lopez-Loretta 2019 - Marco Mondelli
publication: 41st International Conference on Machine Learning
publication_identifier:
  eissn:
  - 2640-3498
publication_status: published
publisher: ML Research Press
quality_controlled: '1'
scopus_import: '1'
status: public
title: 'How spurious features are memorized: Precise analysis for random and NTK features'
type: conference
user_id: 2DF688A6-F248-11E8-B48F-1D18A9856A87
volume: 235
year: '2024'
...
---
OA_place: repository
OA_type: green
_id: '18973'
abstract:
- lang: eng
  text: 'Understanding the reasons behind the exceptional success of transformers
    requires a better analysis of why attention layers are suitable for NLP tasks.
    In particular, such tasks require predictive models to capture contextual meaning
    which often depends on one or a few words, even if the sentence is long. Our work
    studies this key property, dubbed word sensitivity (WS), in the prototypical setting
    of random features. We show that attention layers enjoy high WS, namely, there
    exists a vector in the space of embeddings that largely perturbs the random attention
    features map. The argument critically exploits the role of the softmax in the
    attention layer, highlighting its benefit compared to other activations (e.g.,
    ReLU). In contrast, the WS of standard random features is of order 1/√n, n being
    the number of words in the textual sample, and thus it decays with the length
    of the context. We then translate these results on the word sensitivity into generalization
    bounds: due to their low WS, random features provably cannot learn to distinguish
    between two sentences that differ only in a single word; in contrast, due to their
    high WS, random attention features have higher generalization capabilities. We
    validate our theoretical results with experimental evidence over the BERT-Base
    word embeddings of the IMDB review dataset.'
acknowledgement: The authors were partially supported by the 2019 LopezLoreta prize,
  and they would like to thank Mohammad Hossein Amani, Lorenzo Beretta, and Clement
  Rebuffel for helpful discussions.
alternative_title:
- PMLR
article_processing_charge: No
arxiv: 1
author:
- first_name: Simone
  full_name: Bombari, Simone
  id: ca726dda-de17-11ea-bc14-f9da834f63aa
  last_name: Bombari
- first_name: Marco
  full_name: Mondelli, Marco
  id: 27EB676C-8706-11E9-9510-7717E6697425
  last_name: Mondelli
  orcid: 0000-0002-3242-7020
citation:
  ama: 'Bombari S, Mondelli M. Towards understanding the word sensitivity of attention
    layers: A study via random features. In: <i>41st International Conference on Machine
    Learning</i>. Vol 235. ML Research Press; 2024:4300-4328.'
  apa: 'Bombari, S., &#38; Mondelli, M. (2024). Towards understanding the word sensitivity
    of attention layers: A study via random features. In <i>41st International Conference
    on Machine Learning</i> (Vol. 235, pp. 4300–4328). Vienna, Austria: ML Research
    Press.'
  chicago: 'Bombari, Simone, and Marco Mondelli. “Towards Understanding the Word Sensitivity
    of Attention Layers: A Study via Random Features.” In <i>41st International Conference
    on Machine Learning</i>, 235:4300–4328. ML Research Press, 2024.'
  ieee: 'S. Bombari and M. Mondelli, “Towards understanding the word sensitivity of
    attention layers: A study via random features,” in <i>41st International Conference
    on Machine Learning</i>, Vienna, Austria, 2024, vol. 235, pp. 4300–4328.'
  ista: 'Bombari S, Mondelli M. 2024. Towards understanding the word sensitivity of
    attention layers: A study via random features. 41st International Conference on
    Machine Learning. ICML: International Conference on Machine Learning, PMLR, vol.
    235, 4300–4328.'
  mla: 'Bombari, Simone, and Marco Mondelli. “Towards Understanding the Word Sensitivity
    of Attention Layers: A Study via Random Features.” <i>41st International Conference
    on Machine Learning</i>, vol. 235, ML Research Press, 2024, pp. 4300–28.'
  short: S. Bombari, M. Mondelli, in:, 41st International Conference on Machine Learning,
    ML Research Press, 2024, pp. 4300–4328.
conference:
  end_date: 2024-07-27
  location: Vienna, Austria
  name: 'ICML: International Conference on Machine Learning'
  start_date: 2024-07-21
corr_author: '1'
date_created: 2025-01-30T07:35:49Z
date_published: 2024-07-30T00:00:00Z
date_updated: 2025-04-15T07:50:12Z
day: '30'
department:
- _id: MaMo
external_id:
  arxiv:
  - '2402.02969'
intvolume: '235'
language:
- iso: eng
main_file_link:
- open_access: '1'
  url: https://doi.org/10.48550/arXiv.2402.02969
month: '07'
oa: 1
oa_version: Preprint
page: 4300-4328
project:
- _id: 059876FA-7A3F-11EA-A408-12923DDC885E
  name: Prix Lopez-Loretta 2019 - Marco Mondelli
publication: 41st International Conference on Machine Learning
publication_identifier:
  eissn:
  - 2640-3498
publication_status: published
publisher: ML Research Press
quality_controlled: '1'
scopus_import: '1'
status: public
title: 'Towards understanding the word sensitivity of attention layers: A study via
  random features'
type: conference
user_id: 2DF688A6-F248-11E8-B48F-1D18A9856A87
volume: 235
year: '2024'
...
---
OA_place: publisher
OA_type: green
_id: '18974'
abstract:
- lang: eng
  text: Reinforcement Learning (RL) from temporal logical specifications is a fundamental
    problem in sequential decision making. One of the most basic and core such specifications
    is the reachability specification, which requires a target set to be eventually
    visited. Despite strong empirical results for RL from such specifications, the
    theoretical guarantees are bleak, including the impossibility of Probably Approximately
    Correct (PAC) guarantees for reachability specifications. Given the impossibility
    result, in this work we consider the problem of RL from reachability specifications
    along with the information of expected conditional distance (ECD). We present
    (a) lower bound results which establish the necessity of ECD information for PAC
    guarantees and (b) an algorithm that establishes PAC-guarantees given the ECD
    information. To the best of our knowledge, this is the first algorithm for RL
    from reachability specifications that learns policies without making any assumptions
    on the underlying environment.
alternative_title:
- PMLR
article_processing_charge: No
author:
- first_name: Jakub
  full_name: Svoboda, Jakub
  id: 130759D2-D7DD-11E9-87D2-DE0DE6697425
  last_name: Svoboda
  orcid: 0000-0002-1419-3267
- first_name: Suguman
  full_name: Bansal, Suguman
  last_name: Bansal
- first_name: Krishnendu
  full_name: Chatterjee, Krishnendu
  id: 2E5DCA20-F248-11E8-B48F-1D18A9856A87
  last_name: Chatterjee
  orcid: 0000-0002-4561-241X
citation:
  ama: 'Svoboda J, Bansal S, Chatterjee K. Reinforcement learning from reachability
    specifications: PAC guarantees with expected conditional distance. In: <i>41st
    International Conference on Machine Learning</i>. Vol 235. ML Research Press;
    2024:47331-47344.'
  apa: 'Svoboda, J., Bansal, S., &#38; Chatterjee, K. (2024). Reinforcement learning
    from reachability specifications: PAC guarantees with expected conditional distance.
    In <i>41st International Conference on Machine Learning</i> (Vol. 235, pp. 47331–47344).
    Vienna, Austria: ML Research Press.'
  chicago: 'Svoboda, Jakub, Suguman Bansal, and Krishnendu Chatterjee. “Reinforcement
    Learning from Reachability Specifications: PAC Guarantees with Expected Conditional
    Distance.” In <i>41st International Conference on Machine Learning</i>, 235:47331–44.
    ML Research Press, 2024.'
  ieee: 'J. Svoboda, S. Bansal, and K. Chatterjee, “Reinforcement learning from reachability
    specifications: PAC guarantees with expected conditional distance,” in <i>41st
    International Conference on Machine Learning</i>, Vienna, Austria, 2024, vol.
    235, pp. 47331–47344.'
  ista: 'Svoboda J, Bansal S, Chatterjee K. 2024. Reinforcement learning from reachability
    specifications: PAC guarantees with expected conditional distance. 41st International
    Conference on Machine Learning. ICML: International Conference on Machine Learning,
    PMLR, vol. 235, 47331–47344.'
  mla: 'Svoboda, Jakub, et al. “Reinforcement Learning from Reachability Specifications:
    PAC Guarantees with Expected Conditional Distance.” <i>41st International Conference
    on Machine Learning</i>, vol. 235, ML Research Press, 2024, pp. 47331–44.'
  short: J. Svoboda, S. Bansal, K. Chatterjee, in:, 41st International Conference
    on Machine Learning, ML Research Press, 2024, pp. 47331–47344.
conference:
  end_date: 2024-07-27
  location: Vienna, Austria
  name: 'ICML: International Conference on Machine Learning'
  start_date: 2024-07-21
corr_author: '1'
date_created: 2025-01-30T07:45:22Z
date_published: 2024-07-29T00:00:00Z
date_updated: 2025-01-30T07:46:16Z
day: '29'
department:
- _id: KrCh
intvolume: '235'
language:
- iso: eng
main_file_link:
- open_access: '1'
  url: https://openreview.net/forum?id=mXUDDL4r1Q
month: '07'
oa: 1
oa_version: Preprint
page: 47331-47344
publication: 41st International Conference on Machine Learning
publication_status: published
publisher: ML Research Press
quality_controlled: '1'
scopus_import: '1'
status: public
title: 'Reinforcement learning from reachability specifications: PAC guarantees with
  expected conditional distance'
type: conference
user_id: 2DF688A6-F248-11E8-B48F-1D18A9856A87
volume: 235
year: '2024'
...
---
OA_place: repository
OA_type: green
_id: '18975'
abstract:
- lang: eng
  text: Leveraging second-order information about the loss at the scale of deep networks
    is one of the main lines of approach for improving the performance of current
    optimizers for deep learning. Yet, existing approaches for accurate full-matrix
    preconditioning, such as Full-Matrix Adagrad (GGT) or Matrix-Free Approximate
    Curvature (M-FAC) suffer from massive storage costs when applied even to small-scale
    models, as they must store a sliding window of gradients, whose memory requirements
    are multiplicative in the model dimension. In this paper, we address this issue
    via a novel and efficient error-feedback technique that can be applied to compress
    preconditioners by up to two orders of magnitude in practice, without loss of
    convergence. Specifically, our approach compresses the gradient information via
    sparsification or low-rank compression before it is fed into the preconditioner,
    feeding the compression error back into future iterations. Extensive experiments
    on deep neural networks show that this approach can compress full-matrix preconditioners
    to up to 99% sparsity without accuracy loss, effectively removing the memory overhead
    of full-matrix preconditioners such as GGT and M-FAC.
acknowledged_ssus:
- _id: CampIT
acknowledgement: The authors thank Adrian Vladu, Razvan Pascanu, Alexandra Peste,
  and Mher Safaryan for their valuable feedback, the IT department of the Institute
  of Science and Technology Austria for the hardware support, and Weights and Biases
  for the infrastructure to track all our experiments.
alternative_title:
- PMLR
article_processing_charge: No
arxiv: 1
author:
- first_name: Ionut-Vlad
  full_name: Modoranu, Ionut-Vlad
  id: 449f7a18-f128-11eb-9611-9b430c0c6333
  last_name: Modoranu
- first_name: Aleksei
  full_name: Kalinov, Aleksei
  id: 44b7120e-eb97-11eb-a6c2-e1557aa81d02
  last_name: Kalinov
  orcid: 0000-0003-2189-3904
- first_name: Eldar
  full_name: Kurtic, Eldar
  id: 47beb3a5-07b5-11eb-9b87-b108ec578218
  last_name: Kurtic
- first_name: Elias
  full_name: Frantar, Elias
  id: 09a8f98d-ec99-11ea-ae11-c063a7b7fe5f
  last_name: Frantar
- first_name: Dan-Adrian
  full_name: Alistarh, Dan-Adrian
  id: 4A899BFC-F248-11E8-B48F-1D18A9856A87
  last_name: Alistarh
  orcid: 0000-0003-3650-940X
citation:
  ama: 'Modoranu I-V, Kalinov A, Kurtic E, Frantar E, Alistarh D-A. Error feedback
    can accurately compress preconditioners. In: <i>41st International Conference
    on Machine Learning</i>. Vol 235. ML Research Press; 2024:35910-35933.'
  apa: 'Modoranu, I.-V., Kalinov, A., Kurtic, E., Frantar, E., &#38; Alistarh, D.-A.
    (2024). Error feedback can accurately compress preconditioners. In <i>41st International
    Conference on Machine Learning</i> (Vol. 235, pp. 35910–35933). Vienna, Austria:
    ML Research Press.'
  chicago: Modoranu, Ionut-Vlad, Aleksei Kalinov, Eldar Kurtic, Elias Frantar, and
    Dan-Adrian Alistarh. “Error Feedback Can Accurately Compress Preconditioners.”
    In <i>41st International Conference on Machine Learning</i>, 235:35910–33. ML
    Research Press, 2024.
  ieee: I.-V. Modoranu, A. Kalinov, E. Kurtic, E. Frantar, and D.-A. Alistarh, “Error
    feedback can accurately compress preconditioners,” in <i>41st International Conference
    on Machine Learning</i>, Vienna, Austria, 2024, vol. 235, pp. 35910–35933.
  ista: 'Modoranu I-V, Kalinov A, Kurtic E, Frantar E, Alistarh D-A. 2024. Error feedback
    can accurately compress preconditioners. 41st International Conference on Machine
    Learning. ICML: International Conference on Machine Learning, PMLR, vol. 235,
    35910–35933.'
  mla: Modoranu, Ionut-Vlad, et al. “Error Feedback Can Accurately Compress Preconditioners.”
    <i>41st International Conference on Machine Learning</i>, vol. 235, ML Research
    Press, 2024, pp. 35910–33.
  short: I.-V. Modoranu, A. Kalinov, E. Kurtic, E. Frantar, D.-A. Alistarh, in:, 41st
    International Conference on Machine Learning, ML Research Press, 2024, pp. 35910–35933.
conference:
  end_date: 2024-07-27
  location: Vienna, Austria
  name: 'ICML: International Conference on Machine Learning'
  start_date: 2024-07-21
corr_author: '1'
date_created: 2025-01-30T07:53:22Z
date_published: 2024-07-30T00:00:00Z
date_updated: 2025-01-30T07:54:16Z
day: '30'
department:
- _id: DaAl
external_id:
  arxiv:
  - '2306.06098'
intvolume: '235'
language:
- iso: eng
main_file_link:
- open_access: '1'
  url: https://doi.org/10.48550/arXiv.2306.06098
month: '07'
oa: 1
oa_version: Preprint
page: 35910-35933
publication: 41st International Conference on Machine Learning
publication_identifier:
  eissn:
  - 2640-3498
publication_status: published
publisher: ML Research Press
quality_controlled: '1'
scopus_import: '1'
status: public
title: Error feedback can accurately compress preconditioners
type: conference
user_id: 2DF688A6-F248-11E8-B48F-1D18A9856A87
volume: 235
year: '2024'
...
---
OA_place: repository
OA_type: green
_id: '18976'
abstract:
- lang: eng
  text: We analyze asynchronous-type algorithms for distributed SGD in the heterogeneous
    setting, where each worker has its own computation and communication speeds, as
    well as data distribution. In these algorithms, workers compute possibly stale
    and stochastic gradients associated with their local data at some iteration back
    in history and then return those gradients to the server without synchronizing
    with other workers. We present a unified convergence theory for non-convex smooth
    functions in the heterogeneous regime. The proposed analysis provides convergence
    for pure asynchronous SGD and its various modifications. Moreover, our theory
    explains what affects the convergence rate and what can be done to improve the
    performance of asynchronous algorithms. In particular, we introduce a novel asynchronous
    method based on worker shuffling. As a by-product of our analysis, we also demonstrate
    convergence guarantees for gradient-type algorithms such as SGD with random reshuffling
    and shuffle-once mini-batch SGD. The derived rates match the best-known results
    for those algorithms, highlighting the tightness of our approach. Finally, our
    numerical evaluations support theoretical findings and show the good practical
    performance of our method.
acknowledgement: "The authors thank all anonymous reviewers for their valuable comments
  and suggestions on how to improve the manuscript. This work was done when Rustem
  Islamov was a Master’s student at Institut Polytechnique de Paris (IP Paris) and
  an intern at Institute of Science and Technology Austria (ISTA). The research of
  Rustem Islamov was supported by ISTA internship\r\nprogram. Mher Safaryan has received
  funding from the European Union’s Horizon 2020 research and innovation program under
  the Marie Skłodowska-Curie grant agreement No 101034413."
alternative_title:
- PMLR
article_processing_charge: No
arxiv: 1
author:
- first_name: Rustem
  full_name: Islamov, Rustem
  last_name: Islamov
- first_name: Mher
  full_name: Safaryan, Mher
  id: dd546b39-0804-11ed-9c55-ef075c39778d
  last_name: Safaryan
- first_name: Dan-Adrian
  full_name: Alistarh, Dan-Adrian
  id: 4A899BFC-F248-11E8-B48F-1D18A9856A87
  last_name: Alistarh
  orcid: 0000-0003-3650-940X
citation:
  ama: 'Islamov R, Safaryan M, Alistarh D-A. AsGrad: A sharp unified analysis of asynchronous-SGD
    algorithms. In: <i>Proceedings of The 27th International Conference on Artificial
    Intelligence and Statistics</i>. Vol 238. ML Research Press; 2024:649-657.'
  apa: 'Islamov, R., Safaryan, M., &#38; Alistarh, D.-A. (2024). AsGrad: A sharp unified
    analysis of asynchronous-SGD algorithms. In <i>Proceedings of The 27th International
    Conference on Artificial Intelligence and Statistics</i> (Vol. 238, pp. 649–657).
    Valencia, Spain: ML Research Press.'
  chicago: 'Islamov, Rustem, Mher Safaryan, and Dan-Adrian Alistarh. “AsGrad: A Sharp
    Unified Analysis of Asynchronous-SGD Algorithms.” In <i>Proceedings of The 27th
    International Conference on Artificial Intelligence and Statistics</i>, 238:649–57.
    ML Research Press, 2024.'
  ieee: 'R. Islamov, M. Safaryan, and D.-A. Alistarh, “AsGrad: A sharp unified analysis
    of asynchronous-SGD algorithms,” in <i>Proceedings of The 27th International Conference
    on Artificial Intelligence and Statistics</i>, Valencia, Spain, 2024, vol. 238,
    pp. 649–657.'
  ista: 'Islamov R, Safaryan M, Alistarh D-A. 2024. AsGrad: A sharp unified analysis
    of asynchronous-SGD algorithms. Proceedings of The 27th International Conference
    on Artificial Intelligence and Statistics. AISTATS: Conference on Artificial Intelligence
    and Statistics, PMLR, vol. 238, 649–657.'
  mla: 'Islamov, Rustem, et al. “AsGrad: A Sharp Unified Analysis of Asynchronous-SGD
    Algorithms.” <i>Proceedings of The 27th International Conference on Artificial
    Intelligence and Statistics</i>, vol. 238, ML Research Press, 2024, pp. 649–57.'
  short: R. Islamov, M. Safaryan, D.-A. Alistarh, in:, Proceedings of The 27th International
    Conference on Artificial Intelligence and Statistics, ML Research Press, 2024,
    pp. 649–657.
conference:
  end_date: 2024-05-04
  location: Valencia, Spain
  name: 'AISTATS: Conference on Artificial Intelligence and Statistics'
  start_date: 2024-05-02
corr_author: '1'
date_created: 2025-01-30T08:15:49Z
date_published: 2024-05-15T00:00:00Z
date_updated: 2025-04-14T07:54:52Z
day: '15'
department:
- _id: DaAl
ec_funded: 1
external_id:
  arxiv:
  - '2310.20452'
intvolume: '238'
language:
- iso: eng
main_file_link:
- open_access: '1'
  url: https://doi.org/10.48550/arXiv.2310.20452
month: '05'
oa: 1
oa_version: Preprint
page: 649-657
project:
- _id: fc2ed2f7-9c52-11eb-aca3-c01059dda49c
  call_identifier: H2020
  grant_number: '101034413'
  name: 'IST-BRIDGE: International postdoctoral program'
publication: Proceedings of The 27th International Conference on Artificial Intelligence
  and Statistics
publication_identifier:
  eissn:
  - 2640-3498
publication_status: published
publisher: ML Research Press
quality_controlled: '1'
scopus_import: '1'
status: public
title: 'AsGrad: A sharp unified analysis of asynchronous-SGD algorithms'
type: conference
user_id: 2DF688A6-F248-11E8-B48F-1D18A9856A87
volume: 238
year: '2024'
...
---
OA_place: repository
OA_type: green
_id: '18977'
abstract:
- lang: eng
  text: "Recent advances in large language model (LLM) pretraining have led to high-quality
    LLMs with impressive abilities. By compressing such LLMs via quantization to 3-4
    bits per parameter, they can fit into memory-limited devices such as laptops and
    mobile phones, enabling personalized use. Quantizing models to 3-4 bits per parameter
    can lead to moderate to high accuracy losses, especially for smaller models (1-10B
    parameters), which are suitable for edge deployment. To address this accuracy
    issue, we introduce the Sparse-Quantized Representation (SpQR), a new compressed
    format and quantization technique that enables for the first time near-lossless
    compression of LLMs across model scales while reaching similar compression levels
    to previous methods. SpQR works by identifying and isolating outlier weights,
    which cause particularly large quantization errors, and storing them in higher
    precision while compressing all other weights to 3-4 bits, and achieves relative
    accuracy losses of less than 1% in perplexity for highly-accurate LLaMA and
    Falcon LLMs. This makes it possible to run a 33B parameter LLM on a single 24
    GB consumer GPU without performance degradation and with a 15% speedup, thus making
    powerful
    LLMs available to consumers without any downsides. SpQR comes with efficient algorithms
    for both encoding weights into its format, as well as decoding them efficiently
    at runtime. Specifically, we provide an efficient GPU inference algorithm for
    SpQR, which yields faster inference than 16-bit baselines at similar accuracy
    while enabling memory compression gains of more than 4x."
acknowledgement: "Denis Kuznedelev acknowledges the support from the Russian Ministry
  of Science and Higher Education, grant No. 075-10-2021-068. Ruslan Svirschevski,
  Vage Egiazarian, and Denis Kuznedelev were supported by the grant for research
  centers in the field of AI provided by the Analytical Center for the Government
  of the Russian Federation (ACRF) in accordance with the agreement on the provision
  of subsidies (identifier of the agreement 000000D730321P5Q0002) and the agreement
  with HSE University No. 70-2021-00139."
article_processing_charge: No
arxiv: 1
author:
- first_name: Tim
  full_name: Dettmers, Tim
  last_name: Dettmers
- first_name: Ruslan A.
  full_name: Svirschevski, Ruslan A.
  last_name: Svirschevski
- first_name: Vage
  full_name: Egiazarian, Vage
  last_name: Egiazarian
- first_name: Denis
  full_name: Kuznedelev, Denis
  last_name: Kuznedelev
- first_name: Elias
  full_name: Frantar, Elias
  id: 09a8f98d-ec99-11ea-ae11-c063a7b7fe5f
  last_name: Frantar
- first_name: Saleh
  full_name: Ashkboos, Saleh
  last_name: Ashkboos
- first_name: Alexander
  full_name: Borzunov, Alexander
  last_name: Borzunov
- first_name: Torsten
  full_name: Hoefler, Torsten
  last_name: Hoefler
- first_name: Dan-Adrian
  full_name: Alistarh, Dan-Adrian
  id: 4A899BFC-F248-11E8-B48F-1D18A9856A87
  last_name: Alistarh
  orcid: 0000-0003-3650-940X
citation:
  ama: 'Dettmers T, Svirschevski RA, Egiazarian V, et al. SpQR: A sparse-quantized
    representation for near-lossless LLM weight compression. In: <i>12th International
    Conference on Learning Representations</i>. OpenReview; 2024.'
  apa: 'Dettmers, T., Svirschevski, R. A., Egiazarian, V., Kuznedelev, D., Frantar,
    E., Ashkboos, S., … Alistarh, D.-A. (2024). SpQR: A sparse-quantized representation
    for near-lossless LLM weight compression. In <i>12th International Conference
    on Learning Representations</i>. Vienna, Austria: OpenReview.'
  chicago: 'Dettmers, Tim, Ruslan A. Svirschevski, Vage Egiazarian, Denis Kuznedelev,
    Elias Frantar, Saleh Ashkboos, Alexander Borzunov, Torsten Hoefler, and Dan-Adrian
    Alistarh. “SpQR: A Sparse-Quantized Representation for near-Lossless LLM Weight
    Compression.” In <i>12th International Conference on Learning Representations</i>.
    OpenReview, 2024.'
  ieee: 'T. Dettmers <i>et al.</i>, “SpQR: A sparse-quantized representation for near-lossless
    LLM weight compression,” in <i>12th International Conference on Learning Representations</i>,
    Vienna, Austria, 2024.'
  ista: 'Dettmers T, Svirschevski RA, Egiazarian V, Kuznedelev D, Frantar E, Ashkboos
    S, Borzunov A, Hoefler T, Alistarh D-A. 2024. SpQR: A sparse-quantized representation
    for near-lossless LLM weight compression. 12th International Conference on Learning
    Representations. ICLR: International Conference on Learning Representations.'
  mla: 'Dettmers, Tim, et al. “SpQR: A Sparse-Quantized Representation for near-Lossless
    LLM Weight Compression.” <i>12th International Conference on Learning Representations</i>,
    OpenReview, 2024.'
  short: T. Dettmers, R.A. Svirschevski, V. Egiazarian, D. Kuznedelev, E. Frantar,
    S. Ashkboos, A. Borzunov, T. Hoefler, D.-A. Alistarh, in:, 12th International
    Conference on Learning Representations, OpenReview, 2024.
conference:
  end_date: 2024-05-11
  location: Vienna, Austria
  name: 'ICLR: International Conference on Learning Representations'
  start_date: 2024-05-07
date_created: 2025-01-30T08:26:59Z
date_published: 2024-05-15T00:00:00Z
date_updated: 2025-01-30T08:27:47Z
day: '15'
department:
- _id: DaAl
external_id:
  arxiv:
  - '2306.03078'
language:
- iso: eng
main_file_link:
- open_access: '1'
  url: https://doi.org/10.48550/arXiv.2306.03078
month: '05'
oa: 1
oa_version: Preprint
publication: 12th International Conference on Learning Representations
publication_status: published
publisher: OpenReview
quality_controlled: '1'
scopus_import: '1'
status: public
title: 'SpQR: A sparse-quantized representation for near-lossless LLM weight compression'
type: conference
user_id: 2DF688A6-F248-11E8-B48F-1D18A9856A87
year: '2024'
...
---
OA_place: repository
OA_type: green
_id: '18996'
abstract:
- lang: eng
  text: 'We consider the linear causal representation learning setting where we observe
    a linear mixing of d unknown latent factors, which follow a linear structural
    causal model. Recent work has shown that it is possible to recover the latent
    factors as well as the underlying structural causal model over them, up to permutation
    and scaling, provided that we have at least d environments, each of which corresponds
    to perfect interventions on a single latent node (factor). After this powerful
    result, a key open problem faced by the community has been to relax these conditions:
    allow for coarser than perfect single-node interventions, and allow for fewer
    than d of them, since the number of latent factors d could be very large. In this
    work, we consider precisely such a setting, where we allow a smaller than d number
    of environments, and also allow for very coarse interventions that can very coarsely
    change the entire causal graph over the latent factors. On the flip side,
    we relax what we wish to extract to simply the list of nodes that have
    shifted between one or more environments. We provide a surprising identifiability
    result that it is indeed possible, under some very mild standard assumptions,
    to identify the set of shifted nodes. Our identifiability proof moreover is a
    constructive one: we explicitly provide necessary and sufficient conditions for
    a node to be a shifted node, and show that we can check these conditions given
    observed data. Our algorithm lends itself very naturally to the sample setting
    where instead of just interventional distributions, we are provided datasets of
    samples from each of these distributions. We corroborate our results on both synthetic
    experiments as well as an interesting psychometric dataset. The code can be found
    at https://github.com/TianyuCodings/iLCS.'
alternative_title:
- Advances in Neural Information Processing Systems
article_processing_charge: No
arxiv: 1
author:
- first_name: Tianyu
  full_name: Chen, Tianyu
  last_name: Chen
- first_name: Kevin
  full_name: Bello, Kevin
  last_name: Bello
- first_name: Francesco
  full_name: Locatello, Francesco
  id: 26cfd52f-2483-11ee-8040-88983bcc06d4
  last_name: Locatello
  orcid: 0000-0002-4850-0683
- first_name: Bryon
  full_name: Aragam, Bryon
  last_name: Aragam
- first_name: Pradeep Kumar
  full_name: Ravikumar, Pradeep Kumar
  last_name: Ravikumar
citation:
  ama: 'Chen T, Bello K, Locatello F, Aragam B, Ravikumar PK. Identifying general
    mechanism shifts in linear causal representations. In: <i>38th Conference on Neural
    Information Processing Systems</i>. Vol 37. Neural Information Processing Systems
    Foundation; 2024.'
  apa: 'Chen, T., Bello, K., Locatello, F., Aragam, B., &#38; Ravikumar, P. K. (2024).
    Identifying general mechanism shifts in linear causal representations. In <i>38th
    Conference on Neural Information Processing Systems</i> (Vol. 37). Vancouver,
    Canada: Neural Information Processing Systems Foundation.'
  chicago: Chen, Tianyu, Kevin Bello, Francesco Locatello, Bryon Aragam, and Pradeep
    Kumar Ravikumar. “Identifying General Mechanism Shifts in Linear Causal Representations.”
    In <i>38th Conference on Neural Information Processing Systems</i>, Vol. 37. Neural
    Information Processing Systems Foundation, 2024.
  ieee: T. Chen, K. Bello, F. Locatello, B. Aragam, and P. K. Ravikumar, “Identifying
    general mechanism shifts in linear causal representations,” in <i>38th Conference
    on Neural Information Processing Systems</i>, Vancouver, Canada, 2024, vol. 37.
  ista: 'Chen T, Bello K, Locatello F, Aragam B, Ravikumar PK. 2024. Identifying general
    mechanism shifts in linear causal representations. 38th Conference on Neural Information
    Processing Systems. NeurIPS: Neural Information Processing Systems, Advances in
    Neural Information Processing Systems, vol. 37.'
  mla: Chen, Tianyu, et al. “Identifying General Mechanism Shifts in Linear Causal
    Representations.” <i>38th Conference on Neural Information Processing Systems</i>,
    vol. 37, Neural Information Processing Systems Foundation, 2024.
  short: T. Chen, K. Bello, F. Locatello, B. Aragam, P.K. Ravikumar, in:, 38th Conference
    on Neural Information Processing Systems, Neural Information Processing Systems
    Foundation, 2024.
conference:
  end_date: 2024-12-16
  location: Vancouver, Canada
  name: 'NeurIPS: Neural Information Processing Systems'
  start_date: 2024-12-16
date_created: 2025-02-04T13:09:34Z
date_published: 2024-09-25T00:00:00Z
date_updated: 2025-07-07T13:23:49Z
day: '25'
ddc:
- '000'
department:
- _id: FrLo
external_id:
  arxiv:
  - '2410.24059'
file:
- access_level: open_access
  checksum: 75c3091e70bd2916cd94afbf40a0c425
  content_type: application/pdf
  creator: dernst
  date_created: 2025-02-04T13:09:08Z
  date_updated: 2025-02-04T13:09:08Z
  file_id: '18997'
  file_name: 2024_NeurIPS_Chen.pdf
  file_size: 5659119
  relation: main_file
  success: 1
file_date_updated: 2025-02-04T13:09:08Z
has_accepted_license: '1'
intvolume: '37'
language:
- iso: eng
month: '09'
oa: 1
oa_version: Published Version
publication: 38th Conference on Neural Information Processing Systems
publication_identifier:
  eissn:
  - 1049-5258
publication_status: published
publisher: Neural Information Processing Systems Foundation
quality_controlled: '1'
scopus_import: '1'
status: public
title: Identifying general mechanism shifts in linear causal representations
tmp:
  image: /images/cc_by.png
  legal_code_url: https://creativecommons.org/licenses/by/4.0/legalcode
  name: Creative Commons Attribution 4.0 International Public License (CC-BY 4.0)
  short: CC BY (4.0)
type: conference
user_id: 2DF688A6-F248-11E8-B48F-1D18A9856A87
volume: 37
year: '2024'
...
---
OA_place: publisher
OA_type: gold
_id: '18998'
abstract:
- lang: eng
  text: Word embeddings represent language vocabularies as clouds of d-dimensional
    points. We investigate how information is conveyed by the general shape of these
    clouds, instead of representing the semantic meaning of each token. Specifically,
    we use the notion of persistent homology from topological data analysis (TDA)
    to measure the distances between language pairs from the shape of their unlabeled
    embeddings. These distances quantify the degree of non-isometry of the embeddings.
    To distinguish whether these differences are random training errors or capture
    real information about the languages, we use the computed distance matrices to
    construct language phylogenetic trees over 81 Indo-European languages. Careful
    evaluation shows that our reconstructed trees exhibit strong and statistically-significant
    similarities to the reference.
article_processing_charge: No
arxiv: 1
author:
- first_name: Ondrej
  full_name: Draganov, Ondrej
  id: 2B23F01E-F248-11E8-B48F-1D18A9856A87
  last_name: Draganov
  orcid: 0000-0003-0464-3823
- first_name: Steven
  full_name: Skiena, Steven
  last_name: Skiena
citation:
  ama: 'Draganov O, Skiena S. The shape of word embeddings: Quantifying non-isometry
    with topological data analysis. In: <i>Findings of the Association for Computational
    Linguistics: EMNLP 2024</i>. Association for Computational Linguistics; 2024:12080-12099.
    doi:<a href="https://doi.org/10.18653/v1/2024.findings-emnlp.705">10.18653/v1/2024.findings-emnlp.705</a>'
  apa: 'Draganov, O., &#38; Skiena, S. (2024). The shape of word embeddings: Quantifying
    non-isometry with topological data analysis. In <i>Findings of the Association
    for Computational Linguistics: EMNLP 2024</i> (pp. 12080–12099). Miami, FL, United
    States: Association for Computational Linguistics. <a href="https://doi.org/10.18653/v1/2024.findings-emnlp.705">https://doi.org/10.18653/v1/2024.findings-emnlp.705</a>'
  chicago: 'Draganov, Ondrej, and Steven Skiena. “The Shape of Word Embeddings: Quantifying
    Non-Isometry with Topological Data Analysis.” In <i>Findings of the Association
    for Computational Linguistics: EMNLP 2024</i>, 12080–99. Association for Computational
    Linguistics, 2024. <a href="https://doi.org/10.18653/v1/2024.findings-emnlp.705">https://doi.org/10.18653/v1/2024.findings-emnlp.705</a>.'
  ieee: 'O. Draganov and S. Skiena, “The shape of word embeddings: Quantifying non-isometry
    with topological data analysis,” in <i>Findings of the Association for Computational
    Linguistics: EMNLP 2024</i>, Miami, FL, United States, 2024, pp. 12080–12099.'
  ista: 'Draganov O, Skiena S. 2024. The shape of word embeddings: Quantifying non-isometry
    with topological data analysis. Findings of the Association for Computational
    Linguistics: EMNLP 2024. EMNLP: Conference on Empirical Methods in Natural Language
    Processing, 12080–12099.'
  mla: 'Draganov, Ondrej, and Steven Skiena. “The Shape of Word Embeddings: Quantifying
    Non-Isometry with Topological Data Analysis.” <i>Findings of the Association for
    Computational Linguistics: EMNLP 2024</i>, Association for Computational Linguistics,
    2024, pp. 12080–99, doi:<a href="https://doi.org/10.18653/v1/2024.findings-emnlp.705">10.18653/v1/2024.findings-emnlp.705</a>.'
  short: 'O. Draganov, S. Skiena, in:, Findings of the Association for Computational
    Linguistics: EMNLP 2024, Association for Computational Linguistics, 2024, pp.
    12080–12099.'
conference:
  end_date: 2024-11-16
  location: Miami, FL, United States
  name: 'EMNLP: Conference on Empirical Methods in Natural Language Processing'
  start_date: 2024-11-12
corr_author: '1'
date_created: 2025-02-04T16:19:28Z
date_published: 2024-11-01T00:00:00Z
date_updated: 2025-02-10T08:21:37Z
day: '01'
ddc:
- '500'
department:
- _id: GradSch
- _id: HeEd
doi: 10.18653/v1/2024.findings-emnlp.705
external_id:
  arxiv:
  - '2404.00500'
file:
- access_level: open_access
  checksum: f4416a5962194f0181ab0dc7f9ef93c0
  content_type: application/pdf
  creator: dernst
  date_created: 2025-02-10T08:20:34Z
  date_updated: 2025-02-10T08:20:34Z
  file_id: '19016'
  file_name: 2024_EMNLP_Draganov.pdf
  file_size: 1312638
  relation: main_file
  success: 1
file_date_updated: 2025-02-10T08:20:34Z
has_accepted_license: '1'
language:
- iso: eng
month: '11'
oa: 1
oa_version: Published Version
page: 12080-12099
publication: 'Findings of the Association for Computational Linguistics: EMNLP 2024'
publication_status: published
publisher: Association for Computational Linguistics
quality_controlled: '1'
scopus_import: '1'
status: public
title: 'The shape of word embeddings: Quantifying non-isometry with topological data
  analysis'
tmp:
  image: /images/cc_by.png
  legal_code_url: https://creativecommons.org/licenses/by/4.0/legalcode
  name: Creative Commons Attribution 4.0 International Public License (CC-BY 4.0)
  short: CC BY (4.0)
type: conference
user_id: 2DF688A6-F248-11E8-B48F-1D18A9856A87
year: '2024'
...
---
OA_place: repository
OA_type: green
_id: '18999'
abstract:
- lang: eng
  text: Exploring the shape of point configurations has been a key driver in the evolution
    of TDA (short for topological data analysis) since its infancy. This survey illustrates
    the recent efforts to broaden these ideas to model spatial interactions among
    multiple configurations, each distinguished by a color. It describes advances
    in this area and prepares the ground for further exploration by mentioning unresolved
    questions and promising research avenues while focusing on the overlap with discrete
    geometry.
article_number: '2406.04102'
article_processing_charge: No
arxiv: 1
author:
- first_name: Sebastiano
  full_name: Cultrera di Montesano, Sebastiano
  id: 34D2A09C-F248-11E8-B48F-1D18A9856A87
  last_name: Cultrera di Montesano
  orcid: 0000-0001-6249-0832
- first_name: Ondrej
  full_name: Draganov, Ondrej
  id: 2B23F01E-F248-11E8-B48F-1D18A9856A87
  last_name: Draganov
  orcid: 0000-0003-0464-3823
- first_name: Herbert
  full_name: Edelsbrunner, Herbert
  id: 3FB178DA-F248-11E8-B48F-1D18A9856A87
  last_name: Edelsbrunner
  orcid: 0000-0002-9823-6833
- first_name: Morteza
  full_name: Saghafian, Morteza
  id: f86f7148-b140-11ec-9577-95435b8df824
  last_name: Saghafian
citation:
  ama: Cultrera di Montesano S, Draganov O, Edelsbrunner H, Saghafian M. Chromatic
    topological data analysis. <i>arXiv</i>. doi:<a href="https://doi.org/10.48550/ARXIV.2406.04102">10.48550/ARXIV.2406.04102</a>
  apa: Cultrera di Montesano, S., Draganov, O., Edelsbrunner, H., &#38; Saghafian,
    M. (n.d.). Chromatic topological data analysis. <i>arXiv</i>. <a href="https://doi.org/10.48550/ARXIV.2406.04102">https://doi.org/10.48550/ARXIV.2406.04102</a>
  chicago: Cultrera di Montesano, Sebastiano, Ondrej Draganov, Herbert Edelsbrunner,
    and Morteza Saghafian. “Chromatic Topological Data Analysis.” <i>ArXiv</i>, n.d.
    <a href="https://doi.org/10.48550/ARXIV.2406.04102">https://doi.org/10.48550/ARXIV.2406.04102</a>.
  ieee: S. Cultrera di Montesano, O. Draganov, H. Edelsbrunner, and M. Saghafian,
    “Chromatic topological data analysis,” <i>arXiv</i>.
  ista: Cultrera di Montesano S, Draganov O, Edelsbrunner H, Saghafian M. Chromatic
    topological data analysis. arXiv, 2406.04102.
  mla: Cultrera di Montesano, Sebastiano, et al. “Chromatic Topological Data Analysis.”
    <i>ArXiv</i>, 2406.04102, doi:<a href="https://doi.org/10.48550/ARXIV.2406.04102">10.48550/ARXIV.2406.04102</a>.
  short: S. Cultrera di Montesano, O. Draganov, H. Edelsbrunner, M. Saghafian, ArXiv
    (n.d.).
corr_author: '1'
date_created: 2025-02-04T16:21:21Z
date_published: 2024-06-06T00:00:00Z
date_updated: 2025-02-10T08:14:27Z
day: '06'
ddc:
- '510'
department:
- _id: GradSch
- _id: HeEd
doi: 10.48550/ARXIV.2406.04102
external_id:
  arxiv:
  - '2406.04102'
has_accepted_license: '1'
language:
- iso: eng
main_file_link:
- open_access: '1'
  url: https://doi.org/10.48550/arXiv.2406.04102
month: '06'
oa: 1
oa_version: Preprint
publication: arXiv
publication_status: submitted
status: public
title: Chromatic topological data analysis
tmp:
  image: /images/cc_by.png
  legal_code_url: https://creativecommons.org/licenses/by/4.0/legalcode
  name: Creative Commons Attribution 4.0 International Public License (CC-BY 4.0)
  short: CC BY (4.0)
type: preprint
user_id: 2DF688A6-F248-11E8-B48F-1D18A9856A87
year: '2024'
...
---
OA_place: publisher
OA_type: gold
_id: '19005'
abstract:
- lang: eng
  text: "Causal representation learning promises to extend causal models to hidden
    causal\r\nvariables from raw entangled measurements. However, most progress has
    focused\r\non proving identifiability results in different settings, and we are
    not aware of any\r\nsuccessful real-world application. At the same time, the field
    of dynamical systems\r\nbenefited from deep learning and scaled to countless applications
    but does not allow\r\nparameter identification. In this paper, we draw a clear
    connection between the two\r\nand their key assumptions, allowing us to apply
    identifiable methods developed\r\nin causal representation learning to dynamical
    systems. At the same time, we can\r\nleverage scalable differentiable solvers
    developed for differential equations to build\r\nmodels that are both identifiable
    and practical. Overall, we learn explicitly controllable models that isolate the
    trajectory-specific parameters for further downstream\r\ntasks such as out-of-distribution
    classification or treatment effect estimation. We\r\nexperiment with a wind simulator
    with partially known factors of variation. We\r\nalso apply the resulting model
    to real-world climate data and successfully answer\r\ndownstream causal questions
    in line with existing literature on climate change.\r\nCode is available at https://github.com/CausalLearningAI/crl-dynamical-systems."
acknowledgement: "We thank Niklas Boers for recommending the SpeedyWeather simulator
  and Valentino Maiorca\r\nfor guidance on Fourier transformation for SST data. We
  are also grateful to Shimeng Huang and Riccardo Cadei for their feedback on the
  treatment effect estimation experiment and to Jiale Chen and Adeel Pervez for their
  assistance with the solver implementation. Finally, we appreciate the anonymous
  reviewers for their insightful suggestions, which helped improve the manuscript. "
alternative_title:
- Advances in Neural Information Processing Systems
article_processing_charge: No
arxiv: 1
author:
- first_name: Dingling
  full_name: Yao, Dingling
  id: d3e02e50-48a8-11ee-8f62-c108061797fa
  last_name: Yao
- first_name: Caroline J
  full_name: Muller, Caroline J
  id: f978ccb0-3f7f-11eb-b193-b0e2bd13182b
  last_name: Muller
  orcid: 0000-0001-5836-5350
- first_name: Francesco
  full_name: Locatello, Francesco
  id: 26cfd52f-2483-11ee-8040-88983bcc06d4
  last_name: Locatello
  orcid: 0000-0002-4850-0683
citation:
  ama: 'Yao D, Muller CJ, Locatello F. Marrying causal representation learning with
    dynamical systems for science. In: <i>38th Conference on Neural Information Processing
    Systems</i>. Vol 37. Neural Information Processing Systems Foundation; 2024.'
  apa: 'Yao, D., Muller, C. J., &#38; Locatello, F. (2024). Marrying causal representation
    learning with dynamical systems for science. In <i>38th Conference on Neural Information
    Processing Systems</i> (Vol. 37). Vancouver, Canada: Neural Information Processing
    Systems Foundation.'
  chicago: Yao, Dingling, Caroline J Muller, and Francesco Locatello. “Marrying Causal
    Representation Learning with Dynamical Systems for Science.” In <i>38th Conference
    on Neural Information Processing Systems</i>, Vol. 37. Neural Information Processing
    Systems Foundation, 2024.
  ieee: D. Yao, C. J. Muller, and F. Locatello, “Marrying causal representation learning
    with dynamical systems for science,” in <i>38th Conference on Neural Information
    Processing Systems</i>, Vancouver, Canada, 2024, vol. 37.
  ista: 'Yao D, Muller CJ, Locatello F. 2024. Marrying causal representation learning
    with dynamical systems for science. 38th Conference on Neural Information Processing
    Systems. NeurIPS: Neural Information Processing Systems, Advances in Neural Information
    Processing Systems, vol. 37.'
  mla: Yao, Dingling, et al. “Marrying Causal Representation Learning with Dynamical
    Systems for Science.” <i>38th Conference on Neural Information Processing Systems</i>,
    vol. 37, Neural Information Processing Systems Foundation, 2024.
  short: D. Yao, C.J. Muller, F. Locatello, in:, 38th Conference on Neural Information
    Processing Systems, Neural Information Processing Systems Foundation, 2024.
conference:
  end_date: 2024-12-16
  location: Vancouver, Canada
  name: 'NeurIPS: Neural Information Processing Systems'
  start_date: 2024-12-16
corr_author: '1'
date_created: 2025-02-05T07:49:00Z
date_published: 2024-12-01T00:00:00Z
date_updated: 2025-07-10T11:51:32Z
day: '01'
ddc:
- '000'
- '550'
department:
- _id: CaMu
- _id: FrLo
external_id:
  arxiv:
  - '2405.13888'
file:
- access_level: open_access
  checksum: fe8832367e7143876f178244385d859e
  content_type: application/pdf
  creator: dernst
  date_created: 2025-02-05T07:44:58Z
  date_updated: 2025-02-05T07:44:58Z
  file_id: '19006'
  file_name: 2024_NeurIPS_Yao.pdf
  file_size: 2595855
  relation: main_file
  success: 1
file_date_updated: 2025-02-05T07:44:58Z
has_accepted_license: '1'
intvolume: '37'
language:
- iso: eng
month: '12'
oa: 1
oa_version: Published Version
publication: 38th Conference on Neural Information Processing Systems
publication_status: published
publisher: Neural Information Processing Systems Foundation
quality_controlled: '1'
related_material:
  link:
  - relation: software
    url: https://github.com/CausalLearningAI/crl-dynamical-systems
scopus_import: '1'
status: public
title: Marrying causal representation learning with dynamical systems for science
tmp:
  image: /images/cc_by.png
  legal_code_url: https://creativecommons.org/licenses/by/4.0/legalcode
  name: Creative Commons Attribution 4.0 International Public License (CC-BY 4.0)
  short: CC BY (4.0)
type: conference
user_id: 2DF688A6-F248-11E8-B48F-1D18A9856A87
volume: 37
year: '2024'
...
---
OA_place: publisher
OA_type: hybrid
_id: '19007'
abstract:
- lang: eng
  text: "Learning modular object-centric representations is crucial for systematic
    generalization. Existing methods show promising object-binding capabilities empirically,\r\nbut
    theoretical identifiability guarantees remain relatively underdeveloped. Understanding
    when object-centric representations can theoretically be identified is\r\ncrucial
    for scaling slot-based methods to high-dimensional images with correctness\r\nguarantees.
    To that end, we propose a probabilistic slot-attention algorithm that\r\nimposes
    an aggregate mixture prior over object-centric slot representations, thereby\r\nproviding
    slot identifiability guarantees without supervision, up to an equivalence\r\nrelation.
    We provide empirical verification of our theoretical identifiability result\r\nusing
    both simple 2-dimensional data and high-resolution imaging datasets.\r\n"
acknowledgement: A. Kori is supported by UKRI (grant number EP/S023356/1), as part
  of the UKRI Centre for Doctoral Training in Safe and Trusted AI. B. Glocker and
  F.D.S. Ribeiro acknowledge the support of the UKRI AI programme, and the Engineering
  and Physical Sciences Research Council, for CHAI - EPSRC Causality in Healthcare
  AI Hub (grant number EP/Y028856/1).
alternative_title:
- Advances in Neural Information Processing Systems
article_processing_charge: No
arxiv: 1
author:
- first_name: Avinash
  full_name: Kori, Avinash
  last_name: Kori
- first_name: Francesco
  full_name: Locatello, Francesco
  id: 26cfd52f-2483-11ee-8040-88983bcc06d4
  last_name: Locatello
  orcid: 0000-0002-4850-0683
- first_name: Ainkaran
  full_name: Santhirasekaram, Ainkaran
  last_name: Santhirasekaram
- first_name: Francesca
  full_name: Toni, Francesca
  last_name: Toni
- first_name: Ben
  full_name: Glocker, Ben
  last_name: Glocker
- first_name: Fabio
  full_name: De Sousa Ribeiro, Fabio
  last_name: De Sousa Ribeiro
citation:
  ama: 'Kori A, Locatello F, Santhirasekaram A, Toni F, Glocker B, De Sousa Ribeiro
    F. Identifiable object-centric representation learning via probabilistic slot
    attention. In: <i>38th Conference on Neural Information Processing Systems</i>.
    Vol 37. Neural Information Processing Systems Foundation; 2024.'
  apa: 'Kori, A., Locatello, F., Santhirasekaram, A., Toni, F., Glocker, B., &#38;
    De Sousa Ribeiro, F. (2024). Identifiable object-centric representation learning
    via probabilistic slot attention. In <i>38th Conference on Neural Information
    Processing Systems</i> (Vol. 37). Vancouver, Canada: Neural Information Processing
    Systems Foundation.'
  chicago: Kori, Avinash, Francesco Locatello, Ainkaran Santhirasekaram, Francesca
    Toni, Ben Glocker, and Fabio De Sousa Ribeiro. “Identifiable Object-Centric Representation
    Learning via Probabilistic Slot Attention.” In <i>38th Conference on Neural Information
    Processing Systems</i>, Vol. 37. Neural Information Processing Systems Foundation,
    2024.
  ieee: A. Kori, F. Locatello, A. Santhirasekaram, F. Toni, B. Glocker, and F. De
    Sousa Ribeiro, “Identifiable object-centric representation learning via probabilistic
    slot attention,” in <i>38th Conference on Neural Information Processing Systems</i>,
    Vancouver, Canada, 2024, vol. 37.
  ista: 'Kori A, Locatello F, Santhirasekaram A, Toni F, Glocker B, De Sousa Ribeiro
    F. 2024. Identifiable object-centric representation learning via probabilistic
    slot attention. 38th Conference on Neural Information Processing Systems. NeurIPS:
    Neural Information Processing Systems, Advances in Neural Information Processing
    Systems, vol. 37.'
  mla: Kori, Avinash, et al. “Identifiable Object-Centric Representation Learning
    via Probabilistic Slot Attention.” <i>38th Conference on Neural Information Processing
    Systems</i>, vol. 37, Neural Information Processing Systems Foundation, 2024.
  short: A. Kori, F. Locatello, A. Santhirasekaram, F. Toni, B. Glocker, F. De Sousa
    Ribeiro, in:, 38th Conference on Neural Information Processing Systems, Neural
    Information Processing Systems Foundation, 2024.
conference:
  end_date: 2024-12-16
  location: Vancouver, Canada
  name: 'NeurIPS: Neural Information Processing Systems'
  start_date: 2024-12-16
date_created: 2025-02-05T08:36:22Z
date_published: 2024-12-01T00:00:00Z
date_updated: 2025-05-14T11:29:10Z
day: '01'
ddc:
- '000'
department:
- _id: FrLo
external_id:
  arxiv:
  - '2406.07141'
file:
- access_level: open_access
  checksum: d27b3c7102adc28e798fe41001f0b919
  content_type: application/pdf
  creator: dernst
  date_created: 2025-02-05T08:34:25Z
  date_updated: 2025-02-05T08:34:25Z
  file_id: '19008'
  file_name: 2024_NeurIPS_Kori.pdf
  file_size: 6943800
  relation: main_file
  success: 1
file_date_updated: 2025-02-05T08:34:25Z
has_accepted_license: '1'
intvolume: '37'
language:
- iso: eng
month: '12'
oa: 1
oa_version: Published Version
publication: 38th Conference on Neural Information Processing Systems
publication_status: published
publisher: Neural Information Processing Systems Foundation
quality_controlled: '1'
scopus_import: '1'
status: public
title: Identifiable object-centric representation learning via probabilistic slot
  attention
type: conference
user_id: 2DF688A6-F248-11E8-B48F-1D18A9856A87
volume: 37
year: '2024'
...
---
OA_place: repository
_id: '19013'
abstract:
- lang: eng
  text: We study the singularities of the moduli space of degree e maps from smooth
    genus g curves to an arbitrary smooth hypersurface of low degree. For e large
    compared to g, we show that these moduli spaces have at worst terminal singularities.
    Our main approach is to study the jet schemes of these moduli spaces by developing
    a suitable form of the circle method.
article_processing_charge: No
arxiv: 1
author:
- first_name: Jakob
  full_name: Glas, Jakob
  id: d6423cba-dc74-11ea-a0a7-ee61689ff5fb
  last_name: Glas
- first_name: Matthew
  full_name: Hase-Liu, Matthew
  last_name: Hase-Liu
citation:
  ama: Glas J, Hase-Liu M. Terminal singularities of the moduli space of curves on
    low degree hypersurfaces and the circle method. <i>arXiv</i>. doi:<a href="https://doi.org/10.48550/arXiv.2412.14923">10.48550/arXiv.2412.14923</a>
  apa: Glas, J., &#38; Hase-Liu, M. (n.d.). Terminal singularities of the moduli space
    of curves on low degree hypersurfaces and the circle method. <i>arXiv</i>. <a
    href="https://doi.org/10.48550/arXiv.2412.14923">https://doi.org/10.48550/arXiv.2412.14923</a>
  chicago: Glas, Jakob, and Matthew Hase-Liu. “Terminal Singularities of the Moduli
    Space of Curves on Low Degree Hypersurfaces and the Circle Method.” <i>ArXiv</i>,
    n.d. <a href="https://doi.org/10.48550/arXiv.2412.14923">https://doi.org/10.48550/arXiv.2412.14923</a>.
  ieee: J. Glas and M. Hase-Liu, “Terminal singularities of the moduli space of curves
    on low degree hypersurfaces and the circle method,” <i>arXiv</i>.
  ista: Glas J, Hase-Liu M. Terminal singularities of the moduli space of curves on
    low degree hypersurfaces and the circle method. arXiv, <a href="https://doi.org/10.48550/arXiv.2412.14923">10.48550/arXiv.2412.14923</a>.
  mla: Glas, Jakob, and Matthew Hase-Liu. “Terminal Singularities of the Moduli Space
    of Curves on Low Degree Hypersurfaces and the Circle Method.” <i>ArXiv</i>, doi:<a
    href="https://doi.org/10.48550/arXiv.2412.14923">10.48550/arXiv.2412.14923</a>.
  short: J. Glas, M. Hase-Liu, ArXiv (n.d.).
corr_author: '1'
date_created: 2025-02-07T12:04:11Z
date_published: 2024-12-19T00:00:00Z
date_updated: 2025-04-15T08:05:40Z
day: '19'
department:
- _id: TiBr
doi: 10.48550/arXiv.2412.14923
external_id:
  arxiv:
  - '2412.14923'
language:
- iso: eng
main_file_link:
- open_access: '1'
  url: https://doi.org/10.48550/arXiv.2412.14923
month: '12'
oa: 1
oa_version: Preprint
publication: arXiv
publication_status: draft
related_material:
  record:
  - id: '18295'
    relation: earlier_version
    status: public
status: public
title: Terminal singularities of the moduli space of curves on low degree hypersurfaces
  and the circle method
tmp:
  image: /images/cc_by.png
  legal_code_url: https://creativecommons.org/licenses/by/4.0/legalcode
  name: Creative Commons Attribution 4.0 International Public License (CC-BY 4.0)
  short: CC BY (4.0)
type: preprint
user_id: 8b945eb4-e2f2-11eb-945a-df72226e66a9
year: '2024'
...
---
OA_place: publisher
OA_type: hybrid
_id: '19051'
abstract:
- lang: eng
  text: This paper corrects an error in an earlier work of the author.
article_processing_charge: Yes (via OA deal)
article_type: original
author:
- first_name: Timothy D
  full_name: Browning, Timothy D
  id: 35827D50-F248-11E8-B48F-1D18A9856A87
  last_name: Browning
  orcid: 0000-0002-8314-0177
citation:
  ama: Browning TD. The polynomial sieve and equal sums of like polynomials. <i>International
    Mathematics Research Notices</i>. 2024;2024(13):10165-10168. doi:<a href="https://doi.org/10.1093/imrn/rnae066">10.1093/imrn/rnae066</a>
  apa: Browning, T. D. (2024). The polynomial sieve and equal sums of like polynomials.
    <i>International Mathematics Research Notices</i>. Oxford University Press. <a
    href="https://doi.org/10.1093/imrn/rnae066">https://doi.org/10.1093/imrn/rnae066</a>
  chicago: Browning, Timothy D. “The Polynomial Sieve and Equal Sums of like Polynomials.”
    <i>International Mathematics Research Notices</i>. Oxford University Press, 2024.
    <a href="https://doi.org/10.1093/imrn/rnae066">https://doi.org/10.1093/imrn/rnae066</a>.
  ieee: T. D. Browning, “The polynomial sieve and equal sums of like polynomials,”
    <i>International Mathematics Research Notices</i>, vol. 2024, no. 13. Oxford University
    Press, pp. 10165–10168, 2024.
  ista: Browning TD. 2024. The polynomial sieve and equal sums of like polynomials.
    International Mathematics Research Notices. 2024(13), 10165–10168.
  mla: Browning, Timothy D. “The Polynomial Sieve and Equal Sums of like Polynomials.”
    <i>International Mathematics Research Notices</i>, vol. 2024, no. 13, Oxford University
    Press, 2024, pp. 10165–68, doi:<a href="https://doi.org/10.1093/imrn/rnae066">10.1093/imrn/rnae066</a>.
  short: T.D. Browning, International Mathematics Research Notices 2024 (2024) 10165–10168.
corr_author: '1'
date_created: 2025-02-18T07:15:50Z
date_published: 2024-07-01T00:00:00Z
date_updated: 2025-09-09T12:16:45Z
day: '01'
ddc:
- '510'
department:
- _id: TiBr
doi: 10.1093/imrn/rnae066
external_id:
  isi:
  - '001196957300001'
file:
- access_level: open_access
  checksum: b625b8adf018d2a97591813c1fc17b96
  content_type: application/pdf
  creator: dernst
  date_created: 2025-02-18T07:56:36Z
  date_updated: 2025-02-18T07:56:36Z
  file_id: '19052'
  file_name: 2024_IMRN_Browning.pdf
  file_size: 205750
  relation: main_file
  success: 1
file_date_updated: 2025-02-18T07:56:36Z
has_accepted_license: '1'
intvolume: '2024'
isi: 1
issue: '13'
language:
- iso: eng
month: '07'
oa: 1
oa_version: Published Version
page: 10165-10168
publication: International Mathematics Research Notices
publication_identifier:
  eissn:
  - 1687-0247
  issn:
  - 1073-7928
publication_status: published
publisher: Oxford University Press
quality_controlled: '1'
related_material:
  record:
  - id: '254'
    relation: earlier_version
    status: public
scopus_import: '1'
status: public
title: The polynomial sieve and equal sums of like polynomials
tmp:
  image: /images/cc_by.png
  legal_code_url: https://creativecommons.org/licenses/by/4.0/legalcode
  name: Creative Commons Attribution 4.0 International Public License (CC-BY 4.0)
  short: CC BY (4.0)
type: journal_article
user_id: 317138e5-6ab7-11ef-aa6d-ffef3953e345
volume: 2024
year: '2024'
...
---
OA_place: repository
OA_type: green
_id: '19063'
abstract:
- lang: eng
  text: "Instruction-tuned Large Language Models (LLMs) show impressive results in
    numerous practical applications, but they lack essential safety features that
    are common in other areas of computer science, particularly an explicit separation
    of instructions and data. This makes them vulnerable to manipulations such as
    indirect prompt injections and generally unsuitable for safety-critical tasks.
    Surprisingly, there is currently no established definition or benchmark to quantify
    this phenomenon. In this work, we close this gap by introducing a formal measure
    for instruction-data separation and an empirical variant that is calculable from
    a model's outputs. We also present a new dataset, SEP, that allows estimating
    the measure for real-world models. Our results on various LLMs show that the problem
    of instruction-data separation is real: all models fail to achieve high separation,
    and canonical mitigation techniques, such as prompt engineering and fine-tuning,
    either fail to substantially improve separation or reduce model utility. The source
    code and SEP dataset are openly accessible at https://github.com/egozverev/Shold-It-Be-Executed-Or-Processed.\r\n"
acknowledged_ssus:
- _id: ScienComp
acknowledgement: The authors would like to sincerely thank Juan Rocamonde for valuable
  feedback to our manuscript. We acknowledge the support from the Scientific Service
  Units (SSU) of ISTA through resources provided by Scientific Computing (SciComp).
  We thank Dan Alistarh for providing us with computational resources. This work was
  partially funded by the German Federal Ministry of Education and Research (BMBF)
  under the grant AIgenCY (16KIS2012) and ELSA – European Lighthouse on Secure and
  Safe AI funded by the European Union under grant agreement No. 101070617. Views
  and opinions expressed are however those of the authors only and do not necessarily
  reflect those of the European Union or European Commission. Neither the European
  Union nor the European Commission can be held responsible for them.
article_number: '2403.06833'
article_processing_charge: No
arxiv: 1
author:
- first_name: Egor
  full_name: Zverev, Egor
  id: 05162b19-1340-11ed-8f02-fa94e0e8c3bc
  last_name: Zverev
- first_name: Sahar
  full_name: Abdelnabi, Sahar
  last_name: Abdelnabi
- first_name: Soroush
  full_name: Tabesh, Soroush
  id: 06000900-6068-11ef-8d61-c2472ef2e752
  last_name: Tabesh
  orcid: 0009-0003-4119-6281
- first_name: Mario
  full_name: Fritz, Mario
  last_name: Fritz
- first_name: Christoph
  full_name: Lampert, Christoph
  id: 40C20FD2-F248-11E8-B48F-1D18A9856A87
  last_name: Lampert
  orcid: 0000-0001-8622-7887
citation:
  ama: Zverev E, Abdelnabi S, Tabesh S, Fritz M, Lampert C. Can LLMs separate instructions
    from data? And what do we even mean by that? <i>arXiv</i>. 2024. doi:<a href="https://doi.org/10.48550/arXiv.2403.06833">10.48550/arXiv.2403.06833</a>
  apa: Zverev, E., Abdelnabi, S., Tabesh, S., Fritz, M., &#38; Lampert, C. (2024).
    Can LLMs separate instructions from data? And what do we even mean by that? <i>arXiv</i>.
    <a href="https://doi.org/10.48550/arXiv.2403.06833">https://doi.org/10.48550/arXiv.2403.06833</a>
  chicago: Zverev, Egor, Sahar Abdelnabi, Soroush Tabesh, Mario Fritz, and Christoph
    Lampert. “Can LLMs Separate Instructions from Data? And What Do We Even Mean by
    That?” <i>ArXiv</i>, 2024. <a href="https://doi.org/10.48550/arXiv.2403.06833">https://doi.org/10.48550/arXiv.2403.06833</a>.
  ieee: E. Zverev, S. Abdelnabi, S. Tabesh, M. Fritz, and C. Lampert, “Can LLMs separate
    instructions from data? And what do we even mean by that?,” <i>arXiv</i>. 2024.
  ista: Zverev E, Abdelnabi S, Tabesh S, Fritz M, Lampert C. 2024. Can LLMs separate
    instructions from data? And what do we even mean by that? arXiv, 2403.06833.
  mla: Zverev, Egor, et al. “Can LLMs Separate Instructions from Data? And What Do
    We Even Mean by That?” <i>ArXiv</i>, 2403.06833, 2024, doi:<a href="https://doi.org/10.48550/arXiv.2403.06833">10.48550/arXiv.2403.06833</a>.
  short: E. Zverev, S. Abdelnabi, S. Tabesh, M. Fritz, C. Lampert, ArXiv (2024).
corr_author: '1'
date_created: 2025-02-20T10:13:42Z
date_published: 2024-03-01T00:00:00Z
date_updated: 2025-02-24T12:52:23Z
day: '01'
ddc:
- '000'
department:
- _id: GradSch
- _id: ChLa
doi: 10.48550/arXiv.2403.06833
external_id:
  arxiv:
  - '2403.06833'
file:
- access_level: open_access
  checksum: 35eb43968684b87be59144603ef10af0
  content_type: application/pdf
  creator: ezverev
  date_created: 2025-02-20T10:11:45Z
  date_updated: 2025-02-20T10:11:45Z
  file_id: '19064'
  file_name: 2403.06833v3.pdf
  file_size: 530972
  relation: main_file
  success: 1
file_date_updated: 2025-02-20T10:11:45Z
has_accepted_license: '1'
language:
- iso: eng
license: https://creativecommons.org/licenses/by-sa/4.0/
main_file_link:
- open_access: '1'
  url: https://doi.org/10.48550/arXiv.2403.06833
month: '03'
oa: 1
oa_version: Preprint
publication: arXiv
publication_status: published
related_material:
  link:
  - relation: software
    url: https://github.com/egozverev/Shold-It-Be-Executed-Or-Processed
status: public
title: Can LLMs separate instructions from data? And what do we even mean by that?
tmp:
  image: /images/cc_by_sa.png
  legal_code_url: https://creativecommons.org/licenses/by-sa/4.0/legalcode
  name: Creative Commons Attribution-ShareAlike 4.0 International Public License (CC
    BY-SA 4.0)
  short: CC BY-SA (4.0)
type: preprint
user_id: 2DF688A6-F248-11E8-B48F-1D18A9856A87
year: '2024'
...
---
OA_place: repository
OA_type: green
_id: '19307'
abstract:
- lang: eng
  text: "This repository contains the data, scripts, SAM codes and files required
    to reproduce the results of the manuscript \"The Unreasonable Efficiency of Total
    Rain Evaporation Removal in Triggering Convective Self-Aggregation\" submitted
    to the Geophysical Research Letters (GRL).\r\n\r\nBrief description of project:
    This project aims to examine the impact of rain evaporation removal or reduction
    in the planetary boundary layer (PBL) on convective self aggregation (CSA). Non-rotating
    radiative-convective equilibrium (RCE) simulations were conducted with the System
    for Atmospheric Modeling (SAM) cloud resolving model. Rain evaporation in the
    lowest 1 km was progressively reduced and the effect on CSA was investigated.
    The physical processes underlying this type of aggregation (referred to in the
    manuscript as no-evaporation CSA, or NE-CSA) were analyzed and described. \r\nThe
    default SAM code base (version 6.10.8) can be downloaded from here: http://rossby.msrc.sunysb.edu/~marat/SAM.html"
article_processing_charge: No
author:
- first_name: Yi-Ling
  full_name: Hwong, Yi-Ling
  id: 1217aa61-4dd1-11ec-9ac3-f2ba3f17ee22
  last_name: Hwong
  orcid: 0000-0001-9281-3479
- first_name: Caroline J
  full_name: Muller, Caroline J
  id: f978ccb0-3f7f-11eb-b193-b0e2bd13182b
  last_name: Muller
  orcid: 0000-0001-5836-5350
citation:
  ama: Hwong Y-L, Muller CJ. Data - The unreasonable efficiency of total rain evaporation
    removal in triggering convective self-aggregation. 2024. doi:<a href="https://doi.org/10.5281/ZENODO.10687169">10.5281/ZENODO.10687169</a>
  apa: Hwong, Y.-L., &#38; Muller, C. J. (2024). Data - The unreasonable efficiency
    of total rain evaporation removal in triggering convective self-aggregation. Zenodo.
    <a href="https://doi.org/10.5281/ZENODO.10687169">https://doi.org/10.5281/ZENODO.10687169</a>
  chicago: Hwong, Yi-Ling, and Caroline J Muller. “Data - The Unreasonable Efficiency
    of Total Rain Evaporation Removal in Triggering Convective Self-Aggregation.”
    Zenodo, 2024. <a href="https://doi.org/10.5281/ZENODO.10687169">https://doi.org/10.5281/ZENODO.10687169</a>.
  ieee: Y.-L. Hwong and C. J. Muller, “Data - The unreasonable efficiency of total
    rain evaporation removal in triggering convective self-aggregation.” Zenodo, 2024.
  ista: Hwong Y-L, Muller CJ. 2024. Data - The unreasonable efficiency of total rain
    evaporation removal in triggering convective self-aggregation, Zenodo, <a href="https://doi.org/10.5281/ZENODO.10687169">10.5281/ZENODO.10687169</a>.
  mla: Hwong, Yi-Ling, and Caroline J. Muller. <i>Data - The Unreasonable Efficiency
    of Total Rain Evaporation Removal in Triggering Convective Self-Aggregation</i>.
    Zenodo, 2024, doi:<a href="https://doi.org/10.5281/ZENODO.10687169">10.5281/ZENODO.10687169</a>.
  short: Y.-L. Hwong, C.J. Muller, (2024).
corr_author: '1'
date_created: 2025-03-07T08:39:40Z
date_published: 2024-02-21T00:00:00Z
date_updated: 2025-09-04T13:16:39Z
day: '21'
ddc:
- '550'
department:
- _id: CaMu
doi: 10.5281/ZENODO.10687169
has_accepted_license: '1'
main_file_link:
- open_access: '1'
  url: https://doi.org/10.5281/zenodo.10687169
month: '02'
oa: 1
oa_version: Published Version
publisher: Zenodo
related_material:
  record:
  - id: '15186'
    relation: used_in_publication
    status: public
status: public
title: Data - The unreasonable efficiency of total rain evaporation removal in triggering
  convective self-aggregation
tmp:
  image: /images/cc_by.png
  legal_code_url: https://creativecommons.org/licenses/by/4.0/legalcode
  name: Creative Commons Attribution 4.0 International Public License (CC-BY 4.0)
  short: CC BY (4.0)
type: research_data_reference
user_id: 2DF688A6-F248-11E8-B48F-1D18A9856A87
year: '2024'
...
---
OA_place: publisher
OA_type: diamond
_id: '19408'
abstract:
- lang: eng
  text: 'Continual learning is a subfield of machine learning, which aims to allow
    machine learning models to continuously learn on new data, by accumulating knowledge
    without forgetting what was learned in the past. In this work, we take a step
    back, and ask: "Why should one care about continual learning in the first place?".
    We set the stage by examining recent continual learning papers published at four
    major machine learning conferences, and show that memory-constrained settings
    dominate the field. Then, we discuss five open problems in machine learning, and
    even though they might seem unrelated to continual learning at first sight, we
    show that continual learning will inevitably be part of their solution. These
    problems are model editing, personalization and specialization, on-device learning,
    faster (re-)training and reinforcement learning. Finally, by comparing the desiderata
    from these unsolved problems and the current assumptions in continual learning,
    we highlight and discuss four future directions for continual learning research.
    We hope that this work offers an interesting perspective on the future of continual
    learning, while displaying its potential value and the paths we have to pursue
    in order to make it successful. This work is the result of the many discussions
    the authors had at the Dagstuhl seminar on Deep Continual Learning, in March 2023.'
alternative_title:
- TMLR
article_processing_charge: No
article_type: original
arxiv: 1
author:
- first_name: Eli
  full_name: Verwimp, Eli
  last_name: Verwimp
- first_name: Rahaf
  full_name: Aljundi, Rahaf
  last_name: Aljundi
- first_name: Shai
  full_name: Ben-David, Shai
  last_name: Ben-David
- first_name: Matthias
  full_name: Bethge, Matthias
  last_name: Bethge
- first_name: Andrea
  full_name: Cossu, Andrea
  last_name: Cossu
- first_name: Alexander
  full_name: Gepperth, Alexander
  last_name: Gepperth
- first_name: Tyler L.
  full_name: Hayes, Tyler L.
  last_name: Hayes
- first_name: Eyke
  full_name: Hüllermeier, Eyke
  last_name: Hüllermeier
- first_name: Christopher
  full_name: Kanan, Christopher
  last_name: Kanan
- first_name: Dhireesha
  full_name: Kudithipudi, Dhireesha
  last_name: Kudithipudi
- first_name: Christoph
  full_name: Lampert, Christoph
  id: 40C20FD2-F248-11E8-B48F-1D18A9856A87
  last_name: Lampert
  orcid: 0000-0001-8622-7887
- first_name: Martin
  full_name: Mundt, Martin
  last_name: Mundt
- first_name: Razvan
  full_name: Pascanu, Razvan
  last_name: Pascanu
- first_name: Adrian
  full_name: Popescu, Adrian
  last_name: Popescu
- first_name: Andreas S.
  full_name: Tolias, Andreas S.
  last_name: Tolias
- first_name: Joost
  full_name: Van De Weijer, Joost
  last_name: Van De Weijer
- first_name: Bing
  full_name: Liu, Bing
  last_name: Liu
- first_name: Vincenzo
  full_name: Lomonaco, Vincenzo
  last_name: Lomonaco
- first_name: Tinne
  full_name: Tuytelaars, Tinne
  last_name: Tuytelaars
- first_name: Gido M.
  full_name: Van De Ven, Gido M.
  last_name: Van De Ven
citation:
  ama: 'Verwimp E, Aljundi R, Ben-David S, et al. Continual learning: Applications
    and the road forward. <i>Transactions on Machine Learning Research</i>. 2024;2024.'
  apa: 'Verwimp, E., Aljundi, R., Ben-David, S., Bethge, M., Cossu, A., Gepperth,
    A., … Van De Ven, G. M. (2024). Continual learning: Applications and the road
    forward. <i>Transactions on Machine Learning Research</i>. Transactions on Machine
    Learning Research.'
  chicago: 'Verwimp, Eli, Rahaf Aljundi, Shai Ben-David, Matthias Bethge, Andrea Cossu,
    Alexander Gepperth, Tyler L. Hayes, et al. “Continual Learning: Applications and
    the Road Forward.” <i>Transactions on Machine Learning Research</i>. Transactions
    on Machine Learning Research, 2024.'
  ieee: 'E. Verwimp <i>et al.</i>, “Continual learning: Applications and the road
    forward,” <i>Transactions on Machine Learning Research</i>, vol. 2024. Transactions
    on Machine Learning Research, 2024.'
  ista: 'Verwimp E, Aljundi R, Ben-David S, Bethge M, Cossu A, Gepperth A, Hayes TL,
    Hüllermeier E, Kanan C, Kudithipudi D, Lampert C, Mundt M, Pascanu R, Popescu
    A, Tolias AS, Van De Weijer J, Liu B, Lomonaco V, Tuytelaars T, Van De Ven GM.
    2024. Continual learning: Applications and the road forward. Transactions on Machine
    Learning Research. 2024.'
  mla: 'Verwimp, Eli, et al. “Continual Learning: Applications and the Road Forward.”
    <i>Transactions on Machine Learning Research</i>, vol. 2024, Transactions on Machine
    Learning Research, 2024.'
  short: E. Verwimp, R. Aljundi, S. Ben-David, M. Bethge, A. Cossu, A. Gepperth, T.L.
    Hayes, E. Hüllermeier, C. Kanan, D. Kudithipudi, C. Lampert, M. Mundt, R. Pascanu,
    A. Popescu, A.S. Tolias, J. Van De Weijer, B. Liu, V. Lomonaco, T. Tuytelaars,
    G.M. Van De Ven, Transactions on Machine Learning Research 2024 (2024).
date_created: 2025-03-16T23:01:25Z
date_published: 2024-04-12T00:00:00Z
date_updated: 2025-03-20T09:21:02Z
day: '12'
ddc:
- '000'
department:
- _id: ChLa
external_id:
  arxiv:
  - '2311.11908'
file:
- access_level: open_access
  checksum: 0714e12f7423cd098976ed9974561155
  content_type: application/pdf
  creator: dernst
  date_created: 2025-03-20T09:02:18Z
  date_updated: 2025-03-20T09:02:18Z
  file_id: '19426'
  file_name: 2024_TMLR_Verwimp.pdf
  file_size: 1367966
  relation: main_file
  success: 1
file_date_updated: 2025-03-20T09:02:18Z
has_accepted_license: '1'
intvolume: '2024'
language:
- iso: eng
month: '04'
oa: 1
oa_version: Published Version
publication: Transactions on Machine Learning Research
publication_identifier:
  eissn:
  - 2835-8856
publication_status: published
publisher: Transactions on Machine Learning Research
quality_controlled: '1'
scopus_import: '1'
status: public
title: 'Continual learning: Applications and the road forward'
tmp:
  image: /images/cc_by.png
  legal_code_url: https://creativecommons.org/licenses/by/4.0/legalcode
  name: Creative Commons Attribution 4.0 International Public License (CC-BY 4.0)
  short: CC BY (4.0)
type: journal_article
user_id: 2DF688A6-F248-11E8-B48F-1D18A9856A87
volume: 2024
year: '2024'
...
---
OA_type: closed access
_id: '19446'
abstract:
- lang: eng
  text: This Comment explores new approaches to enrich large-scale population data,
    including incorporating macro-environmental and digital health measures.
acknowledgement: Funded by the European Union. Complementary funding was received
  by the UK Research and Innovation (UKRI) under the UK government’s Horizon Europe
  funding guarantee (10041392 and 10038599). Views and opinions expressed are however
  those of the author(s) only and do not necessarily reflect those of the European
  Union, the European Health and Digital Executive Agency (HADEA) or UKRI. The European
  Union, HADEA and UKRI cannot be held responsible for them. This work received also
  support from Chinese Ministry for Science and Technology (MOST), the Horizon 2020-funded
  European Research Council Advanced Grant ‘STRATIFY’ (695313), the German Research
  Foundation (COPE; 675346; NE 1383/15-1 (CoviDrug)) and the National Natural Science
  Foundation of China grant 82150710554.
article_processing_charge: No
article_type: letter_note
author:
- first_name: Frauke
  full_name: Nees, Frauke
  last_name: Nees
- first_name: Paul
  full_name: Renner, Paul
  last_name: Renner
- first_name: Nathalie E.
  full_name: Holz, Nathalie E.
  last_name: Holz
- first_name: Elli
  full_name: Polemiti, Elli
  last_name: Polemiti
- first_name: Sebastian
  full_name: Siehl, Sebastian
  last_name: Siehl
- first_name: Sören
  full_name: Hese, Sören
  last_name: Hese
- first_name: Kerstin
  full_name: Schepanski, Kerstin
  last_name: Schepanski
- first_name: Gunter
  full_name: Schumann, Gunter
  last_name: Schumann
- first_name: Henrik
  full_name: Walter, Henrik
  last_name: Walter
- first_name: Andreas
  full_name: Heinz, Andreas
  last_name: Heinz
- first_name: Markus
  full_name: Ralser, Markus
  last_name: Ralser
- first_name: Sven
  full_name: Twardziok, Sven
  last_name: Twardziok
- first_name: Nilakshi
  full_name: Vaidya, Nilakshi
  last_name: Vaidya
- first_name: Antoine
  full_name: Bernas, Antoine
  last_name: Bernas
- first_name: Emin
  full_name: Serin, Emin
  last_name: Serin
- first_name: Marcel
  full_name: Jentsch, Marcel
  last_name: Jentsch
- first_name: Esther
  full_name: Hitchen, Esther
  last_name: Hitchen
- first_name: Hedi
  full_name: Kebir, Hedi
  last_name: Kebir
- first_name: Tristram A.
  full_name: Lett, Tristram A.
  last_name: Lett
- first_name: Jean Charles
  full_name: Roy, Jean Charles
  last_name: Roy
- first_name: Roland
  full_name: Eils, Roland
  last_name: Eils
- first_name: Ulrike Helene
  full_name: Taron, Ulrike Helene
  last_name: Taron
- first_name: Tatjana
  full_name: Schütz, Tatjana
  last_name: Schütz
- first_name: Jamie
  full_name: Banks, Jamie
  last_name: Banks
- first_name: Tobias
  full_name: Banaschewski, Tobias
  last_name: Banaschewski
- first_name: Karina
  full_name: Jansone, Karina
  last_name: Jansone
- first_name: Nina
  full_name: Christmann, Nina
  last_name: Christmann
- first_name: Andreas
  full_name: Meyer-Lindenberg, Andreas
  last_name: Meyer-Lindenberg
- first_name: Heike
  full_name: Tost, Heike
  last_name: Tost
- first_name: Nathalie
  full_name: Holz, Nathalie
  last_name: Holz
- first_name: Emanuel
  full_name: Schwarz, Emanuel
  last_name: Schwarz
- first_name: Argyris
  full_name: Stringaris, Argyris
  last_name: Stringaris
- first_name: Maja
  full_name: Neidhart, Maja
  last_name: Neidhart
- first_name: Beke
  full_name: Seefried, Beke
  last_name: Seefried
- first_name: Rieke
  full_name: Aden, Rieke
  last_name: Aden
- first_name: Ole A.
  full_name: Andreassen, Ole A.
  last_name: Andreassen
- first_name: Lars T.
  full_name: Westlye, Lars T.
  last_name: Westlye
- first_name: Dennis
  full_name: Van Der Meer, Dennis
  last_name: Van Der Meer
- first_name: Sara
  full_name: Fernandez, Sara
  last_name: Fernandez
- first_name: Rikka
  full_name: Kjelkenes, Rikka
  last_name: Kjelkenes
- first_name: Helga
  full_name: Ask, Helga
  last_name: Ask
- first_name: Michael
  full_name: Rapp, Michael
  last_name: Rapp
- first_name: Mira
  full_name: Tschorn, Mira
  last_name: Tschorn
- first_name: Sarah Jane
  full_name: Böttger, Sarah Jane
  last_name: Böttger
- first_name: Andre
  full_name: Marquand, Andre
  last_name: Marquand
- first_name: Gaia
  full_name: Novarino, Gaia
  id: 3E57A680-F248-11E8-B48F-1D18A9856A87
  last_name: Novarino
  orcid: 0000-0002-7673-7178
- first_name: Lena
  full_name: Marr, Lena
  id: 4406F586-F248-11E8-B48F-1D18A9856A87
  last_name: Marr
- first_name: Mel
  full_name: Slater, Mel
  last_name: Slater
- first_name: Guillem Feixas
  full_name: Viapiana, Guillem Feixas
  last_name: Viapiana
- first_name: Francisco Eiroa
  full_name: Orosa, Francisco Eiroa
  last_name: Orosa
- first_name: Jaime
  full_name: Gallego, Jaime
  last_name: Gallego
- first_name: Alvaro
  full_name: Pastor, Alvaro
  last_name: Pastor
- first_name: Andreas J.
  full_name: Forstner, Andreas J.
  last_name: Forstner
- first_name: Per
  full_name: Hoffmann, Per
  last_name: Hoffmann
- first_name: Markus M.
  full_name: Nöthen, Markus M.
  last_name: Nöthen
- first_name: Isabelle
  full_name: Claus, Isabelle
  last_name: Claus
- first_name: Abigail
  full_name: Miller, Abigail
  last_name: Miller
- first_name: Carina M.
  full_name: Mathey, Carina M.
  last_name: Mathey
- first_name: Stefanie
  full_name: Heilmann-Heimbach, Stefanie
  last_name: Heilmann-Heimbach
- first_name: Peter
  full_name: Sommer, Peter
  last_name: Sommer
- first_name: Myrto
  full_name: Patraskaki, Myrto
  last_name: Patraskaki
- first_name: Johannes
  full_name: Wilbertz, Johannes
  last_name: Wilbertz
- first_name: Karen
  full_name: Schmitt, Karen
  last_name: Schmitt
- first_name: Viktor
  full_name: Jirsa, Viktor
  last_name: Jirsa
- first_name: Spase
  full_name: Petkoski, Spase
  last_name: Petkoski
- first_name: Séverine
  full_name: Pitel, Séverine
  last_name: Pitel
- first_name: Lisa
  full_name: Otten, Lisa
  last_name: Otten
- first_name: Anastasios Polykarpos
  full_name: Athanasiadis, Anastasios Polykarpos
  last_name: Athanasiadis
- first_name: Charlie
  full_name: Pearmund, Charlie
  last_name: Pearmund
- first_name: Bernhard
  full_name: Spanlang, Bernhard
  last_name: Spanlang
- first_name: Elena
  full_name: Alvarez, Elena
  last_name: Alvarez
- first_name: Mavi
  full_name: Sanchez, Mavi
  last_name: Sanchez
- first_name: Arantxa
  full_name: Giner, Arantxa
  last_name: Giner
- first_name: Tianye
  full_name: Jia, Tianye
  last_name: Jia
- first_name: Yanting
  full_name: Gong, Yanting
  last_name: Gong
- first_name: Yunman
  full_name: Xia, Yunman
  last_name: Xia
- first_name: Xiao
  full_name: Chang, Xiao
  last_name: Chang
- first_name: Vince
  full_name: Calhoun, Vince
  last_name: Calhoun
- first_name: Jingyu
  full_name: Liu, Jingyu
  last_name: Liu
- first_name: Ameli
  full_name: Schwalber, Ameli
  last_name: Schwalber
- first_name: Paul
  full_name: Thompson, Paul
  last_name: Thompson
- first_name: Nicholas
  full_name: Clinton, Nicholas
  last_name: Clinton
- first_name: Sylvane
  full_name: Desrivières, Sylvane
  last_name: Desrivières
- first_name: Allan H.
  full_name: Young, Allan H.
  last_name: Young
- first_name: Bernd
  full_name: Stahl, Bernd
  last_name: Stahl
- first_name: George
  full_name: Ogoh, George
  last_name: Ogoh
citation:
  ama: Nees F, Renner P, Holz NE, et al. Large-scale population data enrichment in
    mental health research. <i>Nature Mental Health</i>. 2024;2(10):1124-1127. doi:<a
    href="https://doi.org/10.1038/s44220-024-00316-z">10.1038/s44220-024-00316-z</a>
  apa: Nees, F., Renner, P., Holz, N. E., Polemiti, E., Siehl, S., Hese, S., … Ogoh,
    G. (2024). Large-scale population data enrichment in mental health research. <i>Nature
    Mental Health</i>. Springer Nature. <a href="https://doi.org/10.1038/s44220-024-00316-z">https://doi.org/10.1038/s44220-024-00316-z</a>
  chicago: Nees, Frauke, Paul Renner, Nathalie E. Holz, Elli Polemiti, Sebastian Siehl,
    Sören Hese, Kerstin Schepanski, et al. “Large-Scale Population Data Enrichment
    in Mental Health Research.” <i>Nature Mental Health</i>. Springer Nature, 2024.
    <a href="https://doi.org/10.1038/s44220-024-00316-z">https://doi.org/10.1038/s44220-024-00316-z</a>.
  ieee: F. Nees <i>et al.</i>, “Large-scale population data enrichment in mental health
    research,” <i>Nature Mental Health</i>, vol. 2, no. 10. Springer Nature, pp. 1124–1127,
    2024.
  ista: Nees F, Renner P, Holz NE, Polemiti E, Siehl S, Hese S, Schepanski K, Schumann
    G, Walter H, Heinz A, Ralser M, Twardziok S, Vaidya N, Bernas A, Serin E, Jentsch
    M, Hitchen E, Kebir H, Lett TA, Roy JC, Eils R, Taron UH, Schütz T, Banks J, Banaschewski
    T, Jansone K, Christmann N, Meyer-Lindenberg A, Tost H, Holz N, Schwarz E, Stringaris
    A, Neidhart M, Seefried B, Aden R, Andreassen OA, Westlye LT, Van Der Meer D,
    Fernandez S, Kjelkenes R, Ask H, Rapp M, Tschorn M, Böttger SJ, Marquand A, Novarino
    G, Marr L, Slater M, Viapiana GF, Orosa FE, Gallego J, Pastor A, Forstner AJ,
    Hoffmann P, Nöthen MM, Claus I, Miller A, Mathey CM, Heilmann-Heimbach S, Sommer
    P, Patraskaki M, Wilbertz J, Schmitt K, Jirsa V, Petkoski S, Pitel S, Otten L,
    Athanasiadis AP, Pearmund C, Spanlang B, Alvarez E, Sanchez M, Giner A, Jia T,
    Gong Y, Xia Y, Chang X, Calhoun V, Liu J, Schwalber A, Thompson P, Clinton N,
    Desrivières S, Young AH, Stahl B, Ogoh G. 2024. Large-scale population data enrichment
    in mental health research. Nature Mental Health. 2(10), 1124–1127.
  mla: Nees, Frauke, et al. “Large-Scale Population Data Enrichment in Mental Health
    Research.” <i>Nature Mental Health</i>, vol. 2, no. 10, Springer Nature, 2024,
    pp. 1124–27, doi:<a href="https://doi.org/10.1038/s44220-024-00316-z">10.1038/s44220-024-00316-z</a>.
  short: F. Nees, P. Renner, N.E. Holz, E. Polemiti, S. Siehl, S. Hese, K. Schepanski,
    G. Schumann, H. Walter, A. Heinz, M. Ralser, S. Twardziok, N. Vaidya, A. Bernas,
    E. Serin, M. Jentsch, E. Hitchen, H. Kebir, T.A. Lett, J.C. Roy, R. Eils, U.H.
    Taron, T. Schütz, J. Banks, T. Banaschewski, K. Jansone, N. Christmann, A. Meyer-Lindenberg,
    H. Tost, N. Holz, E. Schwarz, A. Stringaris, M. Neidhart, B. Seefried, R. Aden,
    O.A. Andreassen, L.T. Westlye, D. Van Der Meer, S. Fernandez, R. Kjelkenes, H.
    Ask, M. Rapp, M. Tschorn, S.J. Böttger, A. Marquand, G. Novarino, L. Marr, M.
    Slater, G.F. Viapiana, F.E. Orosa, J. Gallego, A. Pastor, A.J. Forstner, P. Hoffmann,
    M.M. Nöthen, I. Claus, A. Miller, C.M. Mathey, S. Heilmann-Heimbach, P. Sommer,
    M. Patraskaki, J. Wilbertz, K. Schmitt, V. Jirsa, S. Petkoski, S. Pitel, L. Otten,
    A.P. Athanasiadis, C. Pearmund, B. Spanlang, E. Alvarez, M. Sanchez, A. Giner,
    T. Jia, Y. Gong, Y. Xia, X. Chang, V. Calhoun, J. Liu, A. Schwalber, P. Thompson,
    N. Clinton, S. Desrivières, A.H. Young, B. Stahl, G. Ogoh, Nature Mental Health
    2 (2024) 1124–1127.
date_created: 2025-03-23T23:01:28Z
date_published: 2024-10-01T00:00:00Z
date_updated: 2025-03-25T08:28:39Z
day: '01'
department:
- _id: GaNo
doi: 10.1038/s44220-024-00316-z
intvolume: '2'
issue: '10'
language:
- iso: eng
month: '10'
oa_version: None
page: 1124-1127
publication: Nature Mental Health
publication_identifier:
  eissn:
  - 2731-6076
publication_status: published
publisher: Springer Nature
quality_controlled: '1'
scopus_import: '1'
status: public
title: Large-scale population data enrichment in mental health research
type: journal_article
user_id: 2DF688A6-F248-11E8-B48F-1D18A9856A87
volume: 2
year: '2024'
...
---
OA_place: repository
OA_type: green
_id: '19510'
abstract:
- lang: eng
  text: "We propose a new variant of the Adam optimizer [Kingma and Ba, 2014] called\r\nMICROADAM
    that specifically minimizes memory overheads, while maintaining\r\ntheoretical
    convergence guarantees. We achieve this by compressing the gradient\r\ninformation
    before it is fed into the optimizer state, thereby reducing its memory\r\nfootprint
    significantly. We control the resulting compression error via a novel\r\ninstance
    of the classical error feedback mechanism from distributed optimization [Seide
    et al., 2014, Alistarh et al., 2018, Karimireddy et al., 2019] in which\r\nthe
    error correction information is itself compressed to allow for practical memory\r\ngains.
    We prove that the resulting approach maintains theoretical convergence\r\nguarantees
    competitive to those of AMSGrad, while providing good practical performance. Specifically,
    we show that MICROADAM can be implemented efficiently\r\non GPUs: on both million-scale
    (BERT) and billion-scale (LLaMA) models, MICROADAM provides practical convergence
    competitive to that of the uncompressed\r\nAdam baseline, with lower memory usage
    and similar running time. Our code is\r\navailable at https://github.com/IST-DASLab/MicroAdam."
acknowledged_ssus:
- _id: CampIT
acknowledgement: The authors thank Razvan Pascanu, Mahdi Nikdan and Soroush Tabesh
  for their valuable feedback, the IT department from Institute of Science and Technology
  Austria for the hardware support and Weights and Biases for the infrastructure to
  track all our experiments. Mher Safaryan has received funding from the European
  Union’s Horizon 2020 research and innovation program under the Marie Sklodowska-Curie
  grant agreement No 101034413.
alternative_title:
- Advances in Neural Information Processing Systems
article_processing_charge: No
arxiv: 1
author:
- first_name: Ionut-Vlad
  full_name: Modoranu, Ionut-Vlad
  id: 449f7a18-f128-11eb-9611-9b430c0c6333
  last_name: Modoranu
- first_name: Mher
  full_name: Safaryan, Mher
  id: dd546b39-0804-11ed-9c55-ef075c39778d
  last_name: Safaryan
- first_name: Grigory
  full_name: Malinovsky, Grigory
  last_name: Malinovsky
- first_name: Eldar
  full_name: Kurtic, Eldar
  id: 47beb3a5-07b5-11eb-9b87-b108ec578218
  last_name: Kurtic
- first_name: Thomas
  full_name: Robert, Thomas
  id: de632733-1457-11f0-ae22-b5914b8c1c41
  last_name: Robert
- first_name: Peter
  full_name: Richtárik, Peter
  last_name: Richtárik
- first_name: Dan-Adrian
  full_name: Alistarh, Dan-Adrian
  id: 4A899BFC-F248-11E8-B48F-1D18A9856A87
  last_name: Alistarh
  orcid: 0000-0003-3650-940X
citation:
  ama: 'Modoranu I-V, Safaryan M, Malinovsky G, et al. MICROADAM: Accurate adaptive
    optimization with low space overhead and provable convergence. In: <i>38th Conference
    on Neural Information Processing Systems</i>. Vol 37. Neural Information Processing
    Systems Foundation; 2024.'
  apa: 'Modoranu, I.-V., Safaryan, M., Malinovsky, G., Kurtic, E., Robert, T., Richtárik,
    P., &#38; Alistarh, D.-A. (2024). MICROADAM: Accurate adaptive optimization with
    low space overhead and provable convergence. In <i>38th Conference on Neural Information
    Processing Systems</i> (Vol. 37). Neural Information Processing Systems Foundation.'
  chicago: 'Modoranu, Ionut-Vlad, Mher Safaryan, Grigory Malinovsky, Eldar Kurtic,
    Thomas Robert, Peter Richtárik, and Dan-Adrian Alistarh. “MICROADAM: Accurate
    Adaptive Optimization with Low Space Overhead and Provable Convergence.” In <i>38th
    Conference on Neural Information Processing Systems</i>, Vol. 37. Neural Information
    Processing Systems Foundation, 2024.'
  ieee: 'I.-V. Modoranu <i>et al.</i>, “MICROADAM: Accurate adaptive optimization
    with low space overhead and provable convergence,” in <i>38th Conference on Neural
    Information Processing Systems</i>, 2024, vol. 37.'
  ista: 'Modoranu I-V, Safaryan M, Malinovsky G, Kurtic E, Robert T, Richtárik P,
    Alistarh D-A. 2024. MICROADAM: Accurate adaptive optimization with low space overhead
    and provable convergence. 38th Conference on Neural Information Processing Systems.
    , Advances in Neural Information Processing Systems, vol. 37.'
  mla: 'Modoranu, Ionut-Vlad, et al. “MICROADAM: Accurate Adaptive Optimization with
    Low Space Overhead and Provable Convergence.” <i>38th Conference on Neural Information
    Processing Systems</i>, vol. 37, Neural Information Processing Systems Foundation,
    2024.'
  short: I.-V. Modoranu, M. Safaryan, G. Malinovsky, E. Kurtic, T. Robert, P. Richtárik,
    D.-A. Alistarh, in:, 38th Conference on Neural Information Processing Systems,
    Neural Information Processing Systems Foundation, 2024.
corr_author: '1'
date_created: 2025-04-06T22:01:32Z
date_published: 2024-12-20T00:00:00Z
date_updated: 2025-05-14T11:32:52Z
day: '20'
department:
- _id: DaAl
ec_funded: 1
external_id:
  arxiv:
  - '2405.15593'
intvolume: '37'
language:
- iso: eng
main_file_link:
- open_access: '1'
  url: https://doi.org/10.48550/arXiv.2405.15593
month: '12'
oa: 1
oa_version: Preprint
project:
- _id: fc2ed2f7-9c52-11eb-aca3-c01059dda49c
  call_identifier: H2020
  grant_number: '101034413'
  name: 'IST-BRIDGE: International postdoctoral program'
publication: 38th Conference on Neural Information Processing Systems
publication_identifier:
  issn:
  - 1049-5258
publication_status: published
publisher: Neural Information Processing Systems Foundation
quality_controlled: '1'
related_material:
  link:
  - relation: software
    url: https://github.com/IST-DASLab/MicroAdam
scopus_import: '1'
status: public
title: 'MICROADAM: Accurate adaptive optimization with low space overhead and provable
  convergence'
type: conference
user_id: 2DF688A6-F248-11E8-B48F-1D18A9856A87
volume: 37
year: '2024'
...
---
OA_place: repository
OA_type: green
_id: '19511'
abstract:
- lang: eng
  text: We introduce QuaRot, a new Quantization scheme based on Rotations, which is
    able to quantize LLMs end-to-end, including all weights, activations, and KV cache
    in 4 bits. QuaRot rotates LLMs in a way that removes outliers from the hidden
    state without changing the output, making quantization easier. This computational
    invariance is applied to the hidden state (residual) of the LLM, as well as to
    the activations of the feed-forward components, aspects of the attention mechanism,
    and to the KV cache. The result is a quantized model where all matrix multiplications
    are performed in 4 bits, without any channels identified for retention in higher
    precision. Our 4-bit quantized LLAMA2-70B model has losses of at most 0.47 WikiText-2
    perplexity and retains 99% of the zero-shot performance. We also show that QuaRot
    can provide lossless 6 and 8 bit LLAMA-2 models without any calibration data using
    round-to-nearest quantization. Code is available at github.com/spcl/QuaRot.
alternative_title:
- Advances in Neural Information Processing Systems
article_processing_charge: No
arxiv: 1
author:
- first_name: Saleh
  full_name: Ashkboos, Saleh
  last_name: Ashkboos
- first_name: Amirkeivan
  full_name: Mohtashami, Amirkeivan
  last_name: Mohtashami
- first_name: Maximilian L.
  full_name: Croci, Maximilian L.
  last_name: Croci
- first_name: Bo
  full_name: Li, Bo
  last_name: Li
- first_name: Pashmina
  full_name: Cameron, Pashmina
  last_name: Cameron
- first_name: Martin
  full_name: Jaggi, Martin
  last_name: Jaggi
- first_name: Dan-Adrian
  full_name: Alistarh, Dan-Adrian
  id: 4A899BFC-F248-11E8-B48F-1D18A9856A87
  last_name: Alistarh
  orcid: 0000-0003-3650-940X
- first_name: Torsten
  full_name: Hoefler, Torsten
  last_name: Hoefler
- first_name: James
  full_name: Hensman, James
  last_name: Hensman
citation:
  ama: 'Ashkboos S, Mohtashami A, Croci ML, et al. QuaRot: Outlier-free 4-bit inference
    in rotated LLMs. In: <i>38th Conference on Neural Information Processing Systems</i>.
    Vol 37. Neural Information Processing Systems Foundation; 2024.'
  apa: 'Ashkboos, S., Mohtashami, A., Croci, M. L., Li, B., Cameron, P., Jaggi, M.,
    … Hensman, J. (2024). QuaRot: Outlier-free 4-bit inference in rotated LLMs. In
    <i>38th Conference on Neural Information Processing Systems</i> (Vol. 37). Vancouver,
    Canada: Neural Information Processing Systems Foundation.'
  chicago: 'Ashkboos, Saleh, Amirkeivan Mohtashami, Maximilian L. Croci, Bo Li, Pashmina
    Cameron, Martin Jaggi, Dan-Adrian Alistarh, Torsten Hoefler, and James Hensman.
    “QuaRot: Outlier-Free 4-Bit Inference in Rotated LLMs.” In <i>38th Conference
    on Neural Information Processing Systems</i>, Vol. 37. Neural Information Processing
    Systems Foundation, 2024.'
  ieee: 'S. Ashkboos <i>et al.</i>, “QuaRot: Outlier-free 4-bit inference in rotated
    LLMs,” in <i>38th Conference on Neural Information Processing Systems</i>, Vancouver,
    Canada, 2024, vol. 37.'
  ista: 'Ashkboos S, Mohtashami A, Croci ML, Li B, Cameron P, Jaggi M, Alistarh D-A,
    Hoefler T, Hensman J. 2024. QuaRot: Outlier-free 4-bit inference in rotated LLMs.
    38th Conference on Neural Information Processing Systems. NeurIPS: Neural Information
    Processing Systems, Advances in Neural Information Processing Systems, vol. 37.'
  mla: 'Ashkboos, Saleh, et al. “QuaRot: Outlier-Free 4-Bit Inference in Rotated LLMs.”
    <i>38th Conference on Neural Information Processing Systems</i>, vol. 37, Neural
    Information Processing Systems Foundation, 2024.'
  short: S. Ashkboos, A. Mohtashami, M.L. Croci, B. Li, P. Cameron, M. Jaggi, D.-A.
    Alistarh, T. Hoefler, J. Hensman, in:, 38th Conference on Neural Information Processing
    Systems, Neural Information Processing Systems Foundation, 2024.
conference:
  end_date: 2024-12-15
  location: Vancouver, Canada
  name: 'NeurIPS: Neural Information Processing Systems'
  start_date: 2024-12-09
date_created: 2025-04-06T22:01:32Z
date_published: 2024-12-20T00:00:00Z
date_updated: 2025-05-14T11:33:12Z
day: '20'
department:
- _id: DaAl
external_id:
  arxiv:
  - '2404.00456'
intvolume: '37'
language:
- iso: eng
main_file_link:
- open_access: '1'
  url: https://doi.org/10.48550/arXiv.2404.00456
month: '12'
oa: 1
oa_version: Preprint
publication: 38th Conference on Neural Information Processing Systems
publication_identifier:
  issn:
  - 1049-5258
publication_status: published
publisher: Neural Information Processing Systems Foundation
quality_controlled: '1'
related_material:
  link:
  - relation: software
    url: https://github.com/spcl/QuaRot
scopus_import: '1'
status: public
title: 'QuaRot: Outlier-free 4-bit inference in rotated LLMs'
type: conference
user_id: 2DF688A6-F248-11E8-B48F-1D18A9856A87
volume: 37
year: '2024'
...
---
OA_place: repository
OA_type: green
_id: '19512'
abstract:
- lang: eng
  text: 'Differential privacy with gradual expiration models the setting where
    data items arrive in a stream and at a given time t the privacy loss guaranteed
    for a data item seen at time (t − d) is εg(d), where g is a monotonically
    non-decreasing function. We study the fundamental continual (binary) counting
    problem where each data item consists of a bit, and the algorithm needs to
    output at each time step the sum of all the bits streamed so far. For a stream
    of length T and privacy without expiration, continual counting is possible
    with maximum (over all time steps) additive error O(log²(T)/ε) and the best
    known lower bound is Ω(log(T)/ε); closing this gap is a challenging open problem.
    We show that the situation is very different for privacy with gradual expiration
    by giving upper and lower bounds for a large set of expiration functions g.
    Specifically, our algorithm achieves an additive error of O(log(T)/ε) for
    a large set of privacy expiration functions. We also give a lower bound that
    shows that if C is the additive error of any ε-DP algorithm for this problem,
    then the product of C and the privacy expiration function after 2C steps must
    be Ω(log(T)/ε). Our algorithm matches this lower bound as its additive error
    is O(log(T)/ε), even when g(2C) = O(1). Our empirical evaluation shows that
    we achieve a slowly growing privacy loss with significantly smaller empirical
    privacy loss for large values of d than a natural baseline algorithm.'
acknowledgement: 'Monika Henzinger: This project has received funding from the European
  Research Council (ERC) under the European Union’s Horizon 2020 research and innovation
  programme (Grant agreement No. 101019564) and the Austrian Science Fund (FWF) grant
  DOI 10.55776/Z422, grant DOI 10.55776/I5982, and grant DOI 10.55776/P33775 with
  additional funding from the netidee SCIENCE Stiftung, 2020–2024. Joel Daniel Andersson
  and Rasmus Pagh are affiliated with Basic Algorithms Research Copenhagen (BARC),
  supported by the VILLUM Foundation grant 16582, and are also supported by Providentia,
  a Data Science Distinguished Investigator grant from Novo Nordisk Fonden. Teresa
  Anna Steiner is supported by a research grant (VIL51463) from VILLUM FONDEN. This
  work was done while Teresa Anna Steiner was a Postdoc at the Technical University
  of Denmark. Jalaj Upadhyay’s research was funded by the Rutgers Decanal Grant no.
  302918 and an unrestricted gift from Google.'
alternative_title:
- Advances in Neural Information Processing Systems
article_processing_charge: No
arxiv: 1
author:
- first_name: Joel Daniel
  full_name: Andersson, Joel Daniel
  last_name: Andersson
- first_name: Monika H
  full_name: Henzinger, Monika H
  id: 540c9bbd-f2de-11ec-812d-d04a5be85630
  last_name: Henzinger
  orcid: 0000-0002-5008-6530
- first_name: Rasmus
  full_name: Pagh, Rasmus
  last_name: Pagh
- first_name: Teresa Anna
  full_name: Steiner, Teresa Anna
  last_name: Steiner
- first_name: Jalaj
  full_name: Upadhyay, Jalaj
  last_name: Upadhyay
citation:
  ama: 'Andersson JD, Henzinger M, Pagh R, Steiner TA, Upadhyay J. Continual counting
    with gradual privacy expiration. In: <i>38th Conference on Neural Information
    Processing Systems</i>. Vol 37. Neural Information Processing Systems Foundation;
    2024.'
  apa: 'Andersson, J. D., Henzinger, M., Pagh, R., Steiner, T. A., &#38; Upadhyay,
    J. (2024). Continual counting with gradual privacy expiration. In <i>38th Conference
    on Neural Information Processing Systems</i> (Vol. 37). Vancouver, Canada: Neural
    Information Processing Systems Foundation.'
  chicago: Andersson, Joel Daniel, Monika Henzinger, Rasmus Pagh, Teresa Anna Steiner,
    and Jalaj Upadhyay. “Continual Counting with Gradual Privacy Expiration.” In <i>38th
    Conference on Neural Information Processing Systems</i>, Vol. 37. Neural Information
    Processing Systems Foundation, 2024.
  ieee: J. D. Andersson, M. Henzinger, R. Pagh, T. A. Steiner, and J. Upadhyay, “Continual
    counting with gradual privacy expiration,” in <i>38th Conference on Neural Information
    Processing Systems</i>, Vancouver, Canada, 2024, vol. 37.
  ista: 'Andersson JD, Henzinger M, Pagh R, Steiner TA, Upadhyay J. 2024. Continual
    counting with gradual privacy expiration. 38th Conference on Neural Information
    Processing Systems. NeurIPS: Neural Information Processing Systems, Advances in
    Neural Information Processing Systems, vol. 37.'
  mla: Andersson, Joel Daniel, et al. “Continual Counting with Gradual Privacy Expiration.”
    <i>38th Conference on Neural Information Processing Systems</i>, vol. 37, Neural
    Information Processing Systems Foundation, 2024.
  short: J.D. Andersson, M. Henzinger, R. Pagh, T.A. Steiner, J. Upadhyay, in:, 38th
    Conference on Neural Information Processing Systems, Neural Information Processing
    Systems Foundation, 2024.
conference:
  end_date: 2024-12-15
  location: Vancouver, Canada
  name: 'NeurIPS: Neural Information Processing Systems'
  start_date: 2024-12-09
corr_author: '1'
date_created: 2025-04-06T22:01:32Z
date_published: 2024-12-20T00:00:00Z
date_updated: 2025-05-14T11:33:22Z
day: '20'
department:
- _id: MoHe
ec_funded: 1
external_id:
  arxiv:
  - '2406.03802'
intvolume: '37'
language:
- iso: eng
main_file_link:
- open_access: '1'
  url: https://doi.org/10.48550/arXiv.2406.03802
month: '12'
oa: 1
oa_version: Preprint
project:
- _id: bd9ca328-d553-11ed-ba76-dc4f890cfe62
  call_identifier: H2020
  grant_number: '101019564'
  name: The design and evaluation of modern fully dynamic data structures
- _id: 34def286-11ca-11ed-8bc3-da5948e1613c
  grant_number: Z00422
  name: Efficient algorithms
- _id: bda196b2-d553-11ed-ba76-8e8ee6c21103
  grant_number: I05982
  name: Static and Dynamic Hierarchical Graph Decompositions
- _id: bd9e3a2e-d553-11ed-ba76-8aa684ce17fe
  grant_number: P33775
  name: Fast Algorithms for a Reactive Network Layer
publication: 38th Conference on Neural Information Processing Systems
publication_identifier:
  issn:
  - 1049-5258
publication_status: published
publisher: Neural Information Processing Systems Foundation
quality_controlled: '1'
scopus_import: '1'
status: public
title: Continual counting with gradual privacy expiration
type: conference
user_id: 2DF688A6-F248-11E8-B48F-1D18A9856A87
volume: 37
year: '2024'
...
