---
OA_place: publisher
OA_type: gold
_id: '18875'
abstract:
- lang: eng
  text: Current state-of-the-art methods for differentially private model training
    are based on matrix factorization techniques. However, these methods suffer from
    high computational overhead because they require numerically solving a demanding
    optimization problem to determine an approximately optimal factorization prior
    to the actual model training. In this work, we present a new matrix factorization
    approach, BSR, which overcomes this computational bottleneck. By exploiting properties
    of the standard matrix square root, BSR can efficiently handle even large-scale
    problems. For the key scenario of stochastic gradient descent with momentum and
    weight decay, we even derive analytical expressions for BSR that render the computational
    overhead negligible. We prove bounds on the approximation quality that hold both
    in the centralized and in the federated learning setting. Our numerical experiments
    demonstrate that models trained using BSR perform on par with the best existing
    methods, while completely avoiding their computational overhead.
alternative_title:
- Advances in Neural Information Processing Systems
article_processing_charge: No
arxiv: 1
author:
- first_name: Nikita
  full_name: Kalinin, Nikita
  id: 4b14526e-14d2-11ed-ba64-c14c9553d137
  last_name: Kalinin
- first_name: Christoph
  full_name: Lampert, Christoph
  id: 40C20FD2-F248-11E8-B48F-1D18A9856A87
  last_name: Lampert
  orcid: 0000-0001-8622-7887
citation:
  ama: 'Kalinin N, Lampert C. Banded square root matrix factorization for differentially
    private model training. In: <i>38th Annual Conference on Neural Information Processing
    Systems</i>. Vol 37. Neural Information Processing Systems Foundation; 2024.'
  apa: 'Kalinin, N., &#38; Lampert, C. (2024). Banded square root matrix factorization
    for differentially private model training. In <i>38th Annual Conference on Neural
    Information Processing Systems</i> (Vol. 37). Vancouver, Canada: Neural Information
    Processing Systems Foundation.'
  chicago: Kalinin, Nikita, and Christoph Lampert. “Banded Square Root Matrix Factorization
    for Differentially Private Model Training.” In <i>38th Annual Conference on Neural
    Information Processing Systems</i>, Vol. 37. Neural Information Processing Systems
    Foundation, 2024.
  ieee: N. Kalinin and C. Lampert, “Banded square root matrix factorization for differentially
    private model training,” in <i>38th Annual Conference on Neural Information Processing
    Systems</i>, Vancouver, Canada, 2024, vol. 37.
  ista: 'Kalinin N, Lampert C. 2024. Banded square root matrix factorization for differentially
    private model training. 38th Annual Conference on Neural Information Processing
    Systems. NeurIPS: Neural Information Processing Systems, Advances in Neural Information
    Processing Systems, vol. 37.'
  mla: Kalinin, Nikita, and Christoph Lampert. “Banded Square Root Matrix Factorization
    for Differentially Private Model Training.” <i>38th Annual Conference on Neural
    Information Processing Systems</i>, vol. 37, Neural Information Processing Systems
    Foundation, 2024.
  short: N. Kalinin, C. Lampert, in:, 38th Annual Conference on Neural Information
    Processing Systems, Neural Information Processing Systems Foundation, 2024.
conference:
  end_date: 2024-12-16
  location: Vancouver, Canada
  name: 'NeurIPS: Neural Information Processing Systems'
  start_date: 2024-12-16
corr_author: '1'
date_created: 2025-01-24T17:58:16Z
date_published: 2024-12-01T00:00:00Z
date_updated: 2025-05-14T11:34:20Z
day: '01'
ddc:
- '000'
department:
- _id: GradSch
- _id: ChLa
external_id:
  arxiv:
  - '2405.13763'
file:
- access_level: open_access
  checksum: a216cab8eddc1fe7840aede0e2c0d41e
  content_type: application/pdf
  creator: dernst
  date_created: 2025-01-27T09:52:15Z
  date_updated: 2025-01-27T09:52:15Z
  file_id: '18888'
  file_name: 2024_NeurIPS_Nikita.pdf
  file_size: 1144656
  relation: main_file
  success: 1
file_date_updated: 2025-01-27T09:52:15Z
has_accepted_license: '1'
intvolume: '        37'
language:
- iso: eng
month: '12'
oa: 1
oa_version: Published Version
publication: 38th Annual Conference on Neural Information Processing Systems
publication_identifier:
  eissn:
  - 1049-5258
publication_status: published
publisher: Neural Information Processing Systems Foundation
quality_controlled: '1'
scopus_import: '1'
status: public
title: Banded square root matrix factorization for differentially private model training
tmp:
  image: /images/cc_by.png
  legal_code_url: https://creativecommons.org/licenses/by/4.0/legalcode
  name: Creative Commons Attribution 4.0 International Public License (CC-BY 4.0)
  short: CC BY (4.0)
type: conference
user_id: 2DF688A6-F248-11E8-B48F-1D18A9856A87
volume: 37
year: '2024'
...
---
OA_place: repository
OA_type: green
_id: '18890'
abstract:
- lang: eng
  text: Deep Neural Collapse (DNC) refers to the surprisingly rigid structure of the
    data representations in the final layers of Deep Neural Networks (DNNs). Though
    the phenomenon has been measured in a variety of settings, its emergence is typically
    explained via data-agnostic approaches, such as the unconstrained features model.
    In this work, we introduce a data-dependent setting where DNC forms due to feature
    learning through the average gradient outer product (AGOP). The AGOP is defined
    with respect to a learned predictor and is equal to the uncentered covariance
    matrix of its input-output gradients averaged over the training dataset. The Deep
    Recursive Feature Machine (Deep RFM) is a method that constructs a neural network
    by iteratively mapping the data with the AGOP and applying an untrained random
    feature map. We demonstrate empirically that DNC occurs in Deep RFM across standard
    settings as a consequence of the projection with the AGOP matrix computed at each
    layer. Further, we theoretically explain DNC in Deep RFM in an asymptotic setting
    and as a result of kernel learning. We then provide evidence that this mechanism
    holds for neural networks more generally. In particular, we show that the right
    singular vectors and values of the weights can be responsible for the majority
    of within-class variability collapse for DNNs trained in the feature learning
    regime. As observed in recent work, this singular structure is highly correlated
    with that of the AGOP.
acknowledgement: 'We acknowledge support from the National Science Foundation (NSF)
  and the Simons Foundation for the Collaboration on the Theoretical Foundations of
  Deep Learning through awards DMS-2031883 and #814639 as well as the TILOS institute
  (NSF CCF-2112665). This work used the programs (1) XSEDE (Extreme Science and Engineering
  Discovery Environment), which is supported by NSF grant number ACI-1548562, and
  (2) ACCESS (Advanced Cyberinfrastructure Coordination Ecosystem: Services & Support),
  which is supported by NSF grant numbers #2138259, #2138286, #2138307, #2137603,
  and #2138296. Specifically, we used the resources from SDSC Expanse GPU compute
  nodes, and NCSA Delta system, via allocation TG-CIS220009. Marco Mondelli is supported
  by the 2019 Lopez-Loreta prize. We also acknowledge useful feedback from anonymous
  reviewers.'
alternative_title:
- Advances in Neural Information Processing Systems
article_processing_charge: No
arxiv: 1
author:
- first_name: Daniel
  full_name: Beaglehole, Daniel
  last_name: Beaglehole
- first_name: Peter
  full_name: Súkeník, Peter
  id: d64d6a8d-eb8e-11eb-b029-96fd216dec3c
  last_name: Súkeník
- first_name: Marco
  full_name: Mondelli, Marco
  id: 27EB676C-8706-11E9-9510-7717E6697425
  last_name: Mondelli
  orcid: 0000-0002-3242-7020
- first_name: Mikhail
  full_name: Belkin, Mikhail
  last_name: Belkin
citation:
  ama: 'Beaglehole D, Súkeník P, Mondelli M, Belkin M. Average gradient outer product
    as a mechanism for deep neural collapse. In: <i>38th Annual Conference on Neural
    Information Processing Systems</i>. Vol 37. Neural Information Processing Systems
    Foundation; 2024.'
  apa: 'Beaglehole, D., Súkeník, P., Mondelli, M., &#38; Belkin, M. (2024). Average
    gradient outer product as a mechanism for deep neural collapse. In <i>38th Annual
    Conference on Neural Information Processing Systems</i> (Vol. 37). Vancouver,
    Canada: Neural Information Processing Systems Foundation.'
  chicago: Beaglehole, Daniel, Peter Súkeník, Marco Mondelli, and Mikhail Belkin.
    “Average Gradient Outer Product as a Mechanism for Deep Neural Collapse.” In <i>38th
    Annual Conference on Neural Information Processing Systems</i>, Vol. 37. Neural
    Information Processing Systems Foundation, 2024.
  ieee: D. Beaglehole, P. Súkeník, M. Mondelli, and M. Belkin, “Average gradient outer
    product as a mechanism for deep neural collapse,” in <i>38th Annual Conference
    on Neural Information Processing Systems</i>, Vancouver, Canada, 2024, vol. 37.
  ista: 'Beaglehole D, Súkeník P, Mondelli M, Belkin M. 2024. Average gradient outer
    product as a mechanism for deep neural collapse. 38th Annual Conference on Neural
    Information Processing Systems. NeurIPS: Neural Information Processing Systems,
    Advances in Neural Information Processing Systems, vol. 37.'
  mla: Beaglehole, Daniel, et al. “Average Gradient Outer Product as a Mechanism for
    Deep Neural Collapse.” <i>38th Annual Conference on Neural Information Processing
    Systems</i>, vol. 37, Neural Information Processing Systems Foundation, 2024.
  short: D. Beaglehole, P. Súkeník, M. Mondelli, M. Belkin, in:, 38th Annual Conference
    on Neural Information Processing Systems, Neural Information Processing Systems
    Foundation, 2024.
conference:
  end_date: 2024-12-16
  location: Vancouver, Canada
  name: 'NeurIPS: Neural Information Processing Systems'
  start_date: 2024-12-16
corr_author: '1'
date_created: 2025-01-27T11:11:40Z
date_published: 2024-12-01T00:00:00Z
date_updated: 2025-05-14T11:29:45Z
day: '01'
department:
- _id: GradSch
- _id: MaMo
external_id:
  arxiv:
  - '2402.13728'
intvolume: '        37'
language:
- iso: eng
main_file_link:
- open_access: '1'
  url: https://openreview.net/forum?id=lJ1jdl2K9k
month: '12'
oa: 1
oa_version: Preprint
project:
- _id: 059876FA-7A3F-11EA-A408-12923DDC885E
  name: Prix Lopez-Loreta 2019 - Marco Mondelli
publication: 38th Annual Conference on Neural Information Processing Systems
publication_identifier:
  eissn:
  - 1049-5258
publication_status: published
publisher: Neural Information Processing Systems Foundation
quality_controlled: '1'
scopus_import: '1'
status: public
title: Average gradient outer product as a mechanism for deep neural collapse
type: conference
user_id: 2DF688A6-F248-11E8-B48F-1D18A9856A87
volume: 37
year: '2024'
...
---
OA_place: repository
OA_type: green
_id: '18996'
abstract:
- lang: eng
  text: 'We consider the linear causal representation learning setting where we observe
    a linear mixing of d unknown latent factors, which follow a linear structural
    causal model. Recent work has shown that it is possible to recover the latent
    factors as well as the underlying structural causal model over them, up to permutation
    and scaling, provided that we have at least d environments, each of which corresponds
    to perfect interventions on a single latent node (factor). After this powerful
    result, a key open problem faced by the community has been to relax these conditions:
    allow for coarser than perfect single-node interventions, and allow for fewer
    than d of them, since the number of latent factors d could be very large. In this
    work, we consider precisely such a setting, where we allow fewer than d environments,
    and also allow for very coarse interventions that can change the entire causal
    graph over the latent factors. On the flip side, we relax what we wish to extract
    to simply the list of nodes that have shifted between one or more environments.
    We provide a surprising identifiability
    result that it is indeed possible, under some very mild standard assumptions,
    to identify the set of shifted nodes. Our identifiability proof moreover is a
    constructive one: we explicitly provide necessary and sufficient conditions for
    a node to be a shifted node, and show that we can check these conditions given
    observed data. Our algorithm lends itself very naturally to the sample setting
    where instead of just interventional distributions, we are provided datasets of
    samples from each of these distributions. We corroborate our results on both synthetic
    experiments as well as an interesting psychometric dataset. The code can be found
    at https://github.com/TianyuCodings/iLCS.'
alternative_title:
- Advances in Neural Information Processing Systems
article_processing_charge: No
arxiv: 1
author:
- first_name: Tianyu
  full_name: Chen, Tianyu
  last_name: Chen
- first_name: Kevin
  full_name: Bello, Kevin
  last_name: Bello
- first_name: Francesco
  full_name: Locatello, Francesco
  id: 26cfd52f-2483-11ee-8040-88983bcc06d4
  last_name: Locatello
  orcid: 0000-0002-4850-0683
- first_name: Bryon
  full_name: Aragam, Bryon
  last_name: Aragam
- first_name: Pradeep Kumar
  full_name: Ravikumar, Pradeep Kumar
  last_name: Ravikumar
citation:
  ama: 'Chen T, Bello K, Locatello F, Aragam B, Ravikumar PK. Identifying general
    mechanism shifts in linear causal representations. In: <i>38th Conference on Neural
    Information Processing Systems</i>. Vol 37. Neural Information Processing Systems
    Foundation; 2024.'
  apa: 'Chen, T., Bello, K., Locatello, F., Aragam, B., &#38; Ravikumar, P. K. (2024).
    Identifying general mechanism shifts in linear causal representations. In <i>38th
    Conference on Neural Information Processing Systems</i> (Vol. 37). Vancouver,
    Canada: Neural Information Processing Systems Foundation.'
  chicago: Chen, Tianyu, Kevin Bello, Francesco Locatello, Bryon Aragam, and Pradeep
    Kumar Ravikumar. “Identifying General Mechanism Shifts in Linear Causal Representations.”
    In <i>38th Conference on Neural Information Processing Systems</i>, Vol. 37. Neural
    Information Processing Systems Foundation, 2024.
  ieee: T. Chen, K. Bello, F. Locatello, B. Aragam, and P. K. Ravikumar, “Identifying
    general mechanism shifts in linear causal representations,” in <i>38th Conference
    on Neural Information Processing Systems</i>, Vancouver, Canada, 2024, vol. 37.
  ista: 'Chen T, Bello K, Locatello F, Aragam B, Ravikumar PK. 2024. Identifying general
    mechanism shifts in linear causal representations. 38th Conference on Neural Information
    Processing Systems. NeurIPS: Neural Information Processing Systems, Advances in
    Neural Information Processing Systems, vol. 37.'
  mla: Chen, Tianyu, et al. “Identifying General Mechanism Shifts in Linear Causal
    Representations.” <i>38th Conference on Neural Information Processing Systems</i>,
    vol. 37, Neural Information Processing Systems Foundation, 2024.
  short: T. Chen, K. Bello, F. Locatello, B. Aragam, P.K. Ravikumar, in:, 38th Conference
    on Neural Information Processing Systems, Neural Information Processing Systems
    Foundation, 2024.
conference:
  end_date: 2024-12-16
  location: Vancouver, Canada
  name: 'NeurIPS: Neural Information Processing Systems'
  start_date: 2024-12-16
date_created: 2025-02-04T13:09:34Z
date_published: 2024-09-25T00:00:00Z
date_updated: 2025-07-07T13:23:49Z
day: '25'
ddc:
- '000'
department:
- _id: FrLo
external_id:
  arxiv:
  - '2410.24059'
file:
- access_level: open_access
  checksum: 75c3091e70bd2916cd94afbf40a0c425
  content_type: application/pdf
  creator: dernst
  date_created: 2025-02-04T13:09:08Z
  date_updated: 2025-02-04T13:09:08Z
  file_id: '18997'
  file_name: 2024_NeurIPS_Chen.pdf
  file_size: 5659119
  relation: main_file
  success: 1
file_date_updated: 2025-02-04T13:09:08Z
has_accepted_license: '1'
intvolume: '        37'
language:
- iso: eng
month: '09'
oa: 1
oa_version: Published Version
publication: 38th Conference on Neural Information Processing Systems
publication_identifier:
  eissn:
  - 1049-5258
publication_status: published
publisher: Neural Information Processing Systems Foundation
quality_controlled: '1'
scopus_import: '1'
status: public
title: Identifying general mechanism shifts in linear causal representations
tmp:
  image: /images/cc_by.png
  legal_code_url: https://creativecommons.org/licenses/by/4.0/legalcode
  name: Creative Commons Attribution 4.0 International Public License (CC-BY 4.0)
  short: CC BY (4.0)
type: conference
user_id: 2DF688A6-F248-11E8-B48F-1D18A9856A87
volume: 37
year: '2024'
...
---
OA_place: repository
OA_type: green
_id: '12537'
abstract:
- lang: eng
  text: 'The Neural Tangent Kernel (NTK) has emerged as a powerful tool to provide
    memorization, optimization and generalization guarantees in deep neural networks.
    A line of work has studied the NTK spectrum for two-layer and deep networks with
    at least a layer with Ω(N) neurons, N being the number of training samples. Furthermore,
    there is increasing evidence suggesting that deep networks with sub-linear layer
    widths are powerful memorizers and optimizers, as long as the number of parameters
    exceeds the number of samples. Thus, a natural open question is whether the NTK
    is well conditioned in such a challenging sub-linear setup. In this paper, we
    answer this question in the affirmative. Our key technical contribution is a lower
    bound on the smallest NTK eigenvalue for deep networks with the minimum possible
    over-parameterization: the number of parameters is roughly Ω(N) and, hence, the
    number of neurons is as little as Ω(√N). To showcase the applicability of our
    NTK bounds, we provide two results concerning memorization capacity and optimization
    guarantees for gradient descent training.'
acknowledgement: The authors were partially supported by the 2019 Lopez-Loreta prize,
  and they would like to thank Quynh Nguyen, Mahdi Soltanolkotabi and Adel Javanmard
  for helpful discussions.
alternative_title:
- Advances in Neural Information Processing Systems
article_processing_charge: No
arxiv: 1
author:
- first_name: Simone
  full_name: Bombari, Simone
  id: ca726dda-de17-11ea-bc14-f9da834f63aa
  last_name: Bombari
- first_name: Mohammad Hossein
  full_name: Amani, Mohammad Hossein
  last_name: Amani
- first_name: Marco
  full_name: Mondelli, Marco
  id: 27EB676C-8706-11E9-9510-7717E6697425
  last_name: Mondelli
  orcid: 0000-0002-3242-7020
citation:
  ama: 'Bombari S, Amani MH, Mondelli M. Memorization and optimization in deep neural
    networks with minimum over-parameterization. In: <i>36th Conference on Neural
    Information Processing Systems</i>. Vol 35. Neural Information Processing Systems
    Foundation; 2022:7628-7640.'
  apa: 'Bombari, S., Amani, M. H., &#38; Mondelli, M. (2022). Memorization and optimization
    in deep neural networks with minimum over-parameterization. In <i>36th Conference
    on Neural Information Processing Systems</i> (Vol. 35, pp. 7628–7640). New Orleans,
    LA, United States: Neural Information Processing Systems Foundation.'
  chicago: Bombari, Simone, Mohammad Hossein Amani, and Marco Mondelli. “Memorization
    and Optimization in Deep Neural Networks with Minimum Over-Parameterization.”
    In <i>36th Conference on Neural Information Processing Systems</i>, 35:7628–40.
    Neural Information Processing Systems Foundation, 2022.
  ieee: S. Bombari, M. H. Amani, and M. Mondelli, “Memorization and optimization in
    deep neural networks with minimum over-parameterization,” in <i>36th Conference
    on Neural Information Processing Systems</i>, New Orleans, LA, United States,
    2022, vol. 35, pp. 7628–7640.
  ista: 'Bombari S, Amani MH, Mondelli M. 2022. Memorization and optimization in deep
    neural networks with minimum over-parameterization. 36th Conference on Neural
    Information Processing Systems. NeurIPS: Neural Information Processing Systems,
    Advances in Neural Information Processing Systems, vol. 35, 7628–7640.'
  mla: Bombari, Simone, et al. “Memorization and Optimization in Deep Neural Networks
    with Minimum Over-Parameterization.” <i>36th Conference on Neural Information
    Processing Systems</i>, vol. 35, Neural Information Processing Systems Foundation,
    2022, pp. 7628–40.
  short: S. Bombari, M.H. Amani, M. Mondelli, in:, 36th Conference on Neural Information
    Processing Systems, Neural Information Processing Systems Foundation, 2022, pp.
    7628–7640.
conference:
  end_date: 2022-12-09
  location: New Orleans, LA, United States
  name: 'NeurIPS: Neural Information Processing Systems'
  start_date: 2022-11-28
corr_author: '1'
date_created: 2023-02-10T13:46:37Z
date_published: 2022-07-24T00:00:00Z
date_updated: 2025-05-14T11:28:22Z
day: '24'
department:
- _id: MaMo
external_id:
  arxiv:
  - '2205.10217'
intvolume: '        35'
language:
- iso: eng
main_file_link:
- open_access: '1'
  url: https://doi.org/10.48550/arXiv.2205.10217
month: '07'
oa: 1
oa_version: Preprint
page: 7628-7640
project:
- _id: 059876FA-7A3F-11EA-A408-12923DDC885E
  name: Prix Lopez-Loreta 2019 - Marco Mondelli
publication: 36th Conference on Neural Information Processing Systems
publication_identifier:
  eissn:
  - 1049-5258
  isbn:
  - '9781713871088'
publication_status: published
publisher: Neural Information Processing Systems Foundation
quality_controlled: '1'
status: public
title: Memorization and optimization in deep neural networks with minimum over-parameterization
type: conference
user_id: 2DF688A6-F248-11E8-B48F-1D18A9856A87
volume: 35
year: '2022'
...
---
_id: '14326'
abstract:
- lang: eng
  text: "Learning object-centric representations of complex scenes is a promising
    step towards enabling efficient abstract reasoning from low-level perceptual features.
    Yet, most deep learning approaches learn distributed representations that do not
    capture the compositional properties of natural scenes. In this paper, we present
    the Slot Attention module, an architectural component that interfaces with perceptual
    representations such as the output of a convolutional neural network and produces
    a set of task-dependent abstract representations which we call slots. These slots
    are exchangeable and can bind to any object in the input by specializing through
    a competitive procedure over multiple rounds of attention. We empirically demonstrate
    that Slot Attention can extract object-centric representations that enable generalization
    to unseen compositions when trained on unsupervised object discovery and supervised
    property prediction tasks."
alternative_title:
- Advances in Neural Information Processing Systems
article_processing_charge: No
arxiv: 1
author:
- first_name: Francesco
  full_name: Locatello, Francesco
  id: 26cfd52f-2483-11ee-8040-88983bcc06d4
  last_name: Locatello
  orcid: 0000-0002-4850-0683
- first_name: Dirk
  full_name: Weissenborn, Dirk
  last_name: Weissenborn
- first_name: Thomas
  full_name: Unterthiner, Thomas
  last_name: Unterthiner
- first_name: Aravindh
  full_name: Mahendran, Aravindh
  last_name: Mahendran
- first_name: Georg
  full_name: Heigold, Georg
  last_name: Heigold
- first_name: Jakob
  full_name: Uszkoreit, Jakob
  last_name: Uszkoreit
- first_name: Alexey
  full_name: Dosovitskiy, Alexey
  last_name: Dosovitskiy
- first_name: Thomas
  full_name: Kipf, Thomas
  last_name: Kipf
citation:
  ama: 'Locatello F, Weissenborn D, Unterthiner T, et al. Object-centric learning
    with slot attention. In: <i>34th International Conference on Neural Information
    Processing Systems</i>. Vol 33. Neural Information Processing Systems Foundation;
    2020:11525-11538.'
  apa: 'Locatello, F., Weissenborn, D., Unterthiner, T., Mahendran, A., Heigold, G.,
    Uszkoreit, J., … Kipf, T. (2020). Object-centric learning with slot attention.
    In <i>34th International Conference on Neural Information Processing Systems</i>
    (Vol. 33, pp. 11525–11538). Virtual: Neural Information Processing Systems Foundation.'
  chicago: Locatello, Francesco, Dirk Weissenborn, Thomas Unterthiner, Aravindh Mahendran,
    Georg Heigold, Jakob Uszkoreit, Alexey Dosovitskiy, and Thomas Kipf. “Object-Centric
    Learning with Slot Attention.” In <i>34th International Conference on Neural Information
    Processing Systems</i>, 33:11525–38. Neural Information Processing Systems Foundation,
    2020.
  ieee: F. Locatello <i>et al.</i>, “Object-centric learning with slot attention,”
    in <i>34th International Conference on Neural Information Processing Systems</i>,
    Virtual, 2020, vol. 33, pp. 11525–11538.
  ista: 'Locatello F, Weissenborn D, Unterthiner T, Mahendran A, Heigold G, Uszkoreit
    J, Dosovitskiy A, Kipf T. 2020. Object-centric learning with slot attention. 34th
    International Conference on Neural Information Processing Systems. NeurIPS: Neural
    Information Processing Systems, Advances in Neural Information Processing Systems,
    vol. 33, 11525–11538.'
  mla: Locatello, Francesco, et al. “Object-Centric Learning with Slot Attention.”
    <i>34th International Conference on Neural Information Processing Systems</i>,
    vol. 33, Neural Information Processing Systems Foundation, 2020, pp. 11525–38.
  short: F. Locatello, D. Weissenborn, T. Unterthiner, A. Mahendran, G. Heigold, J.
    Uszkoreit, A. Dosovitskiy, T. Kipf, in:, 34th International Conference on Neural
    Information Processing Systems, Neural Information Processing Systems Foundation,
    2020, pp. 11525–11538.
conference:
  end_date: 2020-12-12
  location: Virtual
  name: 'NeurIPS: Neural Information Processing Systems'
  start_date: 2020-12-06
date_created: 2023-09-13T12:03:46Z
date_published: 2020-12-20T00:00:00Z
date_updated: 2025-07-10T11:50:47Z
day: '20'
department:
- _id: FrLo
extern: '1'
external_id:
  arxiv:
  - '2006.15055'
intvolume: '        33'
language:
- iso: eng
main_file_link:
- open_access: '1'
  url: https://doi.org/10.48550/arXiv.2006.15055
month: '12'
oa: 1
oa_version: Preprint
page: 11525-11538
publication: 34th International Conference on Neural Information Processing Systems
publication_identifier:
  eissn:
  - 1049-5258
  isbn:
  - '9781713829546'
publication_status: published
publisher: Neural Information Processing Systems Foundation
quality_controlled: '1'
status: public
title: Object-centric learning with slot attention
type: conference
user_id: 2DF688A6-F248-11E8-B48F-1D18A9856A87
volume: 33
year: '2020'
...
---
_id: '14202'
abstract:
- lang: eng
  text: Approximating a probability density in a tractable manner is a central task
    in Bayesian statistics. Variational Inference (VI) is a popular technique that
    achieves tractability by choosing a relatively simple variational family. Borrowing
    ideas from the classic boosting framework, recent approaches attempt to \emph{boost}
    VI by replacing the selection of a single density with a greedily constructed
    mixture of densities. In order to guarantee convergence, previous works impose
    stringent assumptions that require significant effort for practitioners. Specifically,
    they require a custom implementation of the greedy step (called the LMO) for
    every probabilistic model with respect to an unnatural variational family of
    truncated distributions. Our work fixes these issues with novel theoretical
    and algorithmic insights. On the theoretical side, we show that boosting VI
    satisfies a relaxed smoothness assumption which is sufficient for the convergence
    of the functional Frank-Wolfe (FW) algorithm. Furthermore, we rephrase the
    LMO problem and propose to maximize the Residual ELBO (RELBO) which replaces
    the standard ELBO optimization in VI. These theoretical enhancements allow
    for black box implementation of the boosting subroutine. Finally, we present
    a stopping criterion drawn from the duality gap in the classic FW analyses
    and exhaustive experiments to illustrate the usefulness of our theoretical
    and algorithmic contributions.
article_processing_charge: No
arxiv: 1
author:
- first_name: Francesco
  full_name: Locatello, Francesco
  id: 26cfd52f-2483-11ee-8040-88983bcc06d4
  last_name: Locatello
  orcid: 0000-0002-4850-0683
- first_name: Gideon
  full_name: Dresdner, Gideon
  last_name: Dresdner
- first_name: Rajiv
  full_name: Khanna, Rajiv
  last_name: Khanna
- first_name: Isabel
  full_name: Valera, Isabel
  last_name: Valera
- first_name: Gunnar
  full_name: Rätsch, Gunnar
  last_name: Rätsch
citation:
  ama: 'Locatello F, Dresdner G, Khanna R, Valera I, Rätsch G. Boosting black box
    variational inference. In: <i>Advances in Neural Information Processing Systems</i>.
    Vol 31. Neural Information Processing Systems Foundation; 2018.'
  apa: 'Locatello, F., Dresdner, G., Khanna, R., Valera, I., &#38; Rätsch, G. (2018).
    Boosting black box variational inference. In <i>Advances in Neural Information
    Processing Systems</i> (Vol. 31). Montreal, Canada: Neural Information Processing
    Systems Foundation.'
  chicago: Locatello, Francesco, Gideon Dresdner, Rajiv Khanna, Isabel Valera, and
    Gunnar Rätsch. “Boosting Black Box Variational Inference.” In <i>Advances in Neural
    Information Processing Systems</i>, Vol. 31. Neural Information Processing Systems
    Foundation, 2018.
  ieee: F. Locatello, G. Dresdner, R. Khanna, I. Valera, and G. Rätsch, “Boosting
    black box variational inference,” in <i>Advances in Neural Information Processing
    Systems</i>, Montreal, Canada, 2018, vol. 31.
  ista: 'Locatello F, Dresdner G, Khanna R, Valera I, Rätsch G. 2018. Boosting black
    box variational inference. Advances in Neural Information Processing Systems.
    NeurIPS: Neural Information Processing Systems vol. 31.'
  mla: Locatello, Francesco, et al. “Boosting Black Box Variational Inference.” <i>Advances
    in Neural Information Processing Systems</i>, vol. 31, Neural Information Processing
    Systems Foundation, 2018.
  short: F. Locatello, G. Dresdner, R. Khanna, I. Valera, G. Rätsch, in:, Advances
    in Neural Information Processing Systems, Neural Information Processing Systems
    Foundation, 2018.
conference:
  end_date: 2018-12-08
  location: Montreal, Canada
  name: 'NeurIPS: Neural Information Processing Systems'
  start_date: 2018-12-03
date_created: 2023-08-22T14:15:40Z
date_published: 2018-06-06T00:00:00Z
date_updated: 2023-09-13T07:38:24Z
day: '06'
department:
- _id: FrLo
extern: '1'
external_id:
  arxiv:
  - '1806.02185'
intvolume: '        31'
language:
- iso: eng
main_file_link:
- open_access: '1'
  url: https://arxiv.org/abs/1806.02185
month: '06'
oa: 1
oa_version: Preprint
publication: Advances in Neural Information Processing Systems
publication_identifier:
  eissn:
  - 1049-5258
  isbn:
  - '9781510884472'
publication_status: published
publisher: Neural Information Processing Systems Foundation
quality_controlled: '1'
scopus_import: '1'
status: public
title: Boosting black box variational inference
type: conference
user_id: 2DF688A6-F248-11E8-B48F-1D18A9856A87
volume: 31
year: '2018'
...
