---
OA_place: publisher
_id: '21198'
abstract:
- lang: eng
  text: "In recent years there has been a massive increase in the amount of data
    generated in a decentralized manner. Ever more powerful edge devices, such as
    smartphones, have become ubiquitous in most societies on earth. Through text
    typed, photos taken and apps used, these devices, which we refer to as clients,
    generate enormous amounts of high-quality and complex data. Moreover, the nature
    of these devices means the data they generate is often sensitive, and privacy
    concerns prevent it from being gathered and stored in a central location. This
    presents a challenge to the modern machine learning paradigm, which requires
    central access to large amounts of data. Federated learning (FL) has emerged
    as one of the answers to this problem. Rather than bringing the data to the
    model, FL sends the model to the data. Model training takes place on device,
    with periodically synchronized updates, allowing data to remain locally stored.
    While this approach offers significant privacy advantages, it comes with its
    own set of unique challenges. These include: data heterogeneity, the notion
    that different devices generate data in distinct ways, which can negatively
    impact training dynamics; systems heterogeneity, meaning that different devices
    may have differing hardware specifications; high communication costs, which
    are induced by the repeated transferring of models over the network; and low
    device computational power, which limits the use of larger models on device.
    In this thesis we present a range of methods for federated learning. We focus
    primarily on the challenge of data heterogeneity, though the methods presented
    are designed to be well adapted to the other challenges of a federated setting,
    such as the constraints of limited compute and communication overhead. We first
    present a method for explicitly modeling client data heterogeneity. The approach
    formulates clients as samples from a certain probability distribution and infers
    the parameters of this distribution from the available training clients. This
    learned distribution then represents the heterogeneity present among the clients
    and can be sampled from in order to create new simulated clients that are similar
    to the real clients we have observed so far. Following this we present two methods
    for directly dealing with data heterogeneity through personalization. Highly
    heterogeneous client data distributions can mean that learning a single global
    model becomes suboptimal, and some form of personalization of models to each
    individual client is required. Our approaches are based around hypernetworks,
    which we use to generate personalized model parameters without the need for
    additional training or finetuning. In the first approach we focus on generating
    full parameterizations of client models using learned embeddings of client data
    and labels, with a hypernetwork located on the central server. In the second
    approach we address the more challenging scenario where we want to generate
    a personalized model for a client without any label information. The hypernetwork
    is trained to generate a low-dimensional representation of a client’s personalized
    model parameters, allowing it to be transferred to and run on the client devices.
    In our final presented method, we change our focus: rather than aiming to directly
    address the challenge of data heterogeneity, we instead ensure we are unaffected
    by it. This is done in the context of k-means clustering, and we present a method
    for federated clustering with a focus on added privacy guarantees."
acknowledged_ssus:
- _id: ScienComp
acknowledgement: "This research was funded in part by the Austrian Science Fund
  (FWF) [10.55776/COE12]. Furthermore, the candidate acknowledges the support from
  the Scientific Service Units (SSU) of ISTA through resources provided by Scientific
  Computing (SciComp)."
alternative_title:
- ISTA Thesis
article_processing_charge: No
author:
- first_name: Jonathan A
  full_name: Scott, Jonathan A
  id: e499926b-f6e0-11ea-865d-9c63db0031e8
  last_name: Scott
citation:
  ama: Scott JA. Data heterogeneity and personalization in federated learning. 2026.
    doi:<a href="https://doi.org/10.15479/AT-ISTA-21198">10.15479/AT-ISTA-21198</a>
  apa: Scott, J. A. (2026). <i>Data heterogeneity and personalization in federated
    learning</i>. Institute of Science and Technology Austria. <a href="https://doi.org/10.15479/AT-ISTA-21198">https://doi.org/10.15479/AT-ISTA-21198</a>
  chicago: Scott, Jonathan A. “Data Heterogeneity and Personalization in Federated
    Learning.” Institute of Science and Technology Austria, 2026. <a href="https://doi.org/10.15479/AT-ISTA-21198">https://doi.org/10.15479/AT-ISTA-21198</a>.
  ieee: J. A. Scott, “Data heterogeneity and personalization in federated learning,”
    Institute of Science and Technology Austria, 2026.
  ista: Scott JA. 2026. Data heterogeneity and personalization in federated learning.
    Institute of Science and Technology Austria.
  mla: Scott, Jonathan A. <i>Data Heterogeneity and Personalization in Federated Learning</i>.
    Institute of Science and Technology Austria, 2026, doi:<a href="https://doi.org/10.15479/AT-ISTA-21198">10.15479/AT-ISTA-21198</a>.
  short: J.A. Scott, Data Heterogeneity and Personalization in Federated Learning,
    Institute of Science and Technology Austria, 2026.
corr_author: '1'
date_created: 2026-02-09T14:59:53Z
date_published: 2026-02-09T00:00:00Z
date_updated: 2026-04-07T11:46:11Z
day: '09'
ddc:
- '005'
degree_awarded: PhD
department:
- _id: GradSch
- _id: ChLa
doi: 10.15479/AT-ISTA-21198
file:
- access_level: closed
  checksum: 121c1d968bd86f3630aa7e81d5bbbcb0
  content_type: application/zip
  creator: jscott
  date_created: 2026-02-17T11:46:22Z
  date_updated: 2026-02-17T11:46:22Z
  file_id: '21298'
  file_name: 2026_Scott_Jonathan_Thesis_Source.zip
  file_size: 272379252
  relation: source_file
- access_level: open_access
  checksum: 6e3e08ba474bbee8511cc8a839ab2077
  content_type: application/pdf
  creator: jscott
  date_created: 2026-02-27T10:25:41Z
  date_updated: 2026-02-27T10:25:41Z
  file_id: '21366'
  file_name: 2026_Jonathan_Scott_Thesis.pdf
  file_size: 15220298
  relation: main_file
  success: 1
file_date_updated: 2026-02-27T10:25:41Z
has_accepted_license: '1'
language:
- iso: eng
month: '02'
oa: 1
oa_version: Published Version
page: '158'
publication_identifier:
  issn:
  - 2663-337X
publication_status: published
publisher: Institute of Science and Technology Austria
related_material:
  record:
  - id: '20819'
    relation: part_of_dissertation
    status: public
  - id: '17411'
    relation: part_of_dissertation
    status: public
  - id: '18120'
    relation: part_of_dissertation
    status: public
  - id: '21207'
    relation: part_of_dissertation
    status: public
status: public
supervisor:
- first_name: Christoph
  full_name: Lampert, Christoph
  id: 40C20FD2-F248-11E8-B48F-1D18A9856A87
  last_name: Lampert
  orcid: 0000-0001-8622-7887
title: Data heterogeneity and personalization in federated learning
type: dissertation
user_id: ba8df636-2132-11f1-aed0-ed93e2281fdd
year: '2026'
...
---
OA_place: publisher
OA_type: gold
_id: '20819'
abstract:
- lang: eng
  text: "Clustering is a cornerstone of data analysis that is particularly suited
    to identifying coherent subgroups or substructures in unlabeled data, which is
    generated continuously in large amounts these days. However, in many cases traditional
    clustering methods are not applicable, because data is increasingly being produced
    and stored in a distributed way, e.g. on edge devices, and privacy concerns prevent
    it from being transferred to a central server. To address this challenge, we present
    FedDP-KMeans, a new algorithm for k-means clustering that is fully federated as
    well as differentially
    private. Our approach leverages (potentially small and out-of-distribution) server-side
    data to overcome the primary challenge of differentially private clustering methods:
    the need for a good initialization. Combining our initialization with a simple
    federated DP-Lloyds algorithm we obtain an algorithm that achieves excellent results
    on synthetic and real-world benchmark tasks. We also provide a theoretical analysis
    of our method that provides bounds on the convergence speed and cluster identification
    success."
acknowledged_ssus:
- _id: ScienComp
acknowledgement: "This research was funded in part by the Austrian Science Fund (FWF)
  [10.55776/COE12] and supported by the Scientific Service Units (SSU) of ISTA through
  resources provided by Scientific Computing (SciComp)."
alternative_title:
- PMLR
article_processing_charge: No
arxiv: 1
author:
- first_name: Jonathan A
  full_name: Scott, Jonathan A
  id: e499926b-f6e0-11ea-865d-9c63db0031e8
  last_name: Scott
- first_name: Christoph
  full_name: Lampert, Christoph
  id: 40C20FD2-F248-11E8-B48F-1D18A9856A87
  last_name: Lampert
  orcid: 0000-0001-8622-7887
- first_name: David
  full_name: Saulpic, David
  id: f8e48cf0-b0ff-11ed-b0e9-b4c35598f964
  last_name: Saulpic
citation:
  ama: 'Scott JA, Lampert C, Saulpic D. Differentially private federated k-means clustering
    with server-side data. In: <i>42nd International Conference on Machine Learning</i>.
    Vol 267. ML Research Press; 2025:53757-53790.'
  apa: 'Scott, J. A., Lampert, C., &#38; Saulpic, D. (2025). Differentially private
    federated k-means clustering with server-side data. In <i>42nd International Conference
    on Machine Learning</i> (Vol. 267, pp. 53757–53790). Vancouver, Canada: ML Research
    Press.'
  chicago: Scott, Jonathan A, Christoph Lampert, and David Saulpic. “Differentially
    Private Federated K-Means Clustering with Server-Side Data.” In <i>42nd International
    Conference on Machine Learning</i>, 267:53757–90. ML Research Press, 2025.
  ieee: J. A. Scott, C. Lampert, and D. Saulpic, “Differentially private federated
    k-means clustering with server-side data,” in <i>42nd International Conference
    on Machine Learning</i>, Vancouver, Canada, 2025, vol. 267, pp. 53757–53790.
  ista: 'Scott JA, Lampert C, Saulpic D. 2025. Differentially private federated k-means
    clustering with server-side data. 42nd International Conference on Machine Learning.
    ICML: International Conference on Machine Learning, PMLR, vol. 267, 53757–53790.'
  mla: Scott, Jonathan A., et al. “Differentially Private Federated K-Means Clustering
    with Server-Side Data.” <i>42nd International Conference on Machine Learning</i>,
    vol. 267, ML Research Press, 2025, pp. 53757–90.
  short: J.A. Scott, C. Lampert, D. Saulpic, in:, 42nd International Conference on
    Machine Learning, ML Research Press, 2025, pp. 53757–53790.
conference:
  end_date: 2025-07-19
  location: Vancouver, Canada
  name: 'ICML: International Conference on Machine Learning'
  start_date: 2025-07-13
corr_author: '1'
date_created: 2025-12-14T23:02:05Z
date_published: 2025-05-01T00:00:00Z
date_updated: 2026-04-07T11:46:11Z
day: '01'
ddc:
- '000'
department:
- _id: ChLa
- _id: MoHe
external_id:
  arxiv:
  - '2506.05408'
file:
- access_level: open_access
  checksum: 815b32b463023ca21e569c2158745c15
  content_type: application/pdf
  creator: dernst
  date_created: 2025-12-16T12:38:29Z
  date_updated: 2025-12-16T12:38:29Z
  file_id: '20829'
  file_name: 2025_ICML_Scott.pdf
  file_size: 746612
  relation: main_file
  success: 1
file_date_updated: 2025-12-16T12:38:29Z
has_accepted_license: '1'
intvolume: '267'
language:
- iso: eng
month: '05'
oa: 1
oa_version: Published Version
page: 53757-53790
publication: 42nd International Conference on Machine Learning
publication_identifier:
  eissn:
  - 2640-3498
publication_status: published
publisher: ML Research Press
quality_controlled: '1'
related_material:
  record:
  - id: '21198'
    relation: dissertation_contains
    status: public
scopus_import: '1'
status: public
title: Differentially private federated k-means clustering with server-side data
tmp:
  image: /images/cc_by.png
  legal_code_url: https://creativecommons.org/licenses/by/4.0/legalcode
  name: Creative Commons Attribution 4.0 International Public License (CC-BY 4.0)
  short: CC BY (4.0)
type: conference
user_id: 2DF688A6-F248-11E8-B48F-1D18A9856A87
volume: 267
year: '2025'
...
---
OA_place: repository
_id: '21207'
abstract:
- lang: eng
  text: Personalized federated learning has emerged as a popular approach to training
    on devices holding statistically heterogeneous data, known as clients. However,
    most existing approaches require a client to have labeled data for training or
    finetuning in order to obtain their own personalized model. In this paper we address
    this by proposing FLowDUP, a novel method that is able to generate a personalized
    model using only a forward pass with unlabeled data. The generated model parameters
    reside in a low-dimensional subspace, enabling efficient communication and computation.
    FLowDUP's learning objective is theoretically motivated by our new transductive
    multi-task PAC-Bayesian generalization bound, which provides performance guarantees
    for unlabeled clients. The objective is structured in such a way that it allows
    both clients with labeled data and clients with only unlabeled data to contribute
    to the training process. To supplement our theoretical results we carry out a
    thorough experimental evaluation of FLowDUP, demonstrating strong empirical performance
    on a range of datasets with differing sorts of statistically heterogeneous clients.
    Through numerous ablation studies, we test the efficacy of the individual components
    of the method.
article_processing_charge: No
author:
- first_name: Hossein
  full_name: Zakerinia, Hossein
  id: 653bd8b6-f394-11eb-9cf6-c0bbf6cd78d4
  last_name: Zakerinia
  orcid: 0009-0007-3977-6462
- first_name: Jonathan A
  full_name: Scott, Jonathan A
  id: e499926b-f6e0-11ea-865d-9c63db0031e8
  last_name: Scott
- first_name: Christoph
  full_name: Lampert, Christoph
  id: 40C20FD2-F248-11E8-B48F-1D18A9856A87
  last_name: Lampert
  orcid: 0000-0001-8622-7887
citation:
  ama: 'Zakerinia H, Scott JA, Lampert C. Federated learning with unlabeled clients:
    Personalization can happen in low dimensions. <i>arXiv</i>. doi:<a href="https://doi.org/10.48550/ARXIV.2505.15579">10.48550/ARXIV.2505.15579</a>'
  apa: 'Zakerinia, H., Scott, J. A., &#38; Lampert, C. (n.d.). Federated learning
    with unlabeled clients: Personalization can happen in low dimensions. <i>arXiv</i>.
    <a href="https://doi.org/10.48550/ARXIV.2505.15579">https://doi.org/10.48550/ARXIV.2505.15579</a>'
  chicago: 'Zakerinia, Hossein, Jonathan A Scott, and Christoph Lampert. “Federated
    Learning with Unlabeled Clients: Personalization Can Happen in Low Dimensions.”
    <i>ArXiv</i>, n.d. <a href="https://doi.org/10.48550/ARXIV.2505.15579">https://doi.org/10.48550/ARXIV.2505.15579</a>.'
  ieee: 'H. Zakerinia, J. A. Scott, and C. Lampert, “Federated learning with unlabeled
    clients: Personalization can happen in low dimensions,” <i>arXiv</i>.'
  ista: 'Zakerinia H, Scott JA, Lampert C. Federated learning with unlabeled clients:
    Personalization can happen in low dimensions. arXiv, <a href="https://doi.org/10.48550/ARXIV.2505.15579">10.48550/ARXIV.2505.15579</a>.'
  mla: 'Zakerinia, Hossein, et al. “Federated Learning with Unlabeled Clients: Personalization
    Can Happen in Low Dimensions.” <i>ArXiv</i>, doi:<a href="https://doi.org/10.48550/ARXIV.2505.15579">10.48550/ARXIV.2505.15579</a>.'
  short: H. Zakerinia, J.A. Scott, C. Lampert, ArXiv (n.d.).
corr_author: '1'
date_created: 2026-02-10T08:20:59Z
date_published: 2025-05-21T00:00:00Z
date_updated: 2026-04-07T11:46:11Z
day: '21'
department:
- _id: ChLa
doi: 10.48550/ARXIV.2505.15579
language:
- iso: eng
main_file_link:
- open_access: '1'
  url: https://doi.org/10.48550/arXiv.2505.15579
month: '05'
oa: 1
oa_version: Preprint
publication: arXiv
publication_status: draft
related_material:
  record:
  - id: '21198'
    relation: dissertation_contains
    status: public
status: public
title: 'Federated learning with unlabeled clients: Personalization can happen in low
  dimensions'
tmp:
  image: /images/cc_by.png
  legal_code_url: https://creativecommons.org/licenses/by/4.0/legalcode
  name: Creative Commons Attribution 4.0 International Public License (CC-BY 4.0)
  short: CC BY (4.0)
type: preprint
user_id: 8b945eb4-e2f2-11eb-945a-df72226e66a9
year: '2025'
...
---
_id: '17411'
abstract:
- lang: eng
  text: "We present PeFLL, a new personalized federated learning algorithm that
    improves over the state-of-the-art in three aspects: 1) it produces more accurate
    models, especially in the low-data regime, and not only for clients present
    during its training phase, but also for any that may emerge in the future; 2)
    it reduces the amount of on-client computation and client-server communication
    by providing future clients with ready-to-use personalized models that require
    no additional finetuning or optimization; 3) it comes with theoretical guarantees
    that establish generalization from the observed clients to future ones. At the
    core of PeFLL lies a learning-to-learn approach that jointly trains an embedding
    network and a hypernetwork. The embedding network is used to represent clients
    in a latent descriptor space in a way that reflects their similarity to each
    other. The hypernetwork takes as input such descriptors and outputs the parameters
    of fully personalized client models. In combination, both networks constitute
    a learning algorithm that achieves state-of-the-art performance in several personalized
    federated learning benchmarks."
acknowledged_ssus:
- _id: ScienComp
acknowledgement: "This research was supported by the Scientific Service Units (SSU)
  of ISTA through resources provided by Scientific Computing (SciComp)."
article_processing_charge: No
arxiv: 1
author:
- first_name: Jonathan A
  full_name: Scott, Jonathan A
  id: e499926b-f6e0-11ea-865d-9c63db0031e8
  last_name: Scott
- first_name: Hossein
  full_name: Zakerinia, Hossein
  id: 653bd8b6-f394-11eb-9cf6-c0bbf6cd78d4
  last_name: Zakerinia
  orcid: 0009-0007-3977-6462
- first_name: Christoph
  full_name: Lampert, Christoph
  id: 40C20FD2-F248-11E8-B48F-1D18A9856A87
  last_name: Lampert
  orcid: 0000-0001-8622-7887
citation:
  ama: 'Scott JA, Zakerinia H, Lampert C. PEFLL: Personalized federated learning by
    learning to learn. In: <i>12th International Conference on Learning Representations</i>.
    OpenReview; 2024.'
  apa: 'Scott, J. A., Zakerinia, H., &#38; Lampert, C. (2024). PEFLL: Personalized
    federated learning by learning to learn. In <i>12th International Conference on
    Learning Representations</i>. Vienna, Austria: OpenReview.'
  chicago: 'Scott, Jonathan A, Hossein Zakerinia, and Christoph Lampert. “PEFLL: Personalized
    Federated Learning by Learning to Learn.” In <i>12th International Conference
    on Learning Representations</i>. OpenReview, 2024.'
  ieee: 'J. A. Scott, H. Zakerinia, and C. Lampert, “PEFLL: Personalized federated
    learning by learning to learn,” in <i>12th International Conference on Learning
    Representations</i>, Vienna, Austria, 2024.'
  ista: 'Scott JA, Zakerinia H, Lampert C. 2024. PEFLL: Personalized federated learning
    by learning to learn. 12th International Conference on Learning Representations.
    ICLR: International Conference on Learning Representations.'
  mla: 'Scott, Jonathan A., et al. “PEFLL: Personalized Federated Learning by Learning
    to Learn.” <i>12th International Conference on Learning Representations</i>, OpenReview,
    2024.'
  short: J.A. Scott, H. Zakerinia, C. Lampert, in:, 12th International Conference
    on Learning Representations, OpenReview, 2024.
conference:
  end_date: 2024-03-07
  location: Vienna, Austria
  name: 'ICLR: International Conference on Learning Representations'
  start_date: 2024-03-07
corr_author: '1'
date_created: 2024-08-11T22:01:12Z
date_published: 2024-03-07T00:00:00Z
date_updated: 2026-04-07T11:46:11Z
day: '07'
ddc:
- '000'
department:
- _id: ChLa
external_id:
  arxiv:
  - '2306.05515'
file:
- access_level: open_access
  checksum: 81b7ea2e667adaf9c7a7b6b376b1f251
  content_type: application/pdf
  creator: dernst
  date_created: 2024-08-12T07:38:06Z
  date_updated: 2024-08-12T07:38:06Z
  file_id: '17415'
  file_name: 2024_ICLR_Scott.pdf
  file_size: 1029219
  relation: main_file
  success: 1
file_date_updated: 2024-08-12T07:38:06Z
has_accepted_license: '1'
language:
- iso: eng
month: '03'
oa: 1
oa_version: Published Version
publication: 12th International Conference on Learning Representations
publication_status: published
publisher: OpenReview
quality_controlled: '1'
related_material:
  record:
  - id: '21198'
    relation: dissertation_contains
    status: public
scopus_import: '1'
status: public
title: 'PEFLL: Personalized federated learning by learning to learn'
type: conference
user_id: 2DF688A6-F248-11E8-B48F-1D18A9856A87
year: '2024'
...
---
_id: '18120'
abstract:
- lang: eng
  text: In practice, training using federated learning can be orders of magnitude
    slower than standard centralized training. This severely limits the amount of
    experimentation and tuning that can be done, making it challenging to obtain good
    performance on a given task. Server-side proxy data can be used to run training
    simulations, for instance for hyperparameter tuning. This can greatly speed up
    the training pipeline by reducing the number of tuning runs to be performed overall
    on the true clients. However, it is challenging to ensure that these simulations
    accurately reflect the dynamics of the real federated training. In particular,
    the proxy data used for simulations often comes as a single centralized dataset
    without a partition into distinct clients, and partitioning this data in a naive
    way can lead to simulations that poorly reflect real federated training. In this
    paper we address the challenge of how to partition centralized data in a way that
    reflects the statistical heterogeneity of the true federated clients. We propose
    a fully federated, theoretically justified algorithm that efficiently learns
    the distribution of the true clients and observe improved server-side simulations
    when using the inferred distribution to create simulated clients from the centralized
    data.
acknowledgement: 'We would like to thank: Mona Chitnis and everyone in the Private
  Federated Learning team at Apple for their help and support throughout the entire
  project; Audra McMillan, Martin Pelikan, Anosh Raj and Barry Theobold for feedback
  on the initial versions of the paper; and Christoph Lampert for valuable feedback
  on the paper structure and suggestions for additional experiments.'
alternative_title:
- PMLR
article_processing_charge: No
arxiv: 1
author:
- first_name: Jonathan A
  full_name: Scott, Jonathan A
  id: e499926b-f6e0-11ea-865d-9c63db0031e8
  last_name: Scott
- first_name: Áine
  full_name: Cahill, Áine
  last_name: Cahill
citation:
  ama: 'Scott JA, Cahill Á. Improved modelling of federated datasets using mixtures-of-Dirichlet-multinomials.
    In: <i>Proceedings of the 41st International Conference on Machine Learning</i>.
    Vol 235. ML Research Press; 2024:44012-44037.'
  apa: 'Scott, J. A., &#38; Cahill, Á. (2024). Improved modelling of federated datasets
    using mixtures-of-Dirichlet-multinomials. In <i>Proceedings of the 41st International
    Conference on Machine Learning</i> (Vol. 235, pp. 44012–44037). Vienna, Austria:
    ML Research Press.'
  chicago: Scott, Jonathan A, and Áine Cahill. “Improved Modelling of Federated Datasets
    Using Mixtures-of-Dirichlet-Multinomials.” In <i>Proceedings of the 41st International
    Conference on Machine Learning</i>, 235:44012–37. ML Research Press, 2024.
  ieee: J. A. Scott and Á. Cahill, “Improved modelling of federated datasets using
    mixtures-of-Dirichlet-multinomials,” in <i>Proceedings of the 41st International
    Conference on Machine Learning</i>, Vienna, Austria, 2024, vol. 235, pp. 44012–44037.
  ista: 'Scott JA, Cahill Á. 2024. Improved modelling of federated datasets using
    mixtures-of-Dirichlet-multinomials. Proceedings of the 41st International Conference
    on Machine Learning. ICML: International Conference on Machine Learning, PMLR,
    vol. 235, 44012–44037.'
  mla: Scott, Jonathan A., and Áine Cahill. “Improved Modelling of Federated Datasets
    Using Mixtures-of-Dirichlet-Multinomials.” <i>Proceedings of the 41st International
    Conference on Machine Learning</i>, vol. 235, ML Research Press, 2024, pp. 44012–37.
  short: J.A. Scott, Á. Cahill, in:, Proceedings of the 41st International Conference
    on Machine Learning, ML Research Press, 2024, pp. 44012–44037.
conference:
  end_date: 2024-07-27
  location: Vienna, Austria
  name: 'ICML: International Conference on Machine Learning'
  start_date: 2024-07-21
corr_author: '1'
date_created: 2024-09-22T22:01:45Z
date_published: 2024-09-01T00:00:00Z
date_updated: 2026-04-07T11:46:11Z
day: '01'
department:
- _id: ChLa
external_id:
  arxiv:
  - '2406.02416'
intvolume: '235'
language:
- iso: eng
main_file_link:
- open_access: '1'
  url: https://doi.org/10.48550/arXiv.2406.02416
month: '09'
oa: 1
oa_version: Preprint
page: 44012-44037
publication: Proceedings of the 41st International Conference on Machine Learning
publication_identifier:
  eissn:
  - 2640-3498
publication_status: published
publisher: ML Research Press
quality_controlled: '1'
related_material:
  record:
  - id: '21198'
    relation: dissertation_contains
    status: public
scopus_import: '1'
status: public
title: Improved modelling of federated datasets using mixtures-of-Dirichlet-multinomials
type: conference
user_id: 2DF688A6-F248-11E8-B48F-1D18A9856A87
volume: 235
year: '2024'
...
---
OA_place: repository
OA_type: green
_id: '12660'
abstract:
- lang: eng
  text: 'We present Cross-Client Label Propagation (XCLP), a new method for transductive
    federated learning. XCLP estimates a data graph jointly from the data of multiple
    clients and computes labels for the unlabeled data by propagating label information
    across the graph. To avoid clients having to share their data with anyone, XCLP
    employs two cryptographically secure protocols: secure Hamming distance computation
    and secure summation. We demonstrate two distinct applications of XCLP within
    federated learning. In the first, we use it in a one-shot way to predict labels
    for unseen test points. In the second, we use it to repeatedly pseudo-label unlabeled
    training data in a federated semi-supervised setting. Experiments on both real
    federated and standard benchmark datasets show that in both applications XCLP
    achieves higher classification accuracy than alternative approaches.'
alternative_title:
- TMLR
article_processing_charge: No
arxiv: 1
author:
- first_name: Jonathan A
  full_name: Scott, Jonathan A
  id: e499926b-f6e0-11ea-865d-9c63db0031e8
  last_name: Scott
- first_name: Michelle X
  full_name: Yeo, Michelle X
  id: 2D82B818-F248-11E8-B48F-1D18A9856A87
  last_name: Yeo
  orcid: 0009-0001-3676-4809
- first_name: Christoph
  full_name: Lampert, Christoph
  id: 40C20FD2-F248-11E8-B48F-1D18A9856A87
  last_name: Lampert
  orcid: 0000-0001-8622-7887
citation:
  ama: 'Scott JA, Yeo MX, Lampert C. Cross-client label propagation for transductive
    and semi-supervised federated learning. In: <i>Transactions on Machine Learning
    Research</i>. Curran Associates; 2023.'
  apa: Scott, J. A., Yeo, M. X., &#38; Lampert, C. (2023). Cross-client label propagation
    for transductive and semi-supervised federated learning. In <i>Transactions on
    Machine Learning Research</i>. Curran Associates.
  chicago: Scott, Jonathan A, Michelle X Yeo, and Christoph Lampert. “Cross-Client
    Label Propagation for Transductive and Semi-Supervised Federated Learning.” In
    <i>Transactions on Machine Learning Research</i>. Curran Associates, 2023.
  ieee: J. A. Scott, M. X. Yeo, and C. Lampert, “Cross-client label propagation for
    transductive and semi-supervised federated learning,” in <i>Transactions on Machine
    Learning Research</i>, 2023.
  ista: Scott JA, Yeo MX, Lampert C. 2023. Cross-client label propagation for transductive
    and semi-supervised federated learning. Transactions on Machine Learning Research,
    TMLR.
  mla: Scott, Jonathan A., et al. “Cross-Client Label Propagation for Transductive
    and Semi-Supervised Federated Learning.” <i>Transactions on Machine Learning
    Research</i>, Curran Associates, 2023.
  short: J.A. Scott, M.X. Yeo, C. Lampert, in:, Transactions on Machine Learning
    Research, Curran Associates, 2023.
corr_author: '1'
date_created: 2023-02-20T08:21:50Z
date_published: 2023-11-27T00:00:00Z
date_updated: 2025-02-04T08:32:19Z
day: '27'
ddc:
- '004'
department:
- _id: ChLa
external_id:
  arxiv:
  - '2210.06434'
file:
- access_level: open_access
  checksum: aa322ad91cbd229f5cafe6733a119bd1
  content_type: application/pdf
  creator: dernst
  date_created: 2025-02-04T08:30:05Z
  date_updated: 2025-02-04T08:30:05Z
  file_id: '18990'
  file_name: 2023_TMLR_Scott.pdf
  file_size: 553717
  relation: main_file
  success: 1
file_date_updated: 2025-02-04T08:30:05Z
has_accepted_license: '1'
language:
- iso: eng
month: '11'
oa: 1
oa_version: Preprint
publication: Transactions on Machine Learning Research
publication_identifier:
  issn:
  - 2835-8856
publication_status: published
publisher: Curran Associates
quality_controlled: '1'
related_material:
  link:
  - relation: software
    url: https://github.com/jonnyascott/xclp
status: public
title: Cross-client label propagation for transductive and semi-supervised federated
  learning
tmp:
  image: /images/cc_by.png
  legal_code_url: https://creativecommons.org/licenses/by/4.0/legalcode
  name: Creative Commons Attribution 4.0 International Public License (CC-BY 4.0)
  short: CC BY (4.0)
type: conference
user_id: 2DF688A6-F248-11E8-B48F-1D18A9856A87
year: '2023'
...
