Data heterogeneity and personalization in federated learning

Scott, Jonathan A

Scott JA. 2026. Data heterogeneity and personalization in federated learning. Institute of Science and Technology Austria.

Download

2026_Jonathan_Scott_Thesis.pdf 15.22 MB [Published Version]

DOI

10.15479/AT-ISTA-21198

Thesis | PhD | Published | English

Author

Scott, Jonathan A (ISTA)

Supervisor

Lampert, Christoph (ISTA)

Corresponding author has ISTA affiliation

Department

Graduate School
Lampert Group

Series Title

ISTA Thesis

Abstract

In recent years there has been a massive increase in the amount of data generated in a decentralized manner. Ever more powerful edge devices, such as smartphones, have become ubiquitous in most societies on earth. Through text typed, photos taken and apps used, these devices, which we refer to as clients, generate enormous amounts of high-quality, complex data. Moreover, the nature of these devices means the data they generate is often sensitive, and privacy concerns prevent it from being gathered and stored in a central location. This presents a challenge to the modern machine learning paradigm, which requires central access to large amounts of data. Federated learning (FL) has emerged as one of the answers to this problem. Rather than bringing the data to the model, FL sends the model to the data. Model training takes place on device, with periodically synchronized updates, allowing data to remain locally stored. While this approach offers significant privacy advantages, it comes with its own set of challenges. These include: data heterogeneity, the notion that different devices generate data in distinct ways, which can negatively impact training dynamics; systems heterogeneity, meaning that different devices may have differing hardware specifications; high communication costs, which are induced by the repeated transfer of models over the network; and low device computational power, which limits the use of larger models on device.

In this thesis we present a range of methods for federated learning. We focus primarily on the challenge of data heterogeneity, though the methods presented are designed to be well adapted to the other challenges of a federated setting, such as the constraints of limited compute and communication overhead. We first present a method for explicitly modeling client data heterogeneity. The approach formulates clients as samples from a probability distribution and infers the parameters of this distribution from the available training clients. The learned distribution then represents the heterogeneity present among the clients and can be sampled from in order to create new simulated clients that are similar to the real clients observed so far. Following this, we present two methods for dealing with data heterogeneity directly through personalization. When client data distributions are highly heterogeneous, learning a single global model can become suboptimal, and some form of personalization of models to each individual client is required. Our approaches are based on hypernetworks, which we use to generate personalized model parameters without the need for additional training or fine-tuning. In the first approach, we focus on generating full parameterizations of client models using learned embeddings of client data and labels, with a hypernetwork located on the central server. In the second approach, we address the more challenging scenario in which we want to generate a personalized model for a client without any label information. The hypernetwork is trained to generate a low-dimensional representation of a client’s personalized model parameters, allowing it to be transferred to and run on the client devices. In our final method, we change focus: rather than directly addressing the challenge of data heterogeneity, we ensure that the method is unaffected by it. We do this in the context of k-means clustering, presenting a method for federated clustering with a focus on added privacy guarantees.
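
As an illustration of the general training pattern the abstract describes (on-device training with periodically synchronized updates), the following is a minimal sketch of federated averaging (FedAvg), a standard baseline in the FL literature. It is not one of the methods presented in the thesis. The model (a one-dimensional linear regressor), the function names `local_sgd`, `fedavg_round` and `make_client`, and all hyperparameters are assumptions chosen for illustration only. Clients are given slightly different slopes to mimic data heterogeneity.

```python
import random


def local_sgd(w, b, data, lr=0.05, epochs=5):
    """Train y ~ w*x + b on one client's own (x, y) pairs; data never leaves the client."""
    for _ in range(epochs):
        for x, y in data:
            err = (w * x + b) - y
            w -= lr * err * x
            b -= lr * err
    return w, b


def fedavg_round(w, b, clients):
    """One synchronization round: clients train locally, the server averages
    the returned parameters, weighted by each client's dataset size."""
    total = sum(len(data) for data in clients)
    new_w, new_b = 0.0, 0.0
    for data in clients:
        cw, cb = local_sgd(w, b, data)
        new_w += cw * len(data) / total
        new_b += cb * len(data) / total
    return new_w, new_b


def make_client(rng, slope, intercept, n):
    """Hypothetical synthetic client: noisy samples from its own linear relation."""
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    return [(x, slope * x + intercept + rng.gauss(0.0, 0.01)) for x in xs]


rng = random.Random(0)
# Heterogeneous clients: each has its own slope in the range 1.7 to 2.3.
clients = [make_client(rng, 2.0 + rng.uniform(-0.3, 0.3), 1.0, 20) for _ in range(5)]

w, b = 0.0, 0.0  # global model held by the server
for _ in range(30):
    w, b = fedavg_round(w, b, clients)

print(f"global model after 30 rounds: w={w:.3f}, b={b:.3f}")
```

In this sketch the single global model settles near a compromise between the clients' individual slopes, so it fits no client exactly. That gap is the kind of situation in which, as the abstract notes, a single global model can become suboptimal and personalization becomes attractive.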

Publishing Year

2026

Date Published

2026-02-09

Publisher

Institute of Science and Technology Austria

Acknowledgement

This research was funded in part by the Austrian Science Fund (FWF) [10.55776/COE12]. Furthermore, the candidate acknowledges the support from the Scientific Service Units (SSU) of ISTA through resources provided by Scientific Computing (SciComp).

Acknowledged SSUs

Scientific Computing

Page

158

ISSN

2663-337X

IST-REx-ID

21198

Cite this

Scott JA. Data heterogeneity and personalization in federated learning. 2026. doi:10.15479/AT-ISTA-21198

Scott, J. A. (2026). Data heterogeneity and personalization in federated learning. Institute of Science and Technology Austria. https://doi.org/10.15479/AT-ISTA-21198

Scott, Jonathan A. “Data Heterogeneity and Personalization in Federated Learning.” Institute of Science and Technology Austria, 2026. https://doi.org/10.15479/AT-ISTA-21198.

J. A. Scott, “Data heterogeneity and personalization in federated learning,” Institute of Science and Technology Austria, 2026.

Scott, Jonathan A. Data Heterogeneity and Personalization in Federated Learning. Institute of Science and Technology Austria, 2026, doi:10.15479/AT-ISTA-21198.
