Data heterogeneity and personalization in federated learning

Scott JA. 2026. Data heterogeneity and personalization in federated learning. Institute of Science and Technology Austria.

OA 2026_Jonathan_Scott_Thesis.pdf 15.22 MB [Published Version]

Thesis | PhD | Published | English

Corresponding author has ISTA affiliation

Series Title
ISTA Thesis
Abstract
In recent years there has been a massive increase in the amount of data generated in a decentralized manner. Ever more powerful edge devices, such as smartphones, have become ubiquitous in most societies on Earth. Through text typed, photos taken and apps used, these devices, which we refer to as clients, generate enormous amounts of high-quality, complex data. Moreover, the nature of these devices means that the data they generate is often sensitive, and privacy concerns prevent it from being gathered and stored in a central location. This presents a challenge to the modern machine learning paradigm, which requires central access to large amounts of data. Federated learning (FL) has emerged as one answer to this problem. Rather than bringing the data to the model, FL sends the model to the data: training takes place on device, with periodically synchronized updates, so the data remains stored locally.

While this approach offers significant privacy advantages, it comes with its own set of unique challenges. These include data heterogeneity, where different devices generate data in distinct ways that can negatively impact training dynamics; systems heterogeneity, meaning that different devices may have differing hardware specifications; high communication costs, induced by the repeated transfer of models over the network; and low on-device computational power, which limits the use of larger models on device.

In this thesis we present a range of methods for federated learning. We focus primarily on the challenge of data heterogeneity, though the methods are designed to be well adapted to the other challenges of the federated setting, such as limited compute and communication overhead. We first present a method for explicitly modeling client data heterogeneity. The approach treats clients as samples from a probability distribution and infers the parameters of this distribution from the available training clients. The learned distribution then represents the heterogeneity present among the clients, and it can be sampled to create new simulated clients that resemble the real clients observed so far.

We then present two methods that deal with data heterogeneity directly through personalization. When client data distributions are highly heterogeneous, a single global model can be suboptimal, and some form of personalization of the model to each individual client is required. Both approaches are based on hypernetworks, which we use to generate personalized model parameters without the need for additional training or fine-tuning. The first approach generates full parameterizations of client models from learned embeddings of client data and labels, using a hypernetwork located on the central server. The second approach addresses the more challenging scenario in which a personalized model must be generated for a client without any label information. Here, the hypernetwork is trained to generate a low-dimensional representation of a client's personalized model parameters, allowing it to be transferred to and run on client devices.

In our final method, we change focus: rather than directly addressing the challenge of data heterogeneity, we ensure that the method is unaffected by it. We do this in the context of k-means clustering, presenting a method for federated clustering with a focus on added privacy guarantees.
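The abstract's description of federated training (the model is sent to the clients, trained on device, and periodically synchronized) can be illustrated with a minimal sketch of the standard FedAvg-style loop. This is a generic baseline for orientation only, not one of the methods developed in the thesis; the 1-D linear model, the toy client datasets and the names `local_train` and `fedavg_round` are illustrative assumptions.

```python
"""Minimal FedAvg-style sketch (illustrative; not a method from the thesis).

Each client fits a 1-D linear model y = w * x + b on its own data with a few
steps of local gradient descent; the server then averages the returned
parameters, weighted by client dataset size. Only parameters travel over the
'network'; the raw (x, y) pairs never leave the client.
"""
from typing import List, Tuple

Params = Tuple[float, float]         # (w, b)
Dataset = List[Tuple[float, float]]  # (x, y) pairs held by one hypothetical client


def local_train(params: Params, data: Dataset, lr: float = 0.05, steps: int = 20) -> Params:
    """Full-batch gradient descent on mean squared error, run on the client."""
    w, b = params
    n = len(data)
    for _ in range(steps):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def fedavg_round(global_params: Params, clients: List[Dataset]) -> Params:
    """One synchronization round: broadcast, train locally, average by data size."""
    updates = [(local_train(global_params, data), len(data)) for data in clients]
    total = sum(n for _, n in updates)
    w = sum(p[0] * n for p, n in updates) / total
    b = sum(p[1] * n for p, n in updates) / total
    return w, b


if __name__ == "__main__":
    # Two hypothetical clients whose inputs cover different ranges,
    # but whose labels follow the same rule y = 2x + 1.
    clients = [
        [(0.0, 1.0), (0.5, 2.0), (1.0, 3.0)],
        [(1.5, 4.0), (2.0, 5.0)],
    ]
    params: Params = (0.0, 0.0)
    for _ in range(100):
        params = fedavg_round(params, clients)
    print(f"w = {params[0]:.3f}, b = {params[1]:.3f}")
```

Weighting the average by client dataset size is the usual FedAvg choice. In this toy example both clients share the same underlying rule, so the averaged model converges to it; under strong data heterogeneity the clients' local optima differ, which is the setting that motivates the personalization methods described in the abstract.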
Publishing Year
2026
Date Published
2026-02-09
Publisher
Institute of Science and Technology Austria
Acknowledgement
This research was funded in part by the Austrian Science Fund (FWF) [10.55776/COE12]. Furthermore, the candidate acknowledges the support from the Scientific Service Units (SSU) of ISTA through resources provided by Scientific Computing (SciComp).
Acknowledged SSUs
Scientific Computing (SciComp)
Pages
158

Cite this

Scott JA. Data heterogeneity and personalization in federated learning. 2026. doi:10.15479/AT-ISTA-21198
Scott, J. A. (2026). Data heterogeneity and personalization in federated learning. Institute of Science and Technology Austria. https://doi.org/10.15479/AT-ISTA-21198
Scott, Jonathan A. “Data Heterogeneity and Personalization in Federated Learning.” Institute of Science and Technology Austria, 2026. https://doi.org/10.15479/AT-ISTA-21198.
J. A. Scott, “Data heterogeneity and personalization in federated learning,” Institute of Science and Technology Austria, 2026.
Scott, Jonathan A. Data Heterogeneity and Personalization in Federated Learning. Institute of Science and Technology Austria, 2026, doi:10.15479/AT-ISTA-21198.
All files available under the following license(s):
Copyright Statement:
This Item is protected by copyright and/or related rights. [...]
Main File(s)
Access Level
Open Access
Date Uploaded
2026-02-27
MD5 Checksum
6e3e08ba474bbee8511cc8a839ab2077

Source File
Access Level
Closed Access
Date Uploaded
2026-02-17
MD5 Checksum
121c1d968bd86f3630aa7e81d5bbbcb0
