Data heterogeneity and personalization in federated learning
Scott JA. 2026. Data heterogeneity and personalization in federated learning. Institute of Science and Technology Austria.
Thesis | PhD | Published | English
Author
Scott, Jonathan A.
Supervisor
Corresponding author has ISTA affiliation
Department
Series Title
ISTA Thesis
Abstract
In recent years, there has been a massive increase in the amount of data generated in a
decentralized manner. Ever more powerful edge devices, such as smartphones, have become
ubiquitous in most societies on Earth. Through the text typed, photos taken and apps used on
them, these devices, which we refer to as clients, generate enormous amounts of high-quality,
complex data. Moreover, the nature of these devices means that the data they generate is often
sensitive, and privacy concerns prevent it from being gathered and stored in a central location.
This presents a challenge to the modern machine learning paradigm, which requires central access
to large amounts of data. Federated learning (FL) has emerged as one answer to
this problem. Rather than bringing the data to the model, FL sends the model to the data.
Model training takes place on device, with periodically synchronized updates, allowing data to
remain locally stored. While this approach offers significant privacy advantages, it comes with
its own set of unique challenges. These include data heterogeneity, the notion that different
devices generate data in distinct ways, which can negatively impact training dynamics; systems
heterogeneity, meaning that different devices may have differing hardware specifications; high
communication costs, which are induced by repeatedly transferring models over the
network; and low device computational power, which limits the use of larger models on device.
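The loop described above, in which clients train on device and periodically synchronize, can be sketched as follows. The abstract does not name an aggregation rule. This minimal sketch therefore assumes federated averaging (FedAvg): each client runs a few epochs of SGD on its own data, and the server averages the returned parameters, weighted by client dataset size. The toy linear model, the client datasets and the function names (`local_update`, `fedavg_round`) are hypothetical illustrations and are not taken from the thesis.

```python
from typing import List, Tuple

# A client's private dataset of (x, y) pairs; in FL it never leaves the device.
Dataset = List[Tuple[float, float]]
Params = Tuple[float, float]  # (w, b) for the toy model y ≈ w * x + b (hypothetical)


def local_update(params: Params, data: Dataset, lr: float = 0.05, epochs: int = 5) -> Params:
    """On-device training: a few epochs of plain SGD on squared error."""
    w, b = params
    for _ in range(epochs):
        for x, y in data:
            err = (w * x + b) - y  # gradient of 0.5 * err**2 w.r.t. the prediction
            w -= lr * err * x
            b -= lr * err
    return w, b


def fedavg_round(params: Params, clients: List[Dataset]) -> Params:
    """One synchronization round: broadcast, train locally, average by dataset size."""
    total = sum(len(data) for data in clients)
    w_avg = b_avg = 0.0
    for data in clients:
        w_c, b_c = local_update(params, data)  # only these two numbers are sent back
        w_avg += w_c * len(data) / total
        b_avg += b_c * len(data) / total
    return w_avg, b_avg


# Two clients observe the same rule y = 2x + 1 on disjoint input ranges,
# a simple form of data heterogeneity (covariate shift).
client_a = [(i / 20, 2 * (i / 20) + 1) for i in range(20)]
client_b = [(1 + i / 10, 2 * (1 + i / 10) + 1) for i in range(10)]

params: Params = (0.0, 0.0)
for _ in range(100):
    params = fedavg_round(params, [client_a, client_b])
print(params)  # should be close to (2.0, 1.0)
```

Only the two model parameters travel between the clients and the server. The raw `(x, y)` pairs stay on each client. Here both clients follow the same rule, so averaging recovers that rule despite the shifted input ranges. If the clients' underlying rules differed, a single averaged model would be a compromise between them. That is the situation addressed by the personalization methods in this thesis.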
In this thesis, we present a range of methods for federated learning. We focus primarily on
the challenge of data heterogeneity, though the methods presented are also designed to suit
the other constraints of the federated setting, such as limited computational resources and
communication overhead. We first present a method for explicitly modeling client
data heterogeneity. The approach models clients as samples from a probability
distribution and infers the parameters of this distribution from the available training clients.
This learned distribution then represents the heterogeneity present among the clients and can
be sampled to create new simulated clients that resemble the real clients observed so far.
Following this, we present two methods for dealing with data heterogeneity directly through
personalization. When client data distributions are highly heterogeneous, a single global model
can be suboptimal, and some form of personalization of the model to each individual client is
required. Our approaches are based on hypernetworks, which we use to generate personalized
model parameters without the need for additional training or finetuning. In the first approach,
we generate full parameterizations of client models from learned embeddings of client data and
labels, using a hypernetwork located on the central server. In the second approach, we address
the more challenging scenario in which a personalized model must be generated for a client
without any label information. The hypernetwork is trained to generate a low-dimensional
representation of a client’s personalized model parameters, allowing it to be transferred to and
run on client devices. In the final method, we change focus: rather than addressing the challenge
of data heterogeneity directly, we ensure that our method is unaffected by it. We do this in the
context of k-means clustering, presenting a method for federated clustering with a focus on
added privacy guarantees.
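A plain federated version of Lloyd's k-means iteration shows one way a clustering step can be unaffected by how data is split across clients. This version is assumed for illustration only. It is not the thesis's algorithm, and it omits any privacy mechanism. Each client sends only per-cluster coordinate sums and counts. These add up to the same totals however the points are distributed, so the server's centroid update matches the centralized one. The function names (`client_stats`, `server_update`, `federated_kmeans`) and the data are hypothetical. Integer coordinates keep the sums exact, so the final comparison holds exactly rather than only approximately.

```python
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]
Stats = Dict[int, Tuple[float, float, int]]  # cluster index -> (sum_x, sum_y, count)


def nearest(p: Point, centroids: Sequence[Point]) -> int:
    """Index of the closest centroid by squared Euclidean distance."""
    return min(
        range(len(centroids)),
        key=lambda j: (p[0] - centroids[j][0]) ** 2 + (p[1] - centroids[j][1]) ** 2,
    )


def client_stats(data: List[Point], centroids: Sequence[Point]) -> Stats:
    """Runs on a client: per-cluster sums and counts; raw points are never sent."""
    stats: Stats = {}
    for p in data:
        j = nearest(p, centroids)
        sx, sy, n = stats.get(j, (0, 0, 0))
        stats[j] = (sx + p[0], sy + p[1], n + 1)
    return stats


def server_update(all_stats: List[Stats], centroids: Sequence[Point]) -> List[Point]:
    """Runs on the server: add up client statistics and recompute each centroid."""
    totals = [[0, 0, 0] for _ in centroids]
    for stats in all_stats:
        for j, (sx, sy, n) in stats.items():
            totals[j][0] += sx
            totals[j][1] += sy
            totals[j][2] += n
    # A cluster that received no points keeps its previous centroid.
    return [
        (sx / n, sy / n) if n > 0 else c
        for (sx, sy, n), c in zip(totals, centroids)
    ]


def federated_kmeans(clients: List[List[Point]], centroids: Sequence[Point],
                     rounds: int = 10) -> List[Point]:
    """Alternate client-side statistics and server-side centroid updates."""
    current = list(centroids)
    for _ in range(rounds):
        current = server_update([client_stats(d, current) for d in clients], current)
    return current


# Heterogeneous split: clients 1 and 2 each see points from only one cluster.
client_1 = [(0, 0), (1, 0), (0, 1)]
client_2 = [(10, 10), (11, 10), (10, 11), (11, 11)]
client_3 = [(1, 1), (9, 10)]
init = [(0.0, 0.0), (5.0, 5.0)]

federated = federated_kmeans([client_1, client_2, client_3], init)
centralized = federated_kmeans([client_1 + client_2 + client_3], init)
print(federated == centralized)  # True
print(federated)  # [(0.5, 0.5), (10.2, 10.4)]
```

The design choice that removes sensitivity to heterogeneity is that client messages are sufficient statistics that combine by simple addition. A privacy-focused variant would additionally protect these sums and counts, for example by aggregating them securely or perturbing them. The sketch leaves that out.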
Publishing Year
2026
Date Published
2026-02-09
Publisher
Institute of Science and Technology Austria
Acknowledgement
This research was funded in part by the Austrian Science Fund (FWF)
[10.55776/COE12]. Furthermore, the candidate acknowledges support from the Scientific
Service Units (SSU) of ISTA through resources provided by Scientific Computing (SciComp).
Acknowledged SSUs
Page
158
ISSN
IST-REx-ID
Cite this
Scott JA. Data heterogeneity and personalization in federated learning. 2026. doi:10.15479/AT-ISTA-21198
Scott, J. A. (2026). Data heterogeneity and personalization in federated learning. Institute of Science and Technology Austria. https://doi.org/10.15479/AT-ISTA-21198
Scott, Jonathan A. “Data Heterogeneity and Personalization in Federated Learning.” Institute of Science and Technology Austria, 2026. https://doi.org/10.15479/AT-ISTA-21198.
J. A. Scott, “Data heterogeneity and personalization in federated learning,” Institute of Science and Technology Austria, 2026.
Scott, Jonathan A. Data Heterogeneity and Personalization in Federated Learning. Institute of Science and Technology Austria, 2026, doi:10.15479/AT-ISTA-21198.
All files available under the following license(s):
Copyright Statement:
This Item is protected by copyright and/or related rights. [...]
Main File(s)
File Name
2026_Jonathan_Scott_Thesis.pdf
15.22 MB
Access Level
Open Access
Date Uploaded
2026-02-27
MD5 Checksum
6e3e08ba474bbee8511cc8a839ab2077
Source File
File Name
2026_Scott_Jonathan_Thesis_Source.zip
272.38 MB
Access Level
Closed Access
Date Uploaded
2026-02-17
MD5 Checksum
121c1d968bd86f3630aa7e81d5bbbcb0
Material in ISTA:
Part of this Dissertation
