{"OA_place":"repository","language":[{"iso":"eng"}],"author":[{"first_name":"Jonathan A","full_name":"Scott, Jonathan A","id":"e499926b-f6e0-11ea-865d-9c63db0031e8","last_name":"Scott"}],"file":[{"date_updated":"2026-02-17T11:46:22Z","file_id":"21298","relation":"source_file","file_name":"2026_Scott_Jonathan_Thesis_Source.zip","checksum":"121c1d968bd86f3630aa7e81d5bbbcb0","date_created":"2026-02-17T11:46:22Z","content_type":"application/zip","creator":"jscott","file_size":272379252,"access_level":"closed"},{"date_created":"2026-02-27T10:25:41Z","checksum":"6e3e08ba474bbee8511cc8a839ab2077","file_name":"2026_Jonathan_Scott_Thesis.pdf","relation":"main_file","file_id":"21366","date_updated":"2026-02-27T10:25:41Z","success":1,"access_level":"open_access","content_type":"application/pdf","creator":"jscott","file_size":15220298}],"publication_status":"published","type":"dissertation","year":"2026","corr_author":"1","user_id":"8b945eb4-e2f2-11eb-945a-df72226e66a9","publisher":"Institute of Science and Technology Austria","doi":"10.15479/AT-ISTA-21198","degree_awarded":"PhD","oa_version":"Published Version","date_published":"2026-02-09T00:00:00Z","related_material":{"record":[{"id":"20819","status":"public","relation":"part_of_dissertation"},{"relation":"part_of_dissertation","id":"17411","status":"public"},{"id":"18120","relation":"part_of_dissertation","status":"public"},{"id":"21207","relation":"part_of_dissertation","status":"public"}]},"day":"09","title":"Data heterogeneity and personalization in federated learning","article_processing_charge":"No","publication_identifier":{"issn":["2663-337X"]},"_id":"21198","department":[{"_id":"GradSch"},{"_id":"ChLa"}],"acknowledged_ssus":[{"_id":"ScienComp"}],"supervisor":[{"orcid":"0000-0001-8622-7887","full_name":"Lampert, Christoph","first_name":"Christoph","id":"40C20FD2-F248-11E8-B48F-1D18A9856A87","last_name":"Lampert"}],"date_created":"2026-02-09T14:59:53Z","oa":1,"ddc":["005"],"alternative_title":["ISTA 
Thesis"],"acknowledgement":"This research was funded in part by the Austrian Science Fund (FWF)\r\n[10.55776/COE12]. Furthermore, the candidate acknowledges the support from the Scientific\r\nService Units (SSU) of ISTA through resources provided by Scientific Computing (SciComp).","has_accepted_license":"1","month":"02","status":"public","abstract":[{"lang":"eng","text":"In recent years there has been a massive increase in the amount of data generated in a\r\ndecentralized manner. Ever more powerful edge devices, such as smartphones, have become\r\nubiquitous in most societies on earth. Through text typed, photos taken and apps used,\r\nthese devices, which we refer to as clients, generate enormous amounts of high-quality and\r\ncomplex data. Moreover, the nature of these devices means the data they generate is often\r\nsensitive and privacy concerns prevent it being gathered and stored in a central location. This\r\npresents a challenge to the modern machine learning paradigm that requires central access\r\nto large amounts of data. Federated learning (FL) has emerged as one of the answers to\r\nthis problem. Rather than bringing the data to the model, FL sends the model to the data.\r\nModel training takes place on device, with periodically synchronized updates, allowing data to\r\nremain locally stored. While this approach offers significant privacy advantages it comes with\r\nits own set of unique challenges. These include: data heterogeneity, the notion that different\r\ndevices generate data in distinct ways which can negatively impact training dynamics; systems\r\nheterogeneity, meaning that different devices may have differing hardware specifications; high\r\ncommunication costs, which are induced by the repeated transferring of models over the\r\nnetwork; and low device computational power, which limits the use of larger models on device.\r\nIn this thesis we present a range of methods for federated learning. 
We focus primarily on\r\nthe challenge of data heterogeneity, though the methods presented are designed to be well\r\nadapted to the other challenges of a federated setting, such as the constraints of limited\r\ncompute and communication overhead. We first present a method for explicitly modeling client\r\ndata heterogeneity. The approach formulates clients as samples from a certain probability\r\ndistribution and infers the parameters of this distribution from the available training clients.\r\nThis learned distribution then represents the heterogeneity present among the clients and can\r\nbe sampled from in order to create new simulated clients that are similar to the real clients we\r\nhave observed so far. Following this we present two methods for directly dealing with data\r\nheterogeneity through personalization. Highly heterogeneous client data distributions can mean\r\nthat learning a single global model becomes suboptimal, and some form of personalization of\r\nmodels to each individual client is required. Our approaches are based around hypernetworks,\r\nwhich we use to generate personalized model parameters without the need for additional\r\ntraining or finetuning. In the first approach we focus on generating full parameterizations of\r\nclient models using learned embeddings of client data and labels, with a hypernetwork located\r\non the central server. In the second approach we address the more challenging scenario where\r\nwe want to generate a personalized model for a client without any label information. The\r\nhypernetwork is trained to generate a low-dimensional representation of a client’s personalized\r\nmodel parameters, allowing it to be transferred to and run on the client devices. In our final\r\npresented method, we change our focus and rather than aim to directly address the challenge\r\nof data heterogeneity, we instead ensure we are unaffected by it. 
This is done in the context\r\nof k-means clustering and we present a method for federated clustering with a focus on added\r\nprivacy guarantees."}],"page":"158","date_updated":"2026-03-03T08:20:57Z","citation":{"chicago":"Scott, Jonathan A. “Data Heterogeneity and Personalization in Federated Learning.” Institute of Science and Technology Austria, 2026. https://doi.org/10.15479/AT-ISTA-21198.","ista":"Scott JA. 2026. Data heterogeneity and personalization in federated learning. Institute of Science and Technology Austria.","short":"J.A. Scott, Data Heterogeneity and Personalization in Federated Learning, Institute of Science and Technology Austria, 2026.","ieee":"J. A. Scott, “Data heterogeneity and personalization in federated learning,” Institute of Science and Technology Austria, 2026.","ama":"Scott JA. Data heterogeneity and personalization in federated learning. 2026. doi:10.15479/AT-ISTA-21198","apa":"Scott, J. A. (2026). Data heterogeneity and personalization in federated learning. Institute of Science and Technology Austria. https://doi.org/10.15479/AT-ISTA-21198","mla":"Scott, Jonathan A. Data Heterogeneity and Personalization in Federated Learning. Institute of Science and Technology Austria, 2026, doi:10.15479/AT-ISTA-21198."},"file_date_updated":"2026-02-27T10:25:41Z"}