Improved modelling of federated datasets using mixtures-of-Dirichlet-multinomials
Scott JA, Cahill Á. 2024. Improved modelling of federated datasets using mixtures-of-Dirichlet-multinomials. Proceedings of the 41st International Conference on Machine Learning. ICML: International Conference on Machine Learning, PMLR, vol. 235, 44012–44037.
https://doi.org/10.48550/arXiv.2406.02416
[Preprint]
Conference Paper | Published | English | Scopus indexed
Author
Scott, Jonathan A. (ISTA);
Cahill, Áine
Corresponding author has ISTA affiliation
Department
Series Title
PMLR
Abstract
In practice, training using federated learning can be orders of magnitude slower than standard centralized training. This severely limits the amount of experimentation and tuning that can be done, making it challenging to obtain good performance on a given task. Server-side proxy data can be used to run training simulations, for instance for hyperparameter tuning. This can greatly speed up the training pipeline by reducing the number of tuning runs that must be performed on the true clients. However, it is challenging to ensure that these simulations accurately reflect the dynamics of real federated training. In particular, the proxy data used for simulations often comes as a single centralized dataset without a partition into distinct clients, and partitioning this data naively can lead to simulations that poorly reflect real federated training. In this paper we address the challenge of partitioning centralized data in a way that reflects the statistical heterogeneity of the true federated clients. We propose a fully federated, theoretically justified algorithm that efficiently learns the distribution of the true clients, and we observe improved server-side simulations when using the inferred distribution to create simulated clients from the centralized data.
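The abstract describes partitioning a centralized proxy dataset by modelling per-client label distributions with a mixture of Dirichlet-multinomials. Below is a minimal, illustrative Python sketch of the generative side of that idea: scoring a client's label counts under one mixture component and sampling a simulated client from a fitted mixture. All names and signatures here are hypothetical, and the mixture parameters are assumed to be already fitted; the paper's actual contribution is a fully federated procedure for learning those parameters, which is not shown.

import numpy as np
from scipy.special import gammaln

def dirichlet_multinomial_logpmf(counts, alpha):
    # Log-probability of a client's label-count vector under a single
    # Dirichlet-multinomial component with parameter vector alpha.
    counts = np.asarray(counts, dtype=float)
    alpha = np.asarray(alpha, dtype=float)
    n, a = counts.sum(), alpha.sum()
    return (
        gammaln(n + 1) - gammaln(counts + 1).sum()        # multinomial coefficient
        + gammaln(a) - gammaln(n + a)                      # normalising terms
        + (gammaln(counts + alpha) - gammaln(alpha)).sum()
    )

def sample_simulated_client(rng, pi, alpha, n_examples, pool_by_label):
    # Draw one simulated client from a K-component mixture (hypothetical helper).
    #   pi            : (K,) mixture weights, summing to 1
    #   alpha         : (K, C) Dirichlet parameters over C labels
    #   n_examples    : number of examples for this simulated client
    #   pool_by_label : dict mapping label -> indices into the centralized dataset
    k = rng.choice(len(pi), p=pi)               # pick a mixture component
    theta = rng.dirichlet(alpha[k])             # client-specific label distribution
    counts = rng.multinomial(n_examples, theta) # label counts for this client
    chosen = []
    for label, c in enumerate(counts):
        c = min(c, len(pool_by_label[label]))   # don't over-draw a scarce label
        chosen.extend(rng.choice(pool_by_label[label], size=c, replace=False))
    return chosen

Repeatedly calling sample_simulated_client over the centralized proxy data would yield simulated clients whose label heterogeneity follows the learned mixture, which is the role the inferred distribution plays in the paper's server-side simulations.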
Publishing Year
2024
Date Published
2024-09-01
Proceedings Title
Proceedings of the 41st International Conference on Machine Learning
Publisher
ML Research Press
Acknowledgement
We would like to thank: Mona Chitnis and everyone in the Private Federated Learning team at Apple for their help and support throughout the entire project; Audra McMillan, Martin Pelikan, Anosh Raj and Barry Theobald for feedback on the initial versions of the paper; and Christoph Lampert for valuable feedback on the paper structure and suggestions for additional experiments.
Volume
235
Page
44012-44037
Conference
ICML: International Conference on Machine Learning
Conference Location
Vienna, Austria
Conference Date
2024-07-21 – 2024-07-27
eISSN
IST-REx-ID
Cite this
Scott JA, Cahill Á. Improved modelling of federated datasets using mixtures-of-Dirichlet-multinomials. In: Proceedings of the 41st International Conference on Machine Learning. Vol 235. ML Research Press; 2024:44012-44037.
Scott, J. A., & Cahill, Á. (2024). Improved modelling of federated datasets using mixtures-of-Dirichlet-multinomials. In Proceedings of the 41st International Conference on Machine Learning (Vol. 235, pp. 44012–44037). Vienna, Austria: ML Research Press.
Scott, Jonathan A., and Áine Cahill. “Improved Modelling of Federated Datasets Using Mixtures-of-Dirichlet-Multinomials.” In Proceedings of the 41st International Conference on Machine Learning, 235:44012–37. ML Research Press, 2024.
J. A. Scott and Á. Cahill, “Improved modelling of federated datasets using mixtures-of-Dirichlet-multinomials,” in Proceedings of the 41st International Conference on Machine Learning, Vienna, Austria, 2024, vol. 235, pp. 44012–44037.
Scott JA, Cahill Á. 2024. Improved modelling of federated datasets using mixtures-of-Dirichlet-multinomials. Proceedings of the 41st International Conference on Machine Learning. ICML: International Conference on Machine Learning, PMLR, vol. 235, 44012–44037.
Scott, Jonathan A., and Áine Cahill. “Improved Modelling of Federated Datasets Using Mixtures-of-Dirichlet-Multinomials.” Proceedings of the 41st International Conference on Machine Learning, vol. 235, ML Research Press, 2024, pp. 44012–37.
Link(s) to Main File(s)
Access Level
Open Access
Sources
arXiv 2406.02416