High-dimensional analysis of knowledge distillation: Weak-to-Strong generalization and scaling laws
Emrullah Ildiz M, Gozeten HA, Taga EO, Mondelli M, Oymak S. 2025. High-dimensional analysis of knowledge distillation: Weak-to-Strong generalization and scaling laws. 13th International Conference on Learning Representations. ICLR: International Conference on Learning Representations, 2967–3006.
Conference Paper | Published | English | Scopus indexed
Author
Emrullah Ildiz, M.;
Gozeten, Halil Alperen;
Taga, Ege Onur;
Mondelli, Marco (ISTA);
Oymak, Samet

Abstract
A growing number of machine learning scenarios rely on knowledge distillation, where one uses the output of a surrogate model as labels to supervise the training of a target model. In this work, we provide a sharp characterization of this process for ridgeless, high-dimensional regression, under two settings: (i) model shift, where the surrogate model is arbitrary, and (ii) distribution shift, where the surrogate model is the solution of empirical risk minimization with out-of-distribution data. In both cases, we characterize the precise risk of the target model through non-asymptotic bounds in terms of sample size and data distribution under mild conditions. As a consequence, we identify the form of the optimal surrogate model, which reveals the benefits and limitations of discarding weak features in a data-dependent fashion. In the context of weak-to-strong (W2S) generalization, this has the interpretation that (i) W2S training, with the surrogate as the weak model, can provably outperform training with strong labels under the same data budget, but (ii) it is unable to improve the data scaling law. We validate our results with numerical experiments on both ridgeless regression and neural network architectures.
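The abstract's setup can be made concrete with a minimal sketch (not code from the paper): a minimum-norm ("ridgeless") least-squares target model is fit either to the true labels or to labels produced by a surrogate model, and the resulting excess risks are compared. The isotropic Gaussian data model, the additive perturbation defining the surrogate, and all names below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 200, 400                            # high-dimensional regime: d > n
beta = rng.normal(size=d) / np.sqrt(d)     # ground-truth parameter

X = rng.normal(size=(n, d))                # isotropic Gaussian inputs
y_strong = X @ beta                        # "strong" (ground-truth) labels

# Surrogate (weak) model under "model shift": here, an arbitrary
# perturbation of the ground-truth parameter (an assumption for illustration).
beta_weak = beta + 0.3 * rng.normal(size=d) / np.sqrt(d)
y_weak = X @ beta_weak                     # surrogate labels used for distillation

def ridgeless_fit(X, y):
    """Minimum-norm least-squares interpolator (ridge regression as lambda -> 0)."""
    return np.linalg.pinv(X) @ y

def excess_risk(beta_hat):
    # For isotropic Gaussian test inputs, the excess risk equals ||beta_hat - beta||^2.
    return float(np.sum((beta_hat - beta) ** 2))

print("risk, trained on strong labels:   ", excess_risk(ridgeless_fit(X, y_strong)))
print("risk, trained on surrogate labels:", excess_risk(ridgeless_fit(X, y_weak)))
```

Sweeping n in this sketch traces out the data scaling law the paper analyzes; per the abstract, a well-chosen surrogate can lower the risk at a fixed data budget, but cannot improve how the risk scales with n.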
Publishing Year
2025
Date Published
2025-04-01
Proceedings Title
13th International Conference on Learning Representations
Publisher
ICLR
Acknowledgement
M.E.I., H.A.G., E.O.T., and S.O. are supported by the NSF grants CCF-2046816 and CCF-2403075, the Office of Naval Research grant N000142412289, an OpenAI Agentic AI Systems grant, and gifts from Open Philanthropy and Google Research. M.M. is funded by the European Union (ERC, INF2, project number 101161364). Views and opinions expressed are, however, those of the author(s) only and do not necessarily reflect those of the European Union or the European Research Council Executive Agency. Neither the European Union nor the granting authority can be held responsible for them.
Page
2967–3006
Conference
ICLR: International Conference on Learning Representations
Conference Location
Singapore, Singapore
Conference Date
2025-04-24 – 2025-04-28
Cite this
Emrullah Ildiz M, Gozeten HA, Taga EO, Mondelli M, Oymak S. High-dimensional analysis of knowledge distillation: Weak-to-Strong generalization and scaling laws. In: 13th International Conference on Learning Representations. ICLR; 2025:2967-3006.
Emrullah Ildiz, M., Gozeten, H. A., Taga, E. O., Mondelli, M., & Oymak, S. (2025). High-dimensional analysis of knowledge distillation: Weak-to-Strong generalization and scaling laws. In 13th International Conference on Learning Representations (pp. 2967–3006). Singapore, Singapore: ICLR.
Emrullah Ildiz, M., Halil Alperen Gozeten, Ege Onur Taga, Marco Mondelli, and Samet Oymak. “High-Dimensional Analysis of Knowledge Distillation: Weak-to-Strong Generalization and Scaling Laws.” In 13th International Conference on Learning Representations, 2967–3006. ICLR, 2025.
M. Emrullah Ildiz, H. A. Gozeten, E. O. Taga, M. Mondelli, and S. Oymak, “High-dimensional analysis of knowledge distillation: Weak-to-Strong generalization and scaling laws,” in 13th International Conference on Learning Representations, Singapore, Singapore, 2025, pp. 2967–3006.
Emrullah Ildiz, M., et al. “High-Dimensional Analysis of Knowledge Distillation: Weak-to-Strong Generalization and Scaling Laws.” 13th International Conference on Learning Representations, ICLR, 2025, pp. 2967–3006.
All files available under the following license(s):
Creative Commons Attribution 4.0 International Public License (CC-BY 4.0)
Main File(s)
File Name
2025_ICLR_Ildiz.pdf
528.17 KB
Access Level
Open Access
Date Uploaded
2025-08-04
MD5 Checksum
5a38b093ebb4ee4eb662ea142621a5ca
Sources
arXiv:2410.18837