Test-time training provably improves transformers as in-context learners
Conference Paper | Published | English
Author
Gozeten, Halil Alperen;
Ildiz, Muhammed Emrullah;
Zhang, Xuechen;
Soltanolkotabi, Mahdi;
Mondelli, Marco (ISTA);
Oymak, Samet
Series Title
PMLR
Abstract
Test-time training (TTT) methods explicitly update the weights of a model to adapt to the specific test instance, and they have found success in a variety of settings, including most recently language modeling and reasoning. To demystify this success, we investigate a gradient-based TTT algorithm for in-context learning, where we train a transformer model on the in-context demonstrations provided in the test prompt. Specifically, we provide a comprehensive theoretical characterization of linear transformers when the update rule is a single gradient step. Our theory (i) delineates the role of alignment between pretraining distribution and target task, (ii) demystifies how TTT can alleviate distribution shift, and (iii) quantifies the sample complexity of TTT, including how it can significantly reduce the eventual sample size required for in-context learning. As our empirical contribution, we study the benefits of TTT for TabPFN, a tabular foundation model. In line with our theory, we demonstrate that TTT significantly reduces the required sample size for tabular classification (3 to 5 times fewer), unlocking substantial inference efficiency at negligible training cost.
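The update rule analyzed in the paper is a single gradient step on the demonstrations contained in the test prompt. The PyTorch sketch below illustrates that generic procedure; the function name `ttt_predict`, the SGD learning rate, and the loss function are illustrative assumptions, not the authors' implementation (their theory concerns linear transformers).

```python
# A minimal sketch of single-gradient-step test-time training (TTT) on
# in-context demonstrations. Illustrative only; not the authors' code.
import copy
import torch

def ttt_predict(model, demos_x, demos_y, query_x, loss_fn, lr=1e-2):
    """Adapt a copy of `model` with ONE gradient step on the test prompt's
    demonstrations, then predict the query with the adapted weights."""
    adapted = copy.deepcopy(model)      # leave the pretrained weights intact
    adapted.train()
    preds = adapted(demos_x)            # forward pass on the demonstrations
    loss = loss_fn(preds, demos_y)
    loss.backward()                     # gradient of the demonstration loss
    with torch.no_grad():
        for p in adapted.parameters():  # single SGD update: w <- w - lr * grad
            if p.grad is not None:
                p -= lr * p.grad
    adapted.eval()
    with torch.no_grad():
        return adapted(query_x)         # prediction for the test query
```

Because the base model is deep-copied, the adaptation is per-instance and discarded afterward, matching the TTT setting in which each test prompt induces its own one-step update.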
Publishing Year
2025
Date Published
2025-11-30
Proceedings Title
Proceedings of the 42nd International Conference on Machine Learning
Publisher
ML Research Press
Acknowledgement
H.A.G., M.E.I., X.Z., and S.O. were supported in part by the NSF grants CCF-2046816, CCF-2403075, CCF-2008020, and the Office of Naval Research grant N000142412289. M.M. is funded by the European Union (ERC, INF2, project number 101161364). Views and opinions expressed are, however, those of the author(s) only and do not necessarily reflect those of the European Union or the European Research Council Executive Agency. Neither the European Union nor the granting authority can be held responsible for them. M.S. is supported by the Packard Fellowship in Science and Engineering, a Sloan Research Fellowship in Mathematics, an NSF-CAREER award #1846369, the DARPA FastNICS program, NSF-CIF awards #1813877 and #2008443, and NIH DP2LM014564-01. The authors also acknowledge further support from Open Philanthropy, OpenAI, Amazon Research, Google Research, and Microsoft Research.
Volume
267
Page
20266–20295
Conference
ICML: International Conference on Machine Learning
Conference Location
Vancouver, Canada
Conference Date
2025-07-13 – 2025-07-19
Cite this
Gozeten HA, Ildiz ME, Zhang X, Soltanolkotabi M, Mondelli M, Oymak S. Test-time training provably improves transformers as in-context learners. In: Proceedings of the 42nd International Conference on Machine Learning. Vol 267. ML Research Press; 2025:20266-20295.
Gozeten, H. A., Ildiz, M. E., Zhang, X., Soltanolkotabi, M., Mondelli, M., & Oymak, S. (2025). Test-time training provably improves transformers as in-context learners. In Proceedings of the 42nd International Conference on Machine Learning (Vol. 267, pp. 20266–20295). Vancouver, Canada: ML Research Press.
Gozeten, Halil Alperen, Muhammed Emrullah Ildiz, Xuechen Zhang, Mahdi Soltanolkotabi, Marco Mondelli, and Samet Oymak. “Test-Time Training Provably Improves Transformers as In-Context Learners.” In Proceedings of the 42nd International Conference on Machine Learning, 267:20266–95. ML Research Press, 2025.
H. A. Gozeten, M. E. Ildiz, X. Zhang, M. Soltanolkotabi, M. Mondelli, and S. Oymak, “Test-time training provably improves transformers as in-context learners,” in Proceedings of the 42nd International Conference on Machine Learning, Vancouver, Canada, 2025, vol. 267, pp. 20266–20295.
Gozeten HA, Ildiz ME, Zhang X, Soltanolkotabi M, Mondelli M, Oymak S. 2025. Test-time training provably improves transformers as in-context learners. Proceedings of the 42nd International Conference on Machine Learning. ICML: International Conference on Machine Learning, PMLR, vol. 267, 20266–20295.
Gozeten, Halil Alperen, et al. “Test-Time Training Provably Improves Transformers as In-Context Learners.” Proceedings of the 42nd International Conference on Machine Learning, vol. 267, ML Research Press, 2025, pp. 20266–95.
All files available under the following license(s):
Creative Commons Attribution 4.0 International Public License (CC-BY 4.0):
Main File(s)
File Name
2025_ICML_Gozeten.pdf
471.18 KB
Access Level
Open Access
Date Uploaded
2026-02-19
MD5 Checksum
f774f8619a0d72f3975d9cb23942a1e9
Sources
PMID: 41321376
