{"title":"Test-time training provably improves transformers as in-context learners","month":"11","language":[{"iso":"eng"}],"oa":1,"date_published":"2025-11-30T00:00:00Z","file_date_updated":"2026-02-19T08:15:48Z","status":"public","acknowledgement":"H.A.G., M.E.I., X.Z., and S.O. were supported in part by the NSF grants CCF-2046816, CCF-2403075, CCF-2008020, and the Office of Naval Research grant N000142412289.\r\nM.M. is funded by the European Union (ERC, INF2, project number 101161364). Views and opinions expressed are, however, those of the author(s) only and do not necessarily reflect those of the European Union or the European Research Council Executive Agency. Neither the European Union nor the granting authority can be held responsible for them. M.S. is supported by the Packard Fellowship in Science and Engineering, a Sloan Research Fellowship in Mathematics, an NSF CAREER award (#1846369), the DARPA FastNICS program, NSF-CIF awards #1813877 and #2008443, and NIH DP2LM014564-01. The authors also acknowledge further support from Open Philanthropy, OpenAI, Amazon Research, Google Research, and Microsoft Research.","quality_controlled":"1","OA_type":"gold","abstract":[{"lang":"eng","text":"Test-time training (TTT) methods explicitly update the weights of a model to adapt to the specific test instance, and they have found success in a variety of settings, including most recently language modeling and reasoning. To demystify this success, we investigate a gradient-based TTT algorithm for in-context learning, where we train a transformer model on the in-context demonstrations provided in the test prompt. Specifically, we provide a comprehensive theoretical characterization of linear transformers when the update rule is a single gradient step. Our theory (i) delineates the role of alignment between pretraining distribution and target task, (ii) demystifies how TTT can alleviate distribution shift, and (iii) quantifies the sample complexity of TTT, including how it can significantly reduce the eventual sample size required for in-context learning. As our empirical contribution, we study the benefits of TTT for TabPFN, a tabular foundation model. In line with our theory, we demonstrate that TTT significantly reduces the required sample size for tabular classification (3 to 5 times fewer), unlocking substantial inference efficiency with a negligible training cost."}],"OA_place":"publisher","citation":{"mla":"Gozeten, Halil Alperen, et al. “Test-Time Training Provably Improves Transformers as in-Context Learners.” <i>Proceedings of the 42nd International Conference on Machine Learning</i>, vol. 267, ML Research Press, 2025, pp. 20266–95.","ama":"Gozeten HA, Ildiz ME, Zhang X, Soltanolkotabi M, Mondelli M, Oymak S. Test-time training provably improves transformers as in-context learners. In: <i>Proceedings of the 42nd International Conference on Machine Learning</i>. Vol 267. ML Research Press; 2025:20266-20295.","chicago":"Gozeten, Halil Alperen, Muhammed Emrullah Ildiz, Xuechen Zhang, Mahdi Soltanolkotabi, Marco Mondelli, and Samet Oymak. “Test-Time Training Provably Improves Transformers as in-Context Learners.” In <i>Proceedings of the 42nd International Conference on Machine Learning</i>, 267:20266–95. ML Research Press, 2025.","short":"H.A. Gozeten, M.E. Ildiz, X. Zhang, M. Soltanolkotabi, M. Mondelli, S. Oymak, in:, Proceedings of the 42nd International Conference on Machine Learning, ML Research Press, 2025, pp. 20266–20295.","apa":"Gozeten, H. A., Ildiz, M. E., Zhang, X., Soltanolkotabi, M., Mondelli, M., &#38; Oymak, S. (2025). Test-time training provably improves transformers as in-context learners. In <i>Proceedings of the 42nd International Conference on Machine Learning</i> (Vol. 267, pp. 20266–20295). Vancouver, Canada: ML Research Press.","ista":"Gozeten HA, Ildiz ME, Zhang X, Soltanolkotabi M, Mondelli M, Oymak S. 2025. Test-time training provably improves transformers as in-context learners. Proceedings of the 42nd International Conference on Machine Learning. ICML: International Conference on Machine Learning, PMLR, vol. 267, 20266–20295.","ieee":"H. A. Gozeten, M. E. Ildiz, X. Zhang, M. Soltanolkotabi, M. Mondelli, and S. Oymak, “Test-time training provably improves transformers as in-context learners,” in <i>Proceedings of the 42nd International Conference on Machine Learning</i>, Vancouver, Canada, 2025, vol. 267, pp. 20266–20295."},"file":[{"date_created":"2026-02-19T08:15:48Z","content_type":"application/pdf","date_updated":"2026-02-19T08:15:48Z","file_id":"21336","file_name":"2025_ICML_Gozeten.pdf","access_level":"open_access","creator":"dernst","success":1,"checksum":"f774f8619a0d72f3975d9cb23942a1e9","relation":"main_file","file_size":471176}],"page":"20266-20295","project":[{"grant_number":"101161364","name":"Inference in High Dimensions: Light-speed Algorithms and Information Limits","_id":"911e6d1f-16d5-11f0-9cad-c5c68c6a1cdf"}],"tmp":{"name":"Creative Commons Attribution 4.0 International Public License (CC-BY 4.0)","image":"/images/cc_by.png","short":"CC BY (4.0)","legal_code_url":"https://creativecommons.org/licenses/by/4.0/legalcode"},"ddc":["000"],"_id":"21325","has_accepted_license":"1","year":"2025","oa_version":"Published Version","publisher":"ML Research Press","external_id":{"pmid":["41321376"]},"alternative_title":["PMLR"],"pmid":1,"date_updated":"2026-02-19T08:18:24Z","department":[{"_id":"MaMo"}],"volume":267,"conference":{"name":"ICML: International Conference on Machine Learning","location":"Vancouver, Canada","start_date":"2025-07-13","end_date":"2025-07-19"},"date_created":"2026-02-18T12:00:44Z","publication_identifier":{"eissn":["2640-3498"]},"publication_status":"published","day":"30","author":[{"full_name":"Gozeten, Halil Alperen","last_name":"Gozeten","first_name":"Halil Alperen"},{"full_name":"Ildiz, Muhammed Emrullah","last_name":"Ildiz","first_name":"Muhammed Emrullah"},{"full_name":"Zhang, Xuechen","last_name":"Zhang","first_name":"Xuechen"},{"last_name":"Soltanolkotabi","first_name":"Mahdi","full_name":"Soltanolkotabi, Mahdi"},{"first_name":"Marco","last_name":"Mondelli","id":"27EB676C-8706-11E9-9510-7717E6697425","orcid":"0000-0002-3242-7020","full_name":"Mondelli, Marco"},{"full_name":"Oymak, Samet","first_name":"Samet","last_name":"Oymak"}],"user_id":"2DF688A6-F248-11E8-B48F-1D18A9856A87","intvolume":"       267","article_processing_charge":"No","type":"conference","publication":"Proceedings of the 42nd International Conference on Machine Learning"}