{"language":[{"iso":"eng"}],"month":"07","title":"Spurious correlations in high dimensional regression: The roles of regularization, simplicity bias and over-parameterization","oa":1,"quality_controlled":"1","acknowledgement":"Marco Mondelli is funded by the European Union (ERC, INF2, project number 101161364). Views and opinions expressed are however those of the author(s) only and do not necessarily reflect those of the European Union or the European Research Council Executive Agency. Neither the European Union nor the granting authority can be held responsible for them. Simone Bombari is supported by a Google PhD fellowship. The authors would like to thank GuanWen Qiu for helpful discussions.","date_published":"2025-07-30T00:00:00Z","status":"public","file_date_updated":"2026-02-19T08:04:38Z","OA_place":"publisher","abstract":[{"text":"Learning models have been shown to rely on spurious correlations between non-predictive features and the associated labels in the training data, with negative implications on robustness, bias and fairness. In this work, we provide a statistical characterization of this phenomenon for high-dimensional regression, when the data contains a predictive core feature x and a spurious feature y. Specifically, we quantify the amount of spurious correlations C learned via linear regression, in terms of the data covariance and the strength λ of the ridge regularization. As a consequence, we first capture the simplicity of y through the spectrum of its covariance, and its correlation with x through the Schur complement of the full data covariance. Next, we prove a trade-off between C and the in-distribution test loss L, by showing that the value of λ that minimizes L lies in an interval where C is increasing. Finally, we investigate the effects of over-parameterization via the random features model, by showing its equivalence to regularized linear regression. Our theoretical results are supported by numerical experiments on Gaussian, Color-MNIST, and CIFAR-10 datasets.","lang":"eng"}],"OA_type":"gold","file":[{"relation":"main_file","file_size":887526,"checksum":"d4ba4f7717b362ca38878f45e57bd643","success":1,"creator":"dernst","access_level":"open_access","file_name":"2025_ICML_Bombari.pdf","file_id":"21335","date_updated":"2026-02-19T08:04:38Z","date_created":"2026-02-19T08:04:38Z","content_type":"application/pdf"}],"citation":{"apa":"Bombari, S., & Mondelli, M. (2025). Spurious correlations in high dimensional regression: The roles of regularization, simplicity bias and over-parameterization. In Proceedings of the 42nd International Conference on Machine Learning (Vol. 267, pp. 4839–4873). Vancouver, Canada: ML Research Press.","ieee":"S. Bombari and M. Mondelli, “Spurious correlations in high dimensional regression: The roles of regularization, simplicity bias and over-parameterization,” in Proceedings of the 42nd International Conference on Machine Learning, Vancouver, Canada, 2025, vol. 267, pp. 4839–4873.","ista":"Bombari S, Mondelli M. 2025. Spurious correlations in high dimensional regression: The roles of regularization, simplicity bias and over-parameterization. Proceedings of the 42nd International Conference on Machine Learning. ICML: International Conference on Machine Learning, PMLR, vol. 267, 4839–4873.","ama":"Bombari S, Mondelli M. Spurious correlations in high dimensional regression: The roles of regularization, simplicity bias and over-parameterization. In: Proceedings of the 42nd International Conference on Machine Learning. Vol 267. ML Research Press; 2025:4839-4873.","chicago":"Bombari, Simone, and Marco Mondelli. “Spurious Correlations in High Dimensional Regression: The Roles of Regularization, Simplicity Bias and over-Parameterization.” In Proceedings of the 42nd International Conference on Machine Learning, 267:4839–73. ML Research Press, 2025.","short":"S. Bombari, M. Mondelli, in: Proceedings of the 42nd International Conference on Machine Learning, ML Research Press, 2025, pp. 4839–4873.","mla":"Bombari, Simone, and Marco Mondelli. “Spurious Correlations in High Dimensional Regression: The Roles of Regularization, Simplicity Bias and over-Parameterization.” Proceedings of the 42nd International Conference on Machine Learning, vol. 267, ML Research Press, 2025, pp. 4839–73."},"tmp":{"name":"Creative Commons Attribution 4.0 International Public License (CC-BY 4.0)","image":"/images/cc_by.png","short":"CC BY (4.0)","legal_code_url":"https://creativecommons.org/licenses/by/4.0/legalcode"},"project":[{"_id":"911e6d1f-16d5-11f0-9cad-c5c68c6a1cdf","name":"Inference in High Dimensions: Light-speed Algorithms and Information Limits","grant_number":"101161364"},{"_id":"92099302-16d5-11f0-9cad-f9a785f54fbd","name":"Trustworthy Deep Learning Theory: Private Over-Parameterized Models and Robust LLMs"}],"page":"4839-4873","ddc":["000"],"arxiv":1,"_id":"21324","year":"2025","has_accepted_license":"1","corr_author":"1","publisher":"ML Research Press","oa_version":"Published Version","alternative_title":["PMLR"],"external_id":{"arxiv":["2502.01347"]},"date_created":"2026-02-18T11:58:00Z","conference":{"start_date":"2025-07-13","location":"Vancouver, Canada","end_date":"2025-07-19","name":"ICML: International Conference on Machine Learning"},"department":[{"_id":"MaMo"}],"volume":267,"date_updated":"2026-02-19T08:08:55Z","publication_status":"published","publication_identifier":{"eissn":["2640-3498"]},"intvolume":"267","user_id":"2DF688A6-F248-11E8-B48F-1D18A9856A87","author":[{"full_name":"Bombari, Simone","first_name":"Simone","last_name":"Bombari","id":"ca726dda-de17-11ea-bc14-f9da834f63aa"},{"full_name":"Mondelli, Marco","id":"27EB676C-8706-11E9-9510-7717E6697425","orcid":"0000-0002-3242-7020","first_name":"Marco","last_name":"Mondelli"}],"day":"30","publication":"Proceedings of the 42nd International Conference on Machine Learning","type":"conference","article_processing_charge":"No"}