[{"publisher":"Elsevier","oa":1,"_id":"21488","oa_version":"Published Version","project":[{"_id":"059876FA-7A3F-11EA-A408-12923DDC885E","name":"Prix Lopez-Loretta 2019 - Marco Mondelli"},{"name":"Inference in High Dimensions: Light-speed Algorithms and Information Limits","_id":"911e6d1f-16d5-11f0-9cad-c5c68c6a1cdf","grant_number":"101161364"},{"name":"Improving estimation and prediction of common complex disease risk","grant_number":"PCEGP3_181181","_id":"9B8D11D6-BA93-11EA-9121-9846C619BF3A"}],"department":[{"_id":"MaMo"},{"_id":"MaRo"}],"citation":{"ieee":"A. Depope, J. Bajzik, M. Mondelli, and M. R. Robinson, “Joint modeling of whole-genome sequencing data for human height via approximate message passing,” <i>Cell Genomics</i>. Elsevier, 2026.","mla":"Depope, Al, et al. “Joint Modeling of Whole-Genome Sequencing Data for Human Height via Approximate Message Passing.” <i>Cell Genomics</i>, 101162, Elsevier, 2026, doi:<a href=\"https://doi.org/10.1016/j.xgen.2026.101162\">10.1016/j.xgen.2026.101162</a>.","ista":"Depope A, Bajzik J, Mondelli M, Robinson MR. 2026. Joint modeling of whole-genome sequencing data for human height via approximate message passing. Cell Genomics., 101162.","ama":"Depope A, Bajzik J, Mondelli M, Robinson MR. Joint modeling of whole-genome sequencing data for human height via approximate message passing. <i>Cell Genomics</i>. 2026. doi:<a href=\"https://doi.org/10.1016/j.xgen.2026.101162\">10.1016/j.xgen.2026.101162</a>","apa":"Depope, A., Bajzik, J., Mondelli, M., &#38; Robinson, M. R. (2026). Joint modeling of whole-genome sequencing data for human height via approximate message passing. <i>Cell Genomics</i>. Elsevier. <a href=\"https://doi.org/10.1016/j.xgen.2026.101162\">https://doi.org/10.1016/j.xgen.2026.101162</a>","short":"A. Depope, J. Bajzik, M. Mondelli, M.R. Robinson, Cell Genomics (2026).","chicago":"Depope, Al, Jakub Bajzik, Marco Mondelli, and Matthew Richard Robinson. 
“Joint Modeling of Whole-Genome Sequencing Data for Human Height via Approximate Message Passing.” <i>Cell Genomics</i>. Elsevier, 2026. <a href=\"https://doi.org/10.1016/j.xgen.2026.101162\">https://doi.org/10.1016/j.xgen.2026.101162</a>."},"article_number":"101162","related_material":{"link":[{"description":"News on ISTA website","relation":"press_release","url":"https://ista.ac.at/en/news/big-data-and-human-height/"}]},"OA_place":"publisher","article_processing_charge":"Yes","quality_controlled":"1","publication_status":"epub_ahead","author":[{"last_name":"Depope","full_name":"Depope, Al","id":"0b77531d-dbcd-11ea-9d1d-a8eee0bf3830","first_name":"Al"},{"full_name":"Bajzik, Jakub","first_name":"Jakub","id":"b995e25b-8c4b-11ed-a6d8-f71b7bcd6122","last_name":"Bajzik"},{"first_name":"Marco","id":"27EB676C-8706-11E9-9510-7717E6697425","full_name":"Mondelli, Marco","orcid":"0000-0002-3242-7020","last_name":"Mondelli"},{"last_name":"Robinson","first_name":"Matthew Richard","id":"E5D42276-F5DA-11E9-8E24-6303E6697425","full_name":"Robinson, Matthew Richard","orcid":"0000-0001-8982-8813"}],"type":"journal_article","DOAJ_listed":"1","doi":"10.1016/j.xgen.2026.101162","acknowledgement":"We thank Malgorzata Borczyk for creating the gene burden scores. We thank Robin Beaumont, Amedeo Roberto Esposito, Gareth Hawkes, Philip Schniter, Matthew Stephens, Pragya Sur, Peter Visscher, Michael Weedon, and Harry Wright for providing valuable suggestions and comments on earlier versions of the work. This project was funded by a Lopez-Loreta Prize to M.M., an SNSF Eccellenza Grant to M.R.R. (PCEGP3-181181), an ERC Starting Grant to M.M. (INF2, project number 101161364), and core funding from ISTA. High-performance computing was supported by the Scientific Service Units (SSU) of ISTA through resources provided by Scientific Computing (SciComp). We would like to acknowledge the participants and investigators of the UK Biobank study. 
We gratefully acknowledge the All of Us participants for their contributions, without whom this research would not have been possible. We also thank the National Institutes of Health All of Us Research Program for making available the participant data (and/or samples and/or cohort) examined in this study.","has_accepted_license":"1","status":"public","language":[{"iso":"eng"}],"corr_author":"1","main_file_link":[{"open_access":"1","url":"https://doi.org/10.1016/j.xgen.2026.101162"}],"OA_type":"gold","publication_identifier":{"eissn":["2666-979X"]},"license":"https://creativecommons.org/licenses/by-nc-nd/4.0/","abstract":[{"lang":"eng","text":"Human height is a model for the genetic analysis of complex traits, and recent studies suggest the presence of thousands of common genetic variant associations and hundreds of low-frequency/rare variants. Here, we develop a new algorithmic paradigm based on approximate message passing (genomic vector approximate message passing [gVAMP]) for identifying DNA sequence variants associated with complex traits and common diseases in large-scale whole-genome sequencing (WGS) data. We show that gVAMP accurately localizes associations to variants with the correct frequency and position in the DNA, outperforming existing fine-mapping methods in selecting the appropriate genetic variants within WGS data. We then apply gVAMP to jointly model the relationship of tens of millions of WGS variants with human height in hundreds of thousands of UK Biobank individuals. We identify 59 rare variants and gene burden scores alongside many hundreds of DNA regions containing common variant associations and show that understanding the genetic basis of complex traits will require the joint analysis of hundreds of millions of variables measured on millions of people. 
The polygenic risk scores obtained from gVAMP have high accuracy (including a prediction accuracy of ∼46% for human height) and outperform current methods for downstream tasks such as mixed linear model association testing across 13 UK Biobank traits. In conclusion, gVAMP offers a scalable foundation for a wider range of analyses in WGS data."}],"tmp":{"name":"Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0)","image":"/images/cc_by_nc_nd.png","short":"CC BY-NC-ND (4.0)","legal_code_url":"https://creativecommons.org/licenses/by-nc-nd/4.0/legalcode"},"ddc":["000","570"],"publication":"Cell Genomics","date_updated":"2026-04-28T12:08:37Z","date_created":"2026-03-23T15:10:03Z","date_published":"2026-02-18T00:00:00Z","month":"02","article_type":"original","year":"2026","user_id":"ba8df636-2132-11f1-aed0-ed93e2281fdd","title":"Joint modeling of whole-genome sequencing data for human height via approximate message passing","day":"18"},{"language":[{"iso":"eng"}],"file":[{"date_created":"2025-08-04T08:32:38Z","access_level":"open_access","relation":"main_file","file_size":528171,"date_updated":"2025-08-04T08:32:38Z","file_id":"20112","success":1,"content_type":"application/pdf","creator":"dernst","checksum":"5a38b093ebb4ee4eb662ea142621a5ca","file_name":"2025_ICLR_Ildiz.pdf"}],"OA_type":"diamond","license":"https://creativecommons.org/licenses/by/4.0/","publication_identifier":{"isbn":["9798331320850"]},"abstract":[{"text":"A growing number of machine learning scenarios rely on knowledge distillation where one uses the output of a surrogate model as labels to supervise the training of a target model. In this work, we provide a sharp characterization of this process for ridgeless, high-dimensional regression, under two settings: (i) model shift, where the surrogate model is arbitrary, and (ii) distribution shift, where the surrogate model is the solution of empirical risk minimization with out-of-distribution data. 
In both cases, we characterize the precise risk of the target model through non-asymptotic bounds in terms of sample size and data distribution under mild conditions. As a consequence, we identify the form of the optimal surrogate model, which reveals the benefits and limitations of discarding weak features in a data-dependent fashion. In the context of weak-to-strong (W2S) generalization, this has the interpretation that (i) W2S training, with the surrogate as the weak model, can provably outperform training with strong labels under the same data budget, but (ii) it is unable to improve the data scaling law. We validate our results on numerical experiments both on ridgeless regression and on neural network architectures.","lang":"eng"}],"tmp":{"legal_code_url":"https://creativecommons.org/licenses/by/4.0/legalcode","short":"CC BY (4.0)","image":"/images/cc_by.png","name":"Creative Commons Attribution 4.0 International Public License (CC-BY 4.0)"},"page":"2967-3006","ddc":["000"],"publication":"13th International Conference on Learning Representations","date_updated":"2025-08-04T08:33:58Z","date_created":"2025-07-20T22:02:02Z","date_published":"2025-04-01T00:00:00Z","month":"04","year":"2025","scopus_import":"1","user_id":"2DF688A6-F248-11E8-B48F-1D18A9856A87","title":"High-dimensional analysis of knowledge distillation: Weak-to-Strong generalization and scaling laws","day":"01","arxiv":1,"external_id":{"arxiv":["2410.18837"]},"publisher":"ICLR","file_date_updated":"2025-08-04T08:32:38Z","_id":"20033","oa_version":"Published Version","oa":1,"conference":{"end_date":"2025-04-28","start_date":"2025-04-24","name":"ICLR: International Conference on Learning Representations","location":"Singapore, Singapore"},"project":[{"_id":"911e6d1f-16d5-11f0-9cad-c5c68c6a1cdf","grant_number":"101161364","name":"Inference in High Dimensions: Light-speed Algorithms and Information Limits"}],"citation":{"chicago":"Emrullah Ildiz, M., Halil Alperen Gozeten, Ege Onur Taga, Marco 
Mondelli, and Samet Oymak. “High-Dimensional Analysis of Knowledge Distillation: Weak-to-Strong Generalization and Scaling Laws.” In <i>13th International Conference on Learning Representations</i>, 2967–3006. ICLR, 2025.","ama":"Emrullah Ildiz M, Gozeten HA, Taga EO, Mondelli M, Oymak S. High-dimensional analysis of knowledge distillation: Weak-to-Strong generalization and scaling laws. In: <i>13th International Conference on Learning Representations</i>. ICLR; 2025:2967-3006.","short":"M. Emrullah Ildiz, H.A. Gozeten, E.O. Taga, M. Mondelli, S. Oymak, in:, 13th International Conference on Learning Representations, ICLR, 2025, pp. 2967–3006.","apa":"Emrullah Ildiz, M., Gozeten, H. A., Taga, E. O., Mondelli, M., &#38; Oymak, S. (2025). High-dimensional analysis of knowledge distillation: Weak-to-Strong generalization and scaling laws. In <i>13th International Conference on Learning Representations</i> (pp. 2967–3006). Singapore, Singapore: ICLR.","ista":"Emrullah Ildiz M, Gozeten HA, Taga EO, Mondelli M, Oymak S. 2025. High-dimensional analysis of knowledge distillation: Weak-to-Strong generalization and scaling laws. 13th International Conference on Learning Representations. ICLR: International Conference on Learning Representations, 2967–3006.","mla":"Emrullah Ildiz, M., et al. “High-Dimensional Analysis of Knowledge Distillation: Weak-to-Strong Generalization and Scaling Laws.” <i>13th International Conference on Learning Representations</i>, ICLR, 2025, pp. 2967–3006.","ieee":"M. Emrullah Ildiz, H. A. Gozeten, E. O. Taga, M. Mondelli, and S. Oymak, “High-dimensional analysis of knowledge distillation: Weak-to-Strong generalization and scaling laws,” in <i>13th International Conference on Learning Representations</i>, Singapore, Singapore, 2025, pp. 
2967–3006."},"department":[{"_id":"MaMo"}],"OA_place":"publisher","article_processing_charge":"No","quality_controlled":"1","author":[{"last_name":"Emrullah Ildiz","full_name":"Emrullah Ildiz, M.","first_name":"M."},{"full_name":"Gozeten, Halil Alperen","first_name":"Halil Alperen","last_name":"Gozeten"},{"full_name":"Taga, Ege Onur","first_name":"Ege Onur","last_name":"Taga"},{"id":"27EB676C-8706-11E9-9510-7717E6697425","first_name":"Marco","full_name":"Mondelli, Marco","orcid":"0000-0002-3242-7020","last_name":"Mondelli"},{"full_name":"Oymak, Samet","first_name":"Samet","last_name":"Oymak"}],"publication_status":"published","type":"conference","acknowledgement":"M.E.I., H.A.G., E.O.T., S.O. are supported by the NSF grants CCF-2046816, CCF-2403075, the Office of Naval Research grant N000142412289, an OpenAI Agentic AI Systems grant, and gifts by Open Philanthropy and Google Research. M. M. is funded by the European Union (ERC, INF2, project number 101161364). Views and opinions expressed are however those of the author(s) only and do not necessarily reflect those of the European Union or the European Research Council Executive Agency. Neither the European Union nor the granting authority can be held responsible for them.","has_accepted_license":"1","status":"public"},{"language":[{"iso":"eng"}],"corr_author":"1","OA_type":"diamond","file":[{"date_created":"2025-08-04T08:45:43Z","access_level":"open_access","date_updated":"2025-08-04T08:45:43Z","relation":"main_file","file_size":1337236,"success":1,"file_id":"20114","content_type":"application/pdf","creator":"dernst","checksum":"59c48c173887139647cc9839c0801136","file_name":"2025_ICLR_Jacot.pdf"}],"publication_identifier":{"isbn":["9798331320850"]},"abstract":[{"text":"Deep neural networks (DNNs) at convergence consistently represent the training data in the last layer via a geometric structure referred to as neural collapse. 
This empirical evidence has spurred a line of theoretical research aimed at proving the emergence of neural collapse, mostly focusing on the unconstrained features model. Here, the features of the penultimate layer are free variables, which makes the model data-agnostic and puts into question its ability to capture DNN training. Our work addresses the issue, moving away from unconstrained features and studying DNNs that end with at least two linear layers. We first prove generic guarantees on neural collapse that assume (i) low training error and balancedness of linear layers (for within-class variability collapse), and (ii) bounded conditioning of the features before the linear part (for orthogonality of class-means, and their alignment with weight matrices). The balancedness refers to the fact that W_{ℓ+1}^⊤ W_{ℓ+1} ≈ W_ℓ W_ℓ^⊤ for any pair of consecutive weight matrices of the linear part, and the bounded conditioning requires a well-behaved ratio between largest and smallest non-zero singular values of the features. We then show that such assumptions hold for gradient descent training with weight decay: (i) for networks with a wide first layer, we prove low training error and balancedness, and (ii) for solutions that are either nearly optimal or stable under large learning rates, we additionally prove the bounded conditioning. 
Taken together, our results are the first to show neural collapse in the end-to-end training of DNNs.","lang":"eng"}],"tmp":{"legal_code_url":"https://creativecommons.org/licenses/by/4.0/legalcode","short":"CC BY (4.0)","image":"/images/cc_by.png","name":"Creative Commons Attribution 4.0 International Public License (CC-BY 4.0)"},"ddc":["000"],"page":"1905-1931","date_created":"2025-07-20T22:02:02Z","date_updated":"2025-08-04T08:47:00Z","publication":"13th International Conference on Learning Representations","date_published":"2025-04-01T00:00:00Z","month":"04","scopus_import":"1","year":"2025","user_id":"2DF688A6-F248-11E8-B48F-1D18A9856A87","title":"Wide neural networks trained with weight decay provably exhibit neural collapse","day":"01","arxiv":1,"publisher":"ICLR","external_id":{"arxiv":["2410.04887"]},"file_date_updated":"2025-08-04T08:45:43Z","conference":{"end_date":"2025-04-28","start_date":"2025-04-24","name":"ICLR: International Conference on Learning Representations","location":"Singapore, Singapore"},"_id":"20035","oa":1,"oa_version":"Published Version","department":[{"_id":"MaMo"}],"citation":{"ama":"Jacot A, Súkeník P, Wang Z, Mondelli M. Wide neural networks trained with weight decay provably exhibit neural collapse. In: <i>13th International Conference on Learning Representations</i>. ICLR; 2025:1905-1931.","apa":"Jacot, A., Súkeník, P., Wang, Z., &#38; Mondelli, M. (2025). Wide neural networks trained with weight decay provably exhibit neural collapse. In <i>13th International Conference on Learning Representations</i> (pp. 1905–1931). Singapore, Singapore: ICLR.","short":"A. Jacot, P. Súkeník, Z. Wang, M. Mondelli, in:, 13th International Conference on Learning Representations, ICLR, 2025, pp. 1905–1931.","chicago":"Jacot, Arthur, Peter Súkeník, Zihan Wang, and Marco Mondelli. “Wide Neural Networks Trained with Weight Decay Provably Exhibit Neural Collapse.” In <i>13th International Conference on Learning Representations</i>, 1905–31. 
ICLR, 2025.","ieee":"A. Jacot, P. Súkeník, Z. Wang, and M. Mondelli, “Wide neural networks trained with weight decay provably exhibit neural collapse,” in <i>13th International Conference on Learning Representations</i>, Singapore, Singapore, 2025, pp. 1905–1931.","mla":"Jacot, Arthur, et al. “Wide Neural Networks Trained with Weight Decay Provably Exhibit Neural Collapse.” <i>13th International Conference on Learning Representations</i>, ICLR, 2025, pp. 1905–31.","ista":"Jacot A, Súkeník P, Wang Z, Mondelli M. 2025. Wide neural networks trained with weight decay provably exhibit neural collapse. 13th International Conference on Learning Representations. ICLR: International Conference on Learning Representations, 1905–1931."},"project":[{"name":"Inference in High Dimensions: Light-speed Algorithms and Information Limits","_id":"911e6d1f-16d5-11f0-9cad-c5c68c6a1cdf","grant_number":"101161364"}],"OA_place":"publisher","article_processing_charge":"No","quality_controlled":"1","author":[{"full_name":"Jacot, Arthur","first_name":"Arthur","last_name":"Jacot"},{"last_name":"Súkeník","full_name":"Súkeník, Peter","first_name":"Peter","id":"d64d6a8d-eb8e-11eb-b029-96fd216dec3c"},{"first_name":"Zihan","full_name":"Wang, Zihan","last_name":"Wang"},{"last_name":"Mondelli","id":"27EB676C-8706-11E9-9510-7717E6697425","first_name":"Marco","full_name":"Mondelli, Marco","orcid":"0000-0002-3242-7020"}],"publication_status":"published","type":"conference","has_accepted_license":"1","acknowledgement":"M. M. and P. S. are funded by the European Union (ERC, INF2, project number 101161364). Views and opinions expressed are however those of the author(s) only and do not necessarily reflect those of the European Union or the European Research Council Executive Agency. 
Neither the European Union nor the granting authority can be held responsible for them.","status":"public"},{"publication_status":"published","author":[{"orcid":"0000-0002-6465-6258","full_name":"Zhang, Yihan","first_name":"Yihan","id":"2ce5da42-b2ea-11eb-bba5-9f264e9d002c","last_name":"Zhang"},{"full_name":"Ji, Hong Chang","first_name":"Hong Chang","last_name":"Ji"},{"last_name":"Venkataramanan","first_name":"Ramji","full_name":"Venkataramanan, Ramji"},{"full_name":"Mondelli, Marco","orcid":"0000-0002-3242-7020","id":"27EB676C-8706-11E9-9510-7717E6697425","first_name":"Marco","last_name":"Mondelli"}],"type":"journal_article","OA_place":"publisher","quality_controlled":"1","article_processing_charge":"No","status":"public","issue":"3-4","has_accepted_license":"1","acknowledgement":"This work was done when Y. Z. and H. C. J. were at the Institute of Science and Technology Austria. Y. Z. thanks Hugo Latourelle-Vigeant for bringing [53] to the authors’ attention.\r\nY. Z. and M. M. are partially supported by the 2019 Lopez-Loreta Prize and by the Interdisciplinary Projects Committee (IPC) at ISTA. H. C. J. is supported by the ERC Advanced Grant “RMTBeyond” No. 101020331.","intvolume":"         8","doi":"10.4171/MSL/52","publisher":"EMS Press","citation":{"ieee":"Y. Zhang, H. C. Ji, R. Venkataramanan, and M. Mondelli, “Spectral estimators for structured generalized linear models via approximate message passing,” <i>Mathematical Statistics and Learning</i>, vol. 8, no. 3–4. EMS Press, pp. 193–304, 2025.","mla":"Zhang, Yihan, et al. “Spectral Estimators for Structured Generalized Linear Models via Approximate Message Passing.” <i>Mathematical Statistics and Learning</i>, vol. 8, no. 3–4, EMS Press, 2025, pp. 193–304, doi:<a href=\"https://doi.org/10.4171/MSL/52\">10.4171/MSL/52</a>.","ista":"Zhang Y, Ji HC, Venkataramanan R, Mondelli M. 2025. Spectral estimators for structured generalized linear models via approximate message passing. 
Mathematical Statistics and Learning. 8(3–4), 193–304.","short":"Y. Zhang, H.C. Ji, R. Venkataramanan, M. Mondelli, Mathematical Statistics and Learning 8 (2025) 193–304.","apa":"Zhang, Y., Ji, H. C., Venkataramanan, R., &#38; Mondelli, M. (2025). Spectral estimators for structured generalized linear models via approximate message passing. <i>Mathematical Statistics and Learning</i>. EMS Press. <a href=\"https://doi.org/10.4171/MSL/52\">https://doi.org/10.4171/MSL/52</a>","ama":"Zhang Y, Ji HC, Venkataramanan R, Mondelli M. Spectral estimators for structured generalized linear models via approximate message passing. <i>Mathematical Statistics and Learning</i>. 2025;8(3-4):193-304. doi:<a href=\"https://doi.org/10.4171/MSL/52\">10.4171/MSL/52</a>","chicago":"Zhang, Yihan, Hong Chang Ji, Ramji Venkataramanan, and Marco Mondelli. “Spectral Estimators for Structured Generalized Linear Models via Approximate Message Passing.” <i>Mathematical Statistics and Learning</i>. EMS Press, 2025. <a href=\"https://doi.org/10.4171/MSL/52\">https://doi.org/10.4171/MSL/52</a>."},"PlanS_conform":"1","department":[{"_id":"MaMo"}],"project":[{"_id":"059876FA-7A3F-11EA-A408-12923DDC885E","name":"Prix Lopez-Loretta 2019 - Marco Mondelli"}],"file_date_updated":"2025-12-09T13:50:03Z","volume":8,"_id":"20734","oa_version":"Published Version","oa":1,"date_published":"2025-09-02T00:00:00Z","month":"09","ddc":["000"],"page":"193-304","date_created":"2025-12-07T23:02:02Z","date_updated":"2025-12-09T13:53:31Z","publication":"Mathematical Statistics and Learning","title":"Spectral estimators for structured generalized linear models via approximate message 
passing","day":"02","scopus_import":"1","article_type":"original","year":"2025","user_id":"2DF688A6-F248-11E8-B48F-1D18A9856A87","OA_type":"diamond","file":[{"relation":"main_file","date_updated":"2025-12-09T13:50:03Z","file_size":1379626,"access_level":"open_access","date_created":"2025-12-09T13:50:03Z","file_name":"2025_MathStatLearning_Zhang.pdf","checksum":"55a1bd9c1b6b0198c42504fb94f4ad4c","creator":"dernst","content_type":"application/pdf","success":1,"file_id":"20752"}],"publication_identifier":{"issn":["2520-2316"],"eissn":["2520-2324"]},"abstract":[{"text":"We consider the problem of parameter estimation in a high-dimensional generalized linear model. Spectral methods obtained via the principal eigenvector of a suitable data-dependent matrix provide a simple yet surprisingly effective solution. However, despite their wide use, a rigorous performance characterization, as well as a principled way to preprocess the data, are available only for unstructured (i.i.d. Gaussian and Haar orthogonal) designs. In contrast, real-world data matrices are highly structured and exhibit non-trivial correlations. To address the problem, we consider correlated Gaussian designs capturing the anisotropic nature of the features via a covariance matrix Σ. Our main result is a precise asymptotic characterization of the performance of spectral estimators. This allows us to identify the optimal preprocessing that minimizes the number of samples needed for parameter estimation. Surprisingly, such preprocessing is universal across a broad set of designs, which partly addresses a conjecture on optimal spectral estimators for rotationally invariant models. Our principled approach vastly improves upon previous heuristic methods, including for designs common in computational imaging and genetics. 
The proposed methodology, based on approximate message passing, is broadly applicable and opens the way to the precise characterization of spiked matrices and of the corresponding spectral methods in a variety of settings.","lang":"eng"}],"language":[{"iso":"eng"}],"corr_author":"1","tmp":{"legal_code_url":"https://creativecommons.org/licenses/by/4.0/legalcode","short":"CC BY (4.0)","image":"/images/cc_by.png","name":"Creative Commons Attribution 4.0 International Public License (CC-BY 4.0)"}},{"tmp":{"legal_code_url":"https://creativecommons.org/licenses/by/4.0/legalcode","short":"CC BY (4.0)","image":"/images/cc_by.png","name":"Creative Commons Attribution 4.0 International Public License (CC-BY 4.0)"},"corr_author":"1","language":[{"iso":"eng"}],"publication_identifier":{"eissn":["1096-603X"],"issn":["1063-5203"]},"abstract":[{"lang":"eng","text":"The identification of the parameters of a neural network from finite samples of input-output pairs is often referred to as the teacher-student model, and this model has represented a popular framework for understanding training and generalization. Even if the problem is NP-complete in the worst case, a rapidly growing literature – after adding suitable distributional assumptions – has established finite sample identification of two-layer networks with a number of neurons (math. formula), D being the input dimension. For the range (math. formula) the problem becomes harder, and truly little is known for networks parametrized by biases as well. This paper fills the gap by providing efficient algorithms and rigorous theoretical guarantees of finite sample identification for such wider shallow networks with biases. Our approach is based on a two-step pipeline: first, we recover the direction of the weights, by exploiting second order information; next, we identify the signs by suitable algebraic evaluations, and we recover the biases by empirical risk minimization via gradient descent. 
Numerical results demonstrate the effectiveness of our approach."}],"file":[{"content_type":"application/pdf","file_id":"20131","success":1,"checksum":"657f258af0f7ca135e69959fd13e2d63","creator":"dernst","file_name":"2025_ApplCompAnalysis_Fornasier.pdf","file_size":2223350,"relation":"main_file","date_updated":"2025-08-05T12:22:04Z","access_level":"open_access","date_created":"2025-08-05T12:22:04Z"}],"OA_type":"hybrid","user_id":"317138e5-6ab7-11ef-aa6d-ffef3953e345","scopus_import":"1","article_type":"original","year":"2025","day":"01","title":"Efficient identification of wide shallow neural networks with biases","publication":"Applied and Computational Harmonic Analysis","date_updated":"2025-09-30T10:35:09Z","date_created":"2025-02-23T23:01:54Z","ddc":["000"],"month":"06","date_published":"2025-06-01T00:00:00Z","oa":1,"_id":"19065","oa_version":"Published Version","volume":77,"file_date_updated":"2025-08-05T12:22:04Z","PlanS_conform":"1","citation":{"chicago":"Fornasier, Massimo, Timo Klock, Marco Mondelli, and Michael Rauchensteiner. “Efficient Identification of Wide Shallow Neural Networks with Biases.” <i>Applied and Computational Harmonic Analysis</i>. Elsevier, 2025. <a href=\"https://doi.org/10.1016/j.acha.2025.101749\">https://doi.org/10.1016/j.acha.2025.101749</a>.","apa":"Fornasier, M., Klock, T., Mondelli, M., &#38; Rauchensteiner, M. (2025). Efficient identification of wide shallow neural networks with biases. <i>Applied and Computational Harmonic Analysis</i>. Elsevier. <a href=\"https://doi.org/10.1016/j.acha.2025.101749\">https://doi.org/10.1016/j.acha.2025.101749</a>","short":"M. Fornasier, T. Klock, M. Mondelli, M. Rauchensteiner, Applied and Computational Harmonic Analysis 77 (2025).","ama":"Fornasier M, Klock T, Mondelli M, Rauchensteiner M. Efficient identification of wide shallow neural networks with biases. <i>Applied and Computational Harmonic Analysis</i>. 2025;77. 
doi:<a href=\"https://doi.org/10.1016/j.acha.2025.101749\">10.1016/j.acha.2025.101749</a>","ista":"Fornasier M, Klock T, Mondelli M, Rauchensteiner M. 2025. Efficient identification of wide shallow neural networks with biases. Applied and Computational Harmonic Analysis. 77, 101749.","mla":"Fornasier, Massimo, et al. “Efficient Identification of Wide Shallow Neural Networks with Biases.” <i>Applied and Computational Harmonic Analysis</i>, vol. 77, 101749, Elsevier, 2025, doi:<a href=\"https://doi.org/10.1016/j.acha.2025.101749\">10.1016/j.acha.2025.101749</a>.","ieee":"M. Fornasier, T. Klock, M. Mondelli, and M. Rauchensteiner, “Efficient identification of wide shallow neural networks with biases,” <i>Applied and Computational Harmonic Analysis</i>, vol. 77. Elsevier, 2025."},"department":[{"_id":"MaMo"}],"external_id":{"isi":["001430202700001"]},"publisher":"Elsevier","doi":"10.1016/j.acha.2025.101749","intvolume":"        77","isi":1,"status":"public","has_accepted_license":"1","article_processing_charge":"No","quality_controlled":"1","article_number":"101749","OA_place":"publisher","type":"journal_article","author":[{"last_name":"Fornasier","first_name":"Massimo","full_name":"Fornasier, Massimo"},{"first_name":"Timo","full_name":"Klock, Timo","last_name":"Klock"},{"last_name":"Mondelli","full_name":"Mondelli, Marco","orcid":"0000-0002-3242-7020","first_name":"Marco","id":"27EB676C-8706-11E9-9510-7717E6697425"},{"first_name":"Michael","full_name":"Rauchensteiner, Michael","last_name":"Rauchensteiner"}],"publication_status":"published"},{"arxiv":1,"publisher":"ML Research Press","external_id":{"arxiv":["2502.01347"]},"conference":{"end_date":"2025-07-19","start_date":"2025-07-13","location":"Vancouver, Canada","name":"ICML: International Conference on Machine Learning"},"_id":"21324","oa":1,"oa_version":"Published Version","file_date_updated":"2026-02-19T08:04:38Z","volume":267,"citation":{"chicago":"Bombari, Simone, and Marco Mondelli. 
“Spurious Correlations in High Dimensional Regression: The Roles of Regularization, Simplicity Bias and over-Parameterization.” In <i>Proceedings of the 42nd International Conference on Machine Learning</i>, 267:4839–73. ML Research Press, 2025.","short":"S. Bombari, M. Mondelli, in:, Proceedings of the 42nd International Conference on Machine Learning, ML Research Press, 2025, pp. 4839–4873.","ama":"Bombari S, Mondelli M. Spurious correlations in high dimensional regression: The roles of regularization, simplicity bias and over-parameterization. In: <i>Proceedings of the 42nd International Conference on Machine Learning</i>. Vol 267. ML Research Press; 2025:4839-4873.","apa":"Bombari, S., &#38; Mondelli, M. (2025). Spurious correlations in high dimensional regression: The roles of regularization, simplicity bias and over-parameterization. In <i>Proceedings of the 42nd International Conference on Machine Learning</i> (Vol. 267, pp. 4839–4873). Vancouver, Canada: ML Research Press.","mla":"Bombari, Simone, and Marco Mondelli. “Spurious Correlations in High Dimensional Regression: The Roles of Regularization, Simplicity Bias and over-Parameterization.” <i>Proceedings of the 42nd International Conference on Machine Learning</i>, vol. 267, ML Research Press, 2025, pp. 4839–73.","ista":"Bombari S, Mondelli M. 2025. Spurious correlations in high dimensional regression: The roles of regularization, simplicity bias and over-parameterization. Proceedings of the 42nd International Conference on Machine Learning. ICML: International Conference on Machine Learning, PMLR, vol. 267, 4839–4873.","ieee":"S. Bombari and M. Mondelli, “Spurious correlations in high dimensional regression: The roles of regularization, simplicity bias and over-parameterization,” in <i>Proceedings of the 42nd International Conference on Machine Learning</i>, Vancouver, Canada, 2025, vol. 267, pp. 
4839–4873."},"department":[{"_id":"MaMo"}],"project":[{"grant_number":"101161364","_id":"911e6d1f-16d5-11f0-9cad-c5c68c6a1cdf","name":"Inference in High Dimensions: Light-speed Algorithms and Information Limits"},{"name":"Trustworthy Deep Learning Theory: Private Over-Parameterized Models and Robust LLMs","_id":"92099302-16d5-11f0-9cad-f9a785f54fbd"}],"article_processing_charge":"No","quality_controlled":"1","OA_place":"publisher","type":"conference","publication_status":"published","author":[{"last_name":"Bombari","full_name":"Bombari, Simone","id":"ca726dda-de17-11ea-bc14-f9da834f63aa","first_name":"Simone"},{"last_name":"Mondelli","full_name":"Mondelli, Marco","orcid":"0000-0002-3242-7020","first_name":"Marco","id":"27EB676C-8706-11E9-9510-7717E6697425"}],"intvolume":"       267","status":"public","has_accepted_license":"1","acknowledgement":"Marco Mondelli is funded by the European Union (ERC, INF2, project number 101161364). Views and opinions expressed are however those of the author(s) only and do not necessarily reflect those of the European Union or the European Research Council Executive Agency. Neither the European Union nor the granting authority can be held responsible for them. Simone Bombari is supported by a Google PhD fellowship. The authors would like to thank GuanWen Qiu for helpful discussions.","corr_author":"1","language":[{"iso":"eng"}],"alternative_title":["PMLR"],"publication_identifier":{"eissn":["2640-3498"]},"abstract":[{"lang":"eng","text":"Learning models have been shown to rely on spurious correlations between non-predictive features and the associated labels in the training data, with negative implications on robustness, bias and fairness. In this work, we provide a statistical characterization of this phenomenon for high-dimensional regression, when the data contains a predictive core feature x and a spurious feature y. 
Specifically, we quantify the amount of spurious correlations C learned via linear regression, in terms of the data covariance and the strength λ of the ridge regularization. As a consequence, we first capture the simplicity of y through the spectrum of its covariance, and its correlation with x through the Schur complement of the full data covariance. Next, we prove a trade-off between C and the in-distribution test loss L, by showing that the value of λ that minimizes L lies in an interval where C is increasing. Finally, we investigate the effects of over-parameterization via the random features model, by showing its equivalence to regularized linear regression. Our theoretical results are supported by numerical experiments on Gaussian, Color-MNIST, and CIFAR-10 datasets."}],"OA_type":"gold","file":[{"date_created":"2026-02-19T08:04:38Z","access_level":"open_access","relation":"main_file","file_size":887526,"date_updated":"2026-02-19T08:04:38Z","success":1,"file_id":"21335","content_type":"application/pdf","file_name":"2025_ICML_Bombari.pdf","creator":"dernst","checksum":"d4ba4f7717b362ca38878f45e57bd643"}],"tmp":{"legal_code_url":"https://creativecommons.org/licenses/by/4.0/legalcode","short":"CC BY (4.0)","image":"/images/cc_by.png","name":"Creative Commons Attribution 4.0 International Public License (CC-BY 4.0)"},"date_created":"2026-02-18T11:58:00Z","publication":"Proceedings of the 42nd International Conference on Machine Learning","date_updated":"2026-02-19T08:08:55Z","ddc":["000"],"page":"4839-4873","month":"07","date_published":"2025-07-30T00:00:00Z","user_id":"2DF688A6-F248-11E8-B48F-1D18A9856A87","year":"2025","day":"30","title":"Spurious correlations in high dimensional regression: The roles of regularization, simplicity bias and over-parameterization"},{"external_id":{"pmid":["41321376"]},"publisher":"ML Research Press","project":[{"_id":"911e6d1f-16d5-11f0-9cad-c5c68c6a1cdf","grant_number":"101161364","name":"Inference in High Dimensions: 
Light-speed Algorithms and Information Limits"}],"department":[{"_id":"MaMo"}],"citation":{"mla":"Gozeten, Halil Alperen, et al. “Test-Time Training Provably Improves Transformers as in-Context Learners.” <i>Proceedings of the 42nd International Conference on Machine Learning</i>, vol. 267, ML Research Press, 2025, pp. 20266–95.","ista":"Gozeten HA, Ildiz ME, Zhang X, Soltanolkotabi M, Mondelli M, Oymak S. 2025. Test-time training provably improves transformers as in-context learners. Proceedings of the 42nd International Conference on Machine Learning. ICML: International Conference on Machine Learning, PMLR, vol. 267, 20266–20295.","ieee":"H. A. Gozeten, M. E. Ildiz, X. Zhang, M. Soltanolkotabi, M. Mondelli, and S. Oymak, “Test-time training provably improves transformers as in-context learners,” in <i>Proceedings of the 42nd International Conference on Machine Learning</i>, Vancouver, Canada, 2025, vol. 267, pp. 20266–20295.","chicago":"Gozeten, Halil Alperen, Muhammed Emrullah Ildiz, Xuechen Zhang, Mahdi Soltanolkotabi, Marco Mondelli, and Samet Oymak. “Test-Time Training Provably Improves Transformers as in-Context Learners.” In <i>Proceedings of the 42nd International Conference on Machine Learning</i>, 267:20266–95. ML Research Press, 2025.","short":"H.A. Gozeten, M.E. Ildiz, X. Zhang, M. Soltanolkotabi, M. Mondelli, S. Oymak, in:, Proceedings of the 42nd International Conference on Machine Learning, ML Research Press, 2025, pp. 20266–20295.","ama":"Gozeten HA, Ildiz ME, Zhang X, Soltanolkotabi M, Mondelli M, Oymak S. Test-time training provably improves transformers as in-context learners. In: <i>Proceedings of the 42nd International Conference on Machine Learning</i>. Vol 267. ML Research Press; 2025:20266-20295.","apa":"Gozeten, H. A., Ildiz, M. E., Zhang, X., Soltanolkotabi, M., Mondelli, M., &#38; Oymak, S. (2025). Test-time training provably improves transformers as in-context learners. 
In <i>Proceedings of the 42nd International Conference on Machine Learning</i> (Vol. 267, pp. 20266–20295). Vancouver, Canada: ML Research Press."},"volume":267,"file_date_updated":"2026-02-19T08:15:48Z","_id":"21325","oa_version":"Published Version","oa":1,"conference":{"start_date":"2025-07-13","location":"Vancouver, Canada","name":"ICML: International Conference on Machine Learning","end_date":"2025-07-19"},"publication_status":"published","author":[{"full_name":"Gozeten, Halil Alperen","first_name":"Halil Alperen","last_name":"Gozeten"},{"first_name":"Muhammed Emrullah","full_name":"Ildiz, Muhammed Emrullah","last_name":"Ildiz"},{"last_name":"Zhang","first_name":"Xuechen","full_name":"Zhang, Xuechen"},{"first_name":"Mahdi","full_name":"Soltanolkotabi, Mahdi","last_name":"Soltanolkotabi"},{"last_name":"Mondelli","orcid":"0000-0002-3242-7020","full_name":"Mondelli, Marco","id":"27EB676C-8706-11E9-9510-7717E6697425","first_name":"Marco"},{"last_name":"Oymak","full_name":"Oymak, Samet","first_name":"Samet"}],"type":"conference","OA_place":"publisher","pmid":1,"article_processing_charge":"No","quality_controlled":"1","has_accepted_license":"1","acknowledgement":"H.A.G., M.E.I., X.Z., and S.O. were supported in part by the NSF grants CCF2046816, CCF-2403075, CCF-2008020, and the Office of Naval Research grant N000142412289.\r\nM. M. is funded by the European Union (ERC, INF2, project number 101161364). Views and opinions expressed are, however, those of the author(s) only and do not necessarily reflect those of the European Union or the European Research Council Executive Agency. Neither the European Union nor the granting authority can be held responsible for them. M.S. is supported by the Packard Fellowship in Science and Engineering, a Sloan Research Fellowship in Mathematics, an NSF-CAREER under award #1846369, DARPA FastNICS program, and NSF-CIF awards #1813877 and #2008443, and NIH DP2LM014564-01. 
The authors also acknowledge further support from Open Philanthropy, OpenAI, Amazon Research, Google Research, and Microsoft Research.","status":"public","intvolume":"       267","file":[{"file_id":"21336","success":1,"content_type":"application/pdf","file_name":"2025_ICML_Gozeten.pdf","creator":"dernst","checksum":"f774f8619a0d72f3975d9cb23942a1e9","access_level":"open_access","date_created":"2026-02-19T08:15:48Z","relation":"main_file","date_updated":"2026-02-19T08:15:48Z","file_size":471176}],"OA_type":"gold","publication_identifier":{"eissn":["2640-3498"]},"abstract":[{"lang":"eng","text":"Test-time training (TTT) methods explicitly update the weights of a model to adapt to the specific test instance, and they have found success in a variety of settings, including most recently language modeling and reasoning. To demystify this success, we investigate a gradient-based TTT algorithm for in-context learning, where we train a transformer model on the in-context demonstrations provided in the test prompt. Specifically, we provide a comprehensive theoretical characterization of linear transformers when the update rule is a single gradient step. Our theory (i) delineates the role of alignment between pretraining distribution and target task, (ii) demystifies how TTT can alleviate distribution shift, and (iii) quantifies the sample complexity of TTT including how it can significantly reduce the eventual sample size required for in-context learning. As our empirical contribution, we study the benefits of TTT for TabPFN, a tabular foundation model. 
In line with our theory, we demonstrate that TTT significantly reduces the required sample size for tabular classification (3 to 5 times fewer) unlocking substantial inference efficiency with a negligible training cost."}],"language":[{"iso":"eng"}],"alternative_title":["PMLR"],"tmp":{"legal_code_url":"https://creativecommons.org/licenses/by/4.0/legalcode","short":"CC BY (4.0)","image":"/images/cc_by.png","name":"Creative Commons Attribution 4.0 International Public License (CC-BY 4.0)"},"date_published":"2025-11-30T00:00:00Z","month":"11","page":"20266-20295","ddc":["000"],"publication":"Proceedings of the 42nd International Conference on Machine Learning","date_updated":"2026-02-19T08:18:24Z","date_created":"2026-02-18T12:00:44Z","title":"Test-time training provably improves transformers as in-context learners","day":"30","year":"2025","user_id":"2DF688A6-F248-11E8-B48F-1D18A9856A87"},{"quality_controlled":"1","article_processing_charge":"No","OA_place":"publisher","type":"conference","publication_status":"published","author":[{"full_name":"Wu, Diyuan","first_name":"Diyuan","id":"1a5914c2-896a-11ed-bdf8-fb80621a0635","last_name":"Wu"},{"last_name":"Mondelli","full_name":"Mondelli, Marco","orcid":"0000-0002-3242-7020","id":"27EB676C-8706-11E9-9510-7717E6697425","first_name":"Marco"}],"intvolume":"       267","status":"public","has_accepted_license":"1","acknowledgement":"This research was funded in whole or in part by the Austrian Science Fund (FWF) 10.55776/COE12. For the purpose of open access, the authors have applied a CC BY public copyright license to any Author Accepted Manuscript version arising from this submission. The authors would like to thank Peter Súkeník for general helpful discussions and for pointing out that all the stationary points are approximately proportional in the case without entropic regularization. 
","arxiv":1,"publisher":"ML Research Press","external_id":{"arxiv":["2501.19104"]},"conference":{"end_date":"2025-07-19","location":"Vancouver, Canada","name":"ICML: International Conference on Machine Learning","start_date":"2025-07-13"},"_id":"21326","oa":1,"oa_version":"Published Version","volume":267,"file_date_updated":"2026-02-19T08:28:22Z","citation":{"apa":"Wu, D., &#38; Mondelli, M. (2025). Neural collapse beyond the unconstrained features model: Landscape, dynamics, and generalization in the mean-field regime. In <i>Proceedings of the 42nd International Conference on Machine Learning</i> (Vol. 267, pp. 67499–67536). Vancouver, Canada: ML Research Press.","short":"D. Wu, M. Mondelli, in:, Proceedings of the 42nd International Conference on Machine Learning, ML Research Press, 2025, pp. 67499–67536.","ama":"Wu D, Mondelli M. Neural collapse beyond the unconstrained features model: Landscape, dynamics, and generalization in the mean-field regime. In: <i>Proceedings of the 42nd International Conference on Machine Learning</i>. Vol 267. ML Research Press; 2025:67499-67536.","chicago":"Wu, Diyuan, and Marco Mondelli. “Neural Collapse beyond the Unconstrained Features Model: Landscape, Dynamics, and Generalization in the Mean-Field Regime.” In <i>Proceedings of the 42nd International Conference on Machine Learning</i>, 267:67499–536. ML Research Press, 2025.","ieee":"D. Wu and M. Mondelli, “Neural collapse beyond the unconstrained features model: Landscape, dynamics, and generalization in the mean-field regime,” in <i>Proceedings of the 42nd International Conference on Machine Learning</i>, Vancouver, Canada, 2025, vol. 267, pp. 67499–67536.","ista":"Wu D, Mondelli M. 2025. Neural collapse beyond the unconstrained features model: Landscape, dynamics, and generalization in the mean-field regime. Proceedings of the 42nd International Conference on Machine Learning. ICML: International Conference on Machine Learning, PMLR, vol. 
267, 67499–67536.","mla":"Wu, Diyuan, and Marco Mondelli. “Neural Collapse beyond the Unconstrained Features Model: Landscape, Dynamics, and Generalization in the Mean-Field Regime.” <i>Proceedings of the 42nd International Conference on Machine Learning</i>, vol. 267, ML Research Press, 2025, pp. 67499–536."},"department":[{"_id":"MaMo"}],"date_created":"2026-02-18T12:02:45Z","date_updated":"2026-02-19T08:30:42Z","publication":"Proceedings of the 42nd International Conference on Machine Learning","ddc":["000"],"page":"67499-67536","month":"07","date_published":"2025-07-30T00:00:00Z","user_id":"2DF688A6-F248-11E8-B48F-1D18A9856A87","year":"2025","day":"30","title":"Neural collapse beyond the unconstrained features model: Landscape, dynamics, and generalization in the mean-field regime","corr_author":"1","alternative_title":["PMLR"],"language":[{"iso":"eng"}],"publication_identifier":{"eissn":["2640-3498"]},"abstract":[{"lang":"eng","text":"Neural Collapse is a phenomenon where the last-layer representations of a well-trained neural network converge to a highly structured geometry. In this paper, we focus on its first (and most basic) property, known as NC1: the within-class variability vanishes. While prior theoretical studies establish the occurrence of NC1 via the data-agnostic unconstrained features model, our work adopts a data-specific perspective, analyzing NC1 in a three-layer neural network, with the first two layers operating in the mean-field regime and followed by a linear layer. In particular, we establish a fundamental connection between NC1 and the loss landscape: we prove that points with small empirical loss and gradient norm (thus, close to being stationary) approximately satisfy NC1, and the closeness to NC1 is controlled by the residual loss and gradient norm. 
We then show that (i) gradient flow on the mean squared error converges to NC1 solutions with small empirical loss, and (ii) for well-separated data distributions, both NC1 and vanishing test loss are achieved simultaneously. This aligns with the empirical observation that NC1 emerges during training while models attain near-zero test error. Overall, our results demonstrate that NC1 arises from gradient training due to the properties of the loss landscape, and they show the co-occurrence of NC1 and small test error for certain data distributions."}],"OA_type":"gold","file":[{"content_type":"application/pdf","success":1,"file_id":"21337","checksum":"c5ce8b1c83e33dc3a11122f4910deb67","creator":"dernst","file_name":"2025_ICML_Wu.pdf","relation":"main_file","date_updated":"2026-02-19T08:28:22Z","file_size":3994385,"date_created":"2026-02-19T08:28:22Z","access_level":"open_access"}],"tmp":{"legal_code_url":"https://creativecommons.org/licenses/by/4.0/legalcode","short":"CC BY (4.0)","image":"/images/cc_by.png","name":"Creative Commons Attribution 4.0 International Public License (CC-BY 4.0)"}},{"OA_place":"publisher","quality_controlled":"1","article_processing_charge":"No","publication_status":"published","author":[{"last_name":"Kovačević","id":"d0258e7b-50b8-11ef-ad56-8b9f537b6b1b","first_name":"Filip","full_name":"Kovačević, Filip"},{"last_name":"Yihan","first_name":"Zhang","full_name":"Yihan, Zhang"},{"id":"27EB676C-8706-11E9-9510-7717E6697425","first_name":"Marco","full_name":"Mondelli, Marco","orcid":"0000-0002-3242-7020","last_name":"Mondelli"}],"type":"conference","intvolume":"       291","status":"public","acknowledgement":"This work was done when Y. Z. was at the Institute of Science and Technology Austria. Y. Z. and\r\nM. M. are funded by the European Union (ERC, INF2, project number 101161364). 
Views and\r\nopinions expressed are however those of the author(s) only and do not necessarily reflect those of the European Union or the European Research Council Executive Agency. Neither the European Union nor the granting authority can be held responsible for them. The authors would like to acknowledge (in alphabetical order) discussions with Yatin Dandi, Leonardo Defilippis and Bruno Loureiro concerning their parallel work (Defilippis et al., 2025).","has_accepted_license":"1","arxiv":1,"external_id":{"arxiv":["2502.01583"]},"publisher":"ML Research Press","file_date_updated":"2026-02-19T09:03:43Z","volume":291,"oa":1,"_id":"21328","oa_version":"Published Version","conference":{"start_date":"2025-06-30","location":"Lyon, France","name":"COLT: Conference on Learning Theory","end_date":"2025-07-04"},"project":[{"name":"Inference in High Dimensions: Light-speed Algorithms and Information Limits","_id":"911e6d1f-16d5-11f0-9cad-c5c68c6a1cdf","grant_number":"101161364"}],"department":[{"_id":"MaMo"}],"citation":{"mla":"Kovačević, Filip, et al. “Spectral Estimators for Multi-Index Models: Precise Asymptotics and Optimal Weak Recovery.” <i>Proceedings of 38th Conference on Learning Theory</i>, vol. 291, ML Research Press, 2025, pp. 3354–404.","ista":"Kovačević F, Yihan Z, Mondelli M. 2025. Spectral estimators for multi-index models: Precise asymptotics and optimal weak recovery. Proceedings of 38th Conference on Learning Theory. COLT: Conference on Learning Theory, PMLR, vol. 291, 3354–3404.","ieee":"F. Kovačević, Z. Yihan, and M. Mondelli, “Spectral estimators for multi-index models: Precise asymptotics and optimal weak recovery,” in <i>Proceedings of 38th Conference on Learning Theory</i>, Lyon, France, 2025, vol. 291, pp. 3354–3404.","chicago":"Kovačević, Filip, Zhang Yihan, and Marco Mondelli. “Spectral Estimators for Multi-Index Models: Precise Asymptotics and Optimal Weak Recovery.” In <i>Proceedings of 38th Conference on Learning Theory</i>, 291:3354–3404. 
ML Research Press, 2025.","short":"F. Kovačević, Z. Yihan, M. Mondelli, in:, Proceedings of 38th Conference on Learning Theory, ML Research Press, 2025, pp. 3354–3404.","ama":"Kovačević F, Yihan Z, Mondelli M. Spectral estimators for multi-index models: Precise asymptotics and optimal weak recovery. In: <i>Proceedings of 38th Conference on Learning Theory</i>. Vol 291. ML Research Press; 2025:3354-3404.","apa":"Kovačević, F., Yihan, Z., &#38; Mondelli, M. (2025). Spectral estimators for multi-index models: Precise asymptotics and optimal weak recovery. In <i>Proceedings of 38th Conference on Learning Theory</i> (Vol. 291, pp. 3354–3404). Lyon, France: ML Research Press."},"page":"3354-3404","ddc":["000"],"date_updated":"2026-02-19T09:03:53Z","publication":"Proceedings of 38th Conference on Learning Theory","date_created":"2026-02-18T12:12:47Z","date_published":"2025-07-01T00:00:00Z","month":"07","year":"2025","scopus_import":"1","user_id":"2DF688A6-F248-11E8-B48F-1D18A9856A87","title":"Spectral estimators for multi-index models: Precise asymptotics and optimal weak recovery","day":"01","alternative_title":["PMLR"],"language":[{"iso":"eng"}],"corr_author":"1","file":[{"checksum":"19aa70ab4f57fb9067b6ebb99a5fd6f0","creator":"dernst","file_name":"2025_LearningTheory_Kovacevic.pdf","content_type":"application/pdf","file_id":"21339","success":1,"file_size":844611,"date_updated":"2026-02-19T09:03:43Z","relation":"main_file","date_created":"2026-02-19T09:03:43Z","access_level":"open_access"}],"OA_type":"gold","abstract":[{"text":"Multi-index models provide a popular framework to investigate the learnability of functions with low-dimensional structure and, also due to their connections with neural networks, they have been object of recent intensive study. In this paper, we focus on recovering the subspace spanned by the signals via spectral estimators – a family of methods routinely used in practice, often as a warm-start for iterative algorithms. 
Our main technical contribution is a precise asymptotic characterization of the performance of spectral methods, when sample size and input dimension grow proportionally and the dimension p of the space to recover is fixed. Specifically, we locate the top-p eigenvalues of the spectral matrix and establish the overlaps between the corresponding eigenvectors (which give the spectral estimators) and a basis of the signal subspace. Our analysis unveils a phase transition phenomenon in which, as the sample complexity grows, eigenvalues escape from the bulk of the spectrum and, when that happens, eigenvectors recover directions of the desired subspace. The precise characterization we put forward enables the optimization of the data preprocessing, thus allowing to identify the spectral estimator that requires the minimal sample size for weak recovery.","lang":"eng"}],"publication_identifier":{"eissn":["2640-3498"]},"tmp":{"legal_code_url":"https://creativecommons.org/licenses/by/4.0/legalcode","short":"CC BY (4.0)","image":"/images/cc_by.png","name":"Creative Commons Attribution 4.0 International Public License (CC-BY 4.0)"}},{"quality_controlled":"1","article_processing_charge":"Yes","article_number":"013081","OA_place":"publisher","related_material":{"link":[{"url":"https://github.com/xu-yz19/spiked-matrix-models-with-structured-noise","relation":"software"}]},"DOAJ_listed":"1","type":"journal_article","publication_status":"published","author":[{"last_name":"Barbier","first_name":"Jean","full_name":"Barbier, Jean"},{"last_name":"Camilli","full_name":"Camilli, Francesco","first_name":"Francesco"},{"full_name":"Xu, Yizhou","first_name":"Yizhou","last_name":"Xu"},{"last_name":"Mondelli","full_name":"Mondelli, Marco","orcid":"0000-0002-3242-7020","first_name":"Marco","id":"27EB676C-8706-11E9-9510-7717E6697425"}],"doi":"10.1103/PhysRevResearch.7.013081","intvolume":"         7","APC_amount":"3272,21 EUR","has_accepted_license":"1","acknowledgement":"J.B., F.C., and Y.X. 
were funded by the European Union (ERC, CHORAL, Project No. 101039794). Views and opinions expressed are however those of the authors only and do not necessarily reflect those of the European Union or the European Research Council. Neither the European Union nor the granting authority can be held responsible for them. M.M. was supported by the 2019 Lopez-Loreta Prize. J.B. acknowledges discussions with TianQi Hou at the initial stage of the project, as well as with Antoine Bodin.","status":"public","arxiv":1,"external_id":{"arxiv":["2405.20993"]},"publisher":"American Physical Society","oa":1,"_id":"18986","oa_version":"Published Version","volume":7,"file_date_updated":"2025-02-03T08:27:59Z","project":[{"_id":"059876FA-7A3F-11EA-A408-12923DDC885E","name":"Prix Lopez-Loretta 2019 - Marco Mondelli"}],"citation":{"ieee":"J. Barbier, F. Camilli, Y. Xu, and M. Mondelli, “Information limits and Thouless-Anderson-Palmer equations for spiked matrix models with structured noise,” <i>Physical Review Research</i>, vol. 7. American Physical Society, 2025.","ista":"Barbier J, Camilli F, Xu Y, Mondelli M. 2025. Information limits and Thouless-Anderson-Palmer equations for spiked matrix models with structured noise. Physical Review Research. 7, 013081.","mla":"Barbier, Jean, et al. “Information Limits and Thouless-Anderson-Palmer Equations for Spiked Matrix Models with Structured Noise.” <i>Physical Review Research</i>, vol. 7, 013081, American Physical Society, 2025, doi:<a href=\"https://doi.org/10.1103/PhysRevResearch.7.013081\">10.1103/PhysRevResearch.7.013081</a>.","apa":"Barbier, J., Camilli, F., Xu, Y., &#38; Mondelli, M. (2025). Information limits and Thouless-Anderson-Palmer equations for spiked matrix models with structured noise. <i>Physical Review Research</i>. American Physical Society. <a href=\"https://doi.org/10.1103/PhysRevResearch.7.013081\">https://doi.org/10.1103/PhysRevResearch.7.013081</a>","ama":"Barbier J, Camilli F, Xu Y, Mondelli M. 
Information limits and Thouless-Anderson-Palmer equations for spiked matrix models with structured noise. <i>Physical Review Research</i>. 2025;7. doi:<a href=\"https://doi.org/10.1103/PhysRevResearch.7.013081\">10.1103/PhysRevResearch.7.013081</a>","short":"J. Barbier, F. Camilli, Y. Xu, M. Mondelli, Physical Review Research 7 (2025).","chicago":"Barbier, Jean, Francesco Camilli, Yizhou Xu, and Marco Mondelli. “Information Limits and Thouless-Anderson-Palmer Equations for Spiked Matrix Models with Structured Noise.” <i>Physical Review Research</i>. American Physical Society, 2025. <a href=\"https://doi.org/10.1103/PhysRevResearch.7.013081\">https://doi.org/10.1103/PhysRevResearch.7.013081</a>."},"department":[{"_id":"MaMo"}],"date_updated":"2026-05-06T12:57:36Z","publication":"Physical Review Research","date_created":"2025-02-02T23:01:54Z","ddc":["530"],"month":"01","date_published":"2025-01-22T00:00:00Z","user_id":"2DF688A6-F248-11E8-B48F-1D18A9856A87","scopus_import":"1","article_type":"original","year":"2025","day":"22","title":"Information limits and Thouless-Anderson-Palmer equations for spiked matrix models with structured noise","language":[{"iso":"eng"}],"abstract":[{"text":"We consider a prototypical problem of Bayesian inference for a structured spiked model: a low-rank signal is corrupted by additive noise. While both information-theoretic and algorithmic limits are well understood when the noise is a Gaussian Wigner matrix, the more realistic case of structured noise still remains challenging. To capture the structure while maintaining mathematical tractability, a line of work has focused on rotationally invariant noise. However, existing studies either provide suboptimal algorithms or are limited to a special class of noise ensembles. 
In this paper, using tools from statistical physics (replica method) and random matrix theory (generalized spherical integrals) we establish the characterization of the information-theoretic limits for a noise matrix drawn from a general trace ensemble. Remarkably, our analysis unveils the asymptotic equivalence between the rotationally invariant model and a surrogate Gaussian one. Finally, we show how to saturate the predicted statistical limits using an efficient algorithm inspired by the theory of adaptive Thouless-Anderson-Palmer (TAP) equations.","lang":"eng"}],"publication_identifier":{"issn":["2643-1564"]},"file":[{"success":1,"file_id":"18988","content_type":"application/pdf","file_name":"2025_PhysReviewResearch_Barbier.pdf","creator":"dernst","checksum":"52c5f72d80ffc928542469114fcdb62b","date_created":"2025-02-03T08:27:59Z","access_level":"open_access","relation":"main_file","date_updated":"2025-02-03T08:27:59Z","file_size":702543}],"OA_type":"gold","tmp":{"legal_code_url":"https://creativecommons.org/licenses/by/4.0/legalcode","short":"CC BY (4.0)","image":"/images/cc_by.png","name":"Creative Commons Attribution 4.0 International Public License (CC-BY 4.0)"}},{"tmp":{"legal_code_url":"https://creativecommons.org/licenses/by/4.0/legalcode","short":"CC BY (4.0)","image":"/images/cc_by.png","name":"Creative Commons Attribution 4.0 International Public License (CC-BY 4.0)"},"publication_identifier":{"issn":["0027-8424"],"eissn":["1091-6490"]},"abstract":[{"text":"Differentially private gradient descent (DP-GD) is a popular algorithm to train deep learning models with provable guarantees on the privacy of the training data. In the last decade, the problem of understanding its performance cost with respect to standard GD has received remarkable attention from the research community, which formally derived upper bounds on the excess population risk  RP  in different learning settings. 
However, existing bounds typically degrade with over-parameterization, i.e., as the number of parameters  p  gets larger than the number of training samples  n  -- a regime which is ubiquitous in current deep-learning practice. As a result, the lack of theoretical insights leaves practitioners without clear guidance, leading some to reduce the effective number of trainable parameters to improve performance, while others use larger models to achieve better results through scale. In this work, we show that in the popular random features model with quadratic loss, for any sufficiently large  p , privacy can be obtained for free, i.e.,  |RP|=o(1) , not only when the privacy parameter  ε  has constant order, but also in the strongly private setting  ε=o(1) . This challenges the common wisdom that over-parameterization inherently hinders performance in private learning.","lang":"eng"}],"file":[{"date_created":"2025-05-05T07:27:54Z","access_level":"open_access","relation":"main_file","date_updated":"2025-05-05T07:27:54Z","file_size":2328320,"file_name":"2025_PNAS_Bombari.pdf","creator":"dernst","checksum":"1ac6f78e368d35a0cafb4d2d9bd63443","file_id":"19648","success":1,"content_type":"application/pdf"}],"OA_type":"hybrid","corr_author":"1","language":[{"iso":"eng"}],"day":"15","title":"Privacy for free in the overparameterized regime","user_id":"2DF688A6-F248-11E8-B48F-1D18A9856A87","year":"2025","scopus_import":"1","article_type":"original","month":"04","date_published":"2025-04-15T00:00:00Z","publication":"Proceedings of the National Academy of Sciences","date_updated":"2026-05-20T08:23:19Z","date_created":"2025-04-27T22:02:13Z","ddc":["000"],"project":[{"name":"Prix Lopez-Loretta 2019 - Marco Mondelli","_id":"059876FA-7A3F-11EA-A408-12923DDC885E"},{"_id":"92099302-16d5-11f0-9cad-f9a785f54fbd","name":"Trustworthy Deep Learning Theory: Private Over-Parameterized Models and Robust LLMs"}],"department":[{"_id":"MaMo"}],"citation":{"mla":"Bombari, Simone, and Marco 
Mondelli. “Privacy for Free in the Overparameterized Regime.” <i>Proceedings of the National Academy of Sciences</i>, vol. 122, no. 15, e2423072122, National Academy of Sciences, 2025, doi:<a href=\"https://doi.org/10.1073/pnas.2423072122\">10.1073/pnas.2423072122</a>.","ista":"Bombari S, Mondelli M. 2025. Privacy for free in the overparameterized regime. Proceedings of the National Academy of Sciences. 122(15), e2423072122.","ieee":"S. Bombari and M. Mondelli, “Privacy for free in the overparameterized regime,” <i>Proceedings of the National Academy of Sciences</i>, vol. 122, no. 15. National Academy of Sciences, 2025.","chicago":"Bombari, Simone, and Marco Mondelli. “Privacy for Free in the Overparameterized Regime.” <i>Proceedings of the National Academy of Sciences</i>. National Academy of Sciences, 2025. <a href=\"https://doi.org/10.1073/pnas.2423072122\">https://doi.org/10.1073/pnas.2423072122</a>.","short":"S. Bombari, M. Mondelli, Proceedings of the National Academy of Sciences 122 (2025).","apa":"Bombari, S., &#38; Mondelli, M. (2025). Privacy for free in the overparameterized regime. <i>Proceedings of the National Academy of Sciences</i>. National Academy of Sciences. <a href=\"https://doi.org/10.1073/pnas.2423072122\">https://doi.org/10.1073/pnas.2423072122</a>","ama":"Bombari S, Mondelli M. Privacy for free in the overparameterized regime. <i>Proceedings of the National Academy of Sciences</i>. 2025;122(15). doi:<a href=\"https://doi.org/10.1073/pnas.2423072122\">10.1073/pnas.2423072122</a>"},"_id":"19627","oa_version":"Published Version","oa":1,"file_date_updated":"2025-05-05T07:27:54Z","volume":122,"external_id":{"pmid":["40215275"],"arxiv":["2410.14787"],"isi":["001471214000001"]},"publisher":"National Academy of Sciences","arxiv":1,"status":"public","issue":"15","acknowledgement":"This research was funded in whole, or in part, by the Austrian Science Fund (FWF) Grant number COE 12. 
For the purpose of open access, the author has applied a CC BY public copyright license to any Author Accepted Manuscript version arising from this submission. The authors were also supported by the 2019 Lopez-Loreta prize, and Simone Bombari was supported by a Google PhD fellowship. We thank Diyuan Wu, Edwige Cyffers, Francesco Pedrotti, Inbar Seroussi, Nikita P. Kalinin, Pietro Pelliconi, Roodabeh Safavi, Yizhe Zhu, and Zhichao Wang for helpful discussions.","has_accepted_license":"1","doi":"10.1073/pnas.2423072122","intvolume":"       122","APC_amount":"2754,32 EUR","isi":1,"type":"journal_article","publication_status":"published","author":[{"last_name":"Bombari","id":"ca726dda-de17-11ea-bc14-f9da834f63aa","first_name":"Simone","full_name":"Bombari, Simone"},{"full_name":"Mondelli, Marco","orcid":"0000-0002-3242-7020","first_name":"Marco","id":"27EB676C-8706-11E9-9510-7717E6697425","last_name":"Mondelli"}],"pmid":1,"article_processing_charge":"Yes (in subscription journal)","quality_controlled":"1","article_number":"e2423072122","OA_place":"publisher"},{"acknowledgement":"We acknowledge support from the National Science Foundation (NSF) and the Simons Foundation for the Collaboration on the Theoretical Foundations of Deep Learning through awards DMS-2031883 and #814639 as well as the TILOS institute (NSF CCF-2112665). This work used the programs (1) XSEDE (Extreme science and engineering discovery environment) which is supported by NSF grant numbers ACI-1548562, and (2) ACCESS (Advanced cyberinfrastructure coordination ecosystem: services & support) which is supported by NSF grants numbers #2138259, #2138286, #2138307, #2137603, and #2138296. Specifically, we used the resources from SDSC Expanse GPU compute nodes, and NCSA Delta system, via allocations TG-CIS220009. Marco Mondelli is supported by the 2019 Lopez-Loreta prize. We also acknowledge useful feedback from anonymous reviewers. 
","status":"public","intvolume":"        37","publication_status":"published","author":[{"first_name":"Daniel","full_name":"Beaglehole, Daniel","last_name":"Beaglehole"},{"id":"d64d6a8d-eb8e-11eb-b029-96fd216dec3c","first_name":"Peter","full_name":"Súkeník, Peter","last_name":"Súkeník"},{"full_name":"Mondelli, Marco","orcid":"0000-0002-3242-7020","id":"27EB676C-8706-11E9-9510-7717E6697425","first_name":"Marco","last_name":"Mondelli"},{"full_name":"Belkin, Mikhail","first_name":"Mikhail","last_name":"Belkin"}],"type":"conference","OA_place":"repository","article_processing_charge":"No","quality_controlled":"1","department":[{"_id":"GradSch"},{"_id":"MaMo"}],"citation":{"chicago":"Beaglehole, Daniel, Peter Súkeník, Marco Mondelli, and Mikhail Belkin. “Average Gradient Outer Product as a Mechanism for Deep Neural Collapse.” In <i>38th Annual Conference on Neural Information Processing Systems</i>, Vol. 37. Neural Information Processing Systems Foundation, 2024.","apa":"Beaglehole, D., Súkeník, P., Mondelli, M., &#38; Belkin, M. (2024). Average gradient outer product as a mechanism for deep neural collapse. In <i>38th Annual Conference on Neural Information Processing Systems</i> (Vol. 37). Vancouver, Canada: Neural Information Processing Systems Foundation.","ama":"Beaglehole D, Súkeník P, Mondelli M, Belkin M. Average gradient outer product as a mechanism for deep neural collapse. In: <i>38th Annual Conference on Neural Information Processing Systems</i>. Vol 37. Neural Information Processing Systems Foundation; 2024.","short":"D. Beaglehole, P. Súkeník, M. Mondelli, M. Belkin, in:, 38th Annual Conference on Neural Information Processing Systems, Neural Information Processing Systems Foundation, 2024.","mla":"Beaglehole, Daniel, et al. “Average Gradient Outer Product as a Mechanism for Deep Neural Collapse.” <i>38th Annual Conference on Neural Information Processing Systems</i>, vol. 
37, Neural Information Processing Systems Foundation, 2024.","ista":"Beaglehole D, Súkeník P, Mondelli M, Belkin M. 2024. Average gradient outer product as a mechanism for deep neural collapse. 38th Annual Conference on Neural Information Processing Systems. NeurIPS: Neural Information Processing Systems, Advances in Neural Information Processing Systems, vol. 37.","ieee":"D. Beaglehole, P. Súkeník, M. Mondelli, and M. Belkin, “Average gradient outer product as a mechanism for deep neural collapse,” in <i>38th Annual Conference on Neural Information Processing Systems</i>, Vancouver, Canada, 2024, vol. 37."},"project":[{"_id":"059876FA-7A3F-11EA-A408-12923DDC885E","name":"Prix Lopez-Loretta 2019 - Marco Mondelli"}],"volume":37,"conference":{"start_date":"2024-12-16","name":"NeurIPS: Neural Information Processing Systems","location":"Vancouver, Canada","end_date":"2024-12-16"},"oa":1,"_id":"18890","oa_version":"Preprint","publisher":"Neural Information Processing Systems Foundation","external_id":{"arxiv":["2402.13728"]},"arxiv":1,"title":"Average gradient outer product as a mechanism for deep neural collapse","day":"01","year":"2024","scopus_import":"1","user_id":"2DF688A6-F248-11E8-B48F-1D18A9856A87","date_published":"2024-12-01T00:00:00Z","month":"12","date_created":"2025-01-27T11:11:40Z","publication":"38th Annual Conference on Neural Information Processing Systems","date_updated":"2025-05-14T11:29:45Z","OA_type":"green","main_file_link":[{"open_access":"1","url":"https://openreview.net/forum?id=lJ1jdl2K9k"}],"publication_identifier":{"eissn":["1049-5258"]},"abstract":[{"text":"Deep Neural Collapse (DNC) refers to the surprisingly rigid structure of the data representations in the final layers of Deep Neural Networks (DNNs). Though the phenomenon has been measured in a variety of settings, its emergence is typically explained via data-agnostic approaches, such as the unconstrained features model. 
In this work, we introduce a data-dependent setting where DNC forms due to feature learning through the average gradient outer product (AGOP). The AGOP is defined with respect to a learned predictor and is equal to the uncentered covariance matrix of its input-output gradients averaged over the training dataset. The Deep Recursive Feature Machine (Deep RFM) is a method that constructs a neural network by iteratively mapping the data with the AGOP and applying an untrained random feature map. We demonstrate empirically that DNC occurs in Deep RFM across standard settings as a consequence of the projection with the AGOP matrix computed at each layer. Further, we theoretically explain DNC in Deep RFM in an asymptotic setting and as a result of kernel learning. We then provide evidence that this mechanism holds for neural networks more generally. In particular, we show that the right singular vectors and values of the weights can be responsible for the majority of within-class variability collapse for DNNs trained in the feature learning regime. As observed in recent work, this singular structure is highly correlated with that of the AGOP.","lang":"eng"}],"alternative_title":["Advances in Neural Information Processing Systems"],"language":[{"iso":"eng"}],"corr_author":"1"},{"file_date_updated":"2025-02-04T08:11:25Z","volume":37,"conference":{"end_date":"2024-12-16","start_date":"2024-12-16","name":"NeurIPS: Neural Information Processing Systems","location":"Vancouver, Canada"},"_id":"18891","oa_version":"Published Version","oa":1,"citation":{"short":"P. Súkeník, C. Lampert, M. Mondelli, in:, 38th Annual Conference on Neural Information Processing Systems, Neural Information Processing Systems Foundation, 2024.","apa":"Súkeník, P., Lampert, C., &#38; Mondelli, M. (2024). Neural collapse versus low-rank bias: Is deep neural collapse really optimal? In <i>38th Annual Conference on Neural Information Processing Systems</i> (Vol. 37). 
Vancouver, Canada: Neural Information Processing Systems Foundation.","ama":"Súkeník P, Lampert C, Mondelli M. Neural collapse versus low-rank bias: Is deep neural collapse really optimal? In: <i>38th Annual Conference on Neural Information Processing Systems</i>. Vol 37. Neural Information Processing Systems Foundation; 2024.","chicago":"Súkeník, Peter, Christoph Lampert, and Marco Mondelli. “Neural Collapse versus Low-Rank Bias: Is Deep Neural Collapse Really Optimal?” In <i>38th Annual Conference on Neural Information Processing Systems</i>, Vol. 37. Neural Information Processing Systems Foundation, 2024.","ieee":"P. Súkeník, C. Lampert, and M. Mondelli, “Neural collapse versus low-rank bias: Is deep neural collapse really optimal?,” in <i>38th Annual Conference on Neural Information Processing Systems</i>, Vancouver, Canada, 2024, vol. 37.","ista":"Súkeník P, Lampert C, Mondelli M. 2024. Neural collapse versus low-rank bias: Is deep neural collapse really optimal? 38th Annual Conference on Neural Information Processing Systems. NeurIPS: Neural Information Processing Systems, Advances in Neural Information Processing Systems, vol. 37.","mla":"Súkeník, Peter, et al. “Neural Collapse versus Low-Rank Bias: Is Deep Neural Collapse Really Optimal?” <i>38th Annual Conference on Neural Information Processing Systems</i>, vol. 37, Neural Information Processing Systems Foundation, 2024."},"department":[{"_id":"GradSch"},{"_id":"MaMo"},{"_id":"ChLa"}],"project":[{"_id":"059876FA-7A3F-11EA-A408-12923DDC885E","name":"Prix Lopez-Loretta 2019 - Marco Mondelli"}],"arxiv":1,"publisher":"Neural Information Processing Systems Foundation","external_id":{"arxiv":["2405.14468"]},"acknowledged_ssus":[{"_id":"ScienComp"}],"intvolume":"        37","acknowledgement":"Marco Mondelli is partially supported by the 2019 Lopez-Loreta prize. 
This research was supported by the Scientific Service Units (SSU) of ISTA through resources provided by Scientific Computing (SciComp).","has_accepted_license":"1","status":"public","OA_place":"publisher","quality_controlled":"1","article_processing_charge":"No","publication_status":"published","author":[{"id":"d64d6a8d-eb8e-11eb-b029-96fd216dec3c","first_name":"Peter","full_name":"Súkeník, Peter","last_name":"Súkeník"},{"first_name":"Christoph","id":"40C20FD2-F248-11E8-B48F-1D18A9856A87","orcid":"0000-0001-8622-7887","full_name":"Lampert, Christoph","last_name":"Lampert"},{"id":"27EB676C-8706-11E9-9510-7717E6697425","first_name":"Marco","orcid":"0000-0002-3242-7020","full_name":"Mondelli, Marco","last_name":"Mondelli"}],"type":"conference","tmp":{"legal_code_url":"https://creativecommons.org/licenses/by/4.0/legalcode","short":"CC BY (4.0)","image":"/images/cc_by.png","name":"Creative Commons Attribution 4.0 International Public License (CC-BY 4.0)"},"language":[{"iso":"eng"}],"alternative_title":["Advances in Neural Information Processing Systems"],"corr_author":"1","OA_type":"gold","file":[{"access_level":"open_access","date_created":"2025-02-04T08:11:25Z","date_updated":"2025-02-04T08:11:25Z","file_size":1784118,"relation":"main_file","file_name":"2024_NeurIPS_Sukenik.pdf","creator":"dernst","checksum":"b7b79f1ea3ac1e9e11b3d91faaeb0780","file_id":"18989","success":1,"content_type":"application/pdf"}],"abstract":[{"lang":"eng","text":"Deep neural networks (DNNs) exhibit a surprising structure in their final layer\r\nknown as neural collapse (NC), and a growing body of works has currently investigated the propagation of neural collapse to earlier layers of DNNs – a phenomenon\r\ncalled deep neural collapse (DNC). 
However, existing theoretical results are restricted to special cases: linear models, only two layers or binary classification.\r\nIn contrast, we focus on non-linear models of arbitrary depth in multi-class classification and reveal a surprising qualitative shift. As soon as we go beyond two\r\nlayers or two classes, DNC stops being optimal for the deep unconstrained features\r\nmodel (DUFM) – the standard theoretical framework for the analysis of collapse.\r\nThe main culprit is a low-rank bias of multi-layer regularization schemes: this bias\r\nleads to optimal solutions of even lower rank than the neural collapse. We support\r\nour theoretical findings with experiments on both DUFM and real data, which show\r\nthe emergence of the low-rank structure in the solution found by gradient descent."}],"year":"2024","user_id":"2DF688A6-F248-11E8-B48F-1D18A9856A87","title":"Neural collapse versus low-rank bias: Is deep neural collapse really optimal?","day":"01","ddc":["000"],"date_created":"2025-01-27T11:15:18Z","date_updated":"2025-06-04T07:19:21Z","publication":"38th Annual Conference on Neural Information Processing Systems","date_published":"2024-12-01T00:00:00Z","month":"12"},{"scopus_import":"1","year":"2024","user_id":"2DF688A6-F248-11E8-B48F-1D18A9856A87","title":"Improved convergence of score-based diffusion models via prediction-correction","day":"01","ddc":["000"],"publication":"Transactions on Machine Learning Research","date_updated":"2025-04-15T08:31:35Z","date_created":"2025-01-27T12:18:05Z","date_published":"2024-06-01T00:00:00Z","month":"06","tmp":{"legal_code_url":"https://creativecommons.org/licenses/by/4.0/legalcode","short":"CC BY (4.0)","image":"/images/cc_by.png","name":"Creative Commons Attribution 4.0 International Public License (CC-BY 
4.0)"},"language":[{"iso":"eng"}],"alternative_title":["TMLR"],"corr_author":"1","file":[{"date_created":"2025-01-27T12:19:44Z","access_level":"open_access","relation":"main_file","date_updated":"2025-01-27T12:19:44Z","file_size":780315,"file_name":"2024_TMLR_Pedrotti.pdf","creator":"dernst","checksum":"76a1fd5afd8ee6f7ae0e5912d7dbf6b4","file_id":"18898","success":1,"content_type":"application/pdf"}],"OA_type":"gold","publication_identifier":{"issn":["2835-8856"]},"abstract":[{"text":"Score-based generative models (SGMs) are powerful tools to sample from complex data distributions. Their underlying idea is to (i) run a forward process for time T1 by adding noise to the data, (ii) estimate its score function, and (iii) use such estimate to run a reverse process. As the reverse process is initialized with the stationary distribution of the forward one, the existing analysis paradigm requires T1→∞. This is however problematic: from a theoretical viewpoint, for a given precision of the score approximation, the convergence guarantee fails as T1 diverges; from a practical viewpoint, a large T1 increases computational costs and leads to error propagation. This paper addresses the issue by considering a version of the popular predictor-corrector scheme: after running the forward process, we first estimate the final distribution via an inexact Langevin dynamics and then revert the process. Our key technical contribution is to provide convergence guarantees which require to run the forward process only for a fixed finite time T1. Our bounds exhibit a mild logarithmic dependence on the input dimension and the subgaussian norm of the target distribution, have minimal assumptions on the data, and require only to control the L2 loss on the score approximation, which is the quantity minimized in practice.","lang":"eng"}],"acknowledgement":"Francesco Pedrotti and Jan Maas acknowledge support by the Austrian Science Fund (FWF) project 10.55776/F65. 
Marco Mondelli acknowledges support by the 2019 Lopez-Loreta prize.\r\n","status":"public","has_accepted_license":"1","related_material":{"record":[{"id":"17350","status":"public","relation":"earlier_version"}]},"OA_place":"publisher","quality_controlled":"1","article_processing_charge":"No","author":[{"last_name":"Pedrotti","id":"d3ac8ac6-dc8d-11ea-abe3-e2a9628c4c3c","first_name":"Francesco","full_name":"Pedrotti, Francesco"},{"last_name":"Maas","orcid":"0000-0002-0845-1338","full_name":"Maas, Jan","first_name":"Jan","id":"4C5696CE-F248-11E8-B48F-1D18A9856A87"},{"last_name":"Mondelli","id":"27EB676C-8706-11E9-9510-7717E6697425","first_name":"Marco","full_name":"Mondelli, Marco","orcid":"0000-0002-3242-7020"}],"publication_status":"published","type":"conference","file_date_updated":"2025-01-27T12:19:44Z","_id":"18897","oa_version":"Published Version","oa":1,"project":[{"name":"Taming Complexity in Partial Differential Systems","grant_number":"F6504","_id":"fc31cba2-9c52-11eb-aca3-ff467d239cd2"},{"name":"Prix Lopez-Loretta 2019 - Marco Mondelli","_id":"059876FA-7A3F-11EA-A408-12923DDC885E"}],"citation":{"short":"F. Pedrotti, J. Maas, M. Mondelli, in:, Transactions on Machine Learning Research, 2024.","apa":"Pedrotti, F., Maas, J., &#38; Mondelli, M. (2024). Improved convergence of score-based diffusion models via prediction-correction. In <i>Transactions on Machine Learning Research</i>.","ama":"Pedrotti F, Maas J, Mondelli M. Improved convergence of score-based diffusion models via prediction-correction. In: <i>Transactions on Machine Learning Research</i>. ; 2024.","chicago":"Pedrotti, Francesco, Jan Maas, and Marco Mondelli. “Improved Convergence of Score-Based Diffusion Models via Prediction-Correction.” In <i>Transactions on Machine Learning Research</i>, 2024.","ieee":"F. Pedrotti, J. Maas, and M. 
Mondelli, “Improved convergence of score-based diffusion models via prediction-correction,” in <i>Transactions on Machine Learning Research</i>, 2024.","mla":"Pedrotti, Francesco, et al. “Improved Convergence of Score-Based Diffusion Models via Prediction-Correction.” <i>Transactions on Machine Learning Research</i>, 2024.","ista":"Pedrotti F, Maas J, Mondelli M. 2024. Improved convergence of score-based diffusion models via prediction-correction. Transactions on Machine Learning Research. , TMLR, ."},"department":[{"_id":"JaMa"},{"_id":"MaMo"}],"arxiv":1,"external_id":{"arxiv":["2305.14164"]}},{"month":"07","date_published":"2024-07-30T00:00:00Z","date_created":"2025-01-30T07:29:47Z","date_updated":"2025-04-15T07:50:12Z","publication":"41st International Conference on Machine Learning","page":"4267-4299","day":"30","title":"How spurious features are memorized: Precise analysis for random and NTK features","user_id":"2DF688A6-F248-11E8-B48F-1D18A9856A87","scopus_import":"1","year":"2024","abstract":[{"lang":"eng","text":"Deep learning models are known to overfit and memorize spurious features in the training dataset. While numerous empirical studies have aimed at understanding this phenomenon, a rigorous theoretical framework to quantify it is still missing. In this paper, we consider spurious features that are uncorrelated with the learning task, and we provide a precise characterization of how they are memorized via two separate terms: (i) the stability of the model with respect to individual training samples, and (ii) the feature alignment between the spurious pattern and the full sample. While the first term is well established in learning theory and it is connected to the generalization error in classical work, the second one is, to the best of our knowledge, novel. Our key technical result gives a precise characterization of the feature alignment for the two prototypical settings of random features (RF) and neural tangent kernel (NTK) regression. 
We prove that the memorization of spurious features weakens as the generalization capability increases and, through the analysis of the feature alignment, we unveil the role of the model and of its activation function. Numerical experiments show the predictive power of our theory on standard datasets (MNIST, CIFAR-10)."}],"publication_identifier":{"eissn":["2640-3498"]},"OA_type":"green","main_file_link":[{"url":"https://doi.org/10.48550/arXiv.2305.12100","open_access":"1"}],"corr_author":"1","language":[{"iso":"eng"}],"alternative_title":["PMLR"],"type":"conference","publication_status":"published","author":[{"last_name":"Bombari","full_name":"Bombari, Simone","id":"ca726dda-de17-11ea-bc14-f9da834f63aa","first_name":"Simone"},{"last_name":"Mondelli","id":"27EB676C-8706-11E9-9510-7717E6697425","first_name":"Marco","full_name":"Mondelli, Marco","orcid":"0000-0002-3242-7020"}],"quality_controlled":"1","article_processing_charge":"No","OA_place":"repository","status":"public","acknowledgement":"The authors were partially supported by the 2019 Lopez-Loreta prize, and they would like to thank (in alphabetical order) Grigorios Chrysos, Simone Maria Giancola, Mahyar Jafari Nodeh, Christoph Lampert, Marco Miani, GuanWen Qiu, and Peter Súkeník for helpful discussions.","intvolume":"       235","publisher":"ML Research Press","external_id":{"arxiv":["2305.12100"]},"arxiv":1,"department":[{"_id":"MaMo"}],"citation":{"ista":"Bombari S, Mondelli M. 2024. How spurious features are memorized: Precise analysis for random and NTK features. 41st International Conference on Machine Learning. ICML: International Conference on Machine Learning, PMLR, vol. 235, 4267–4299.","mla":"Bombari, Simone, and Marco Mondelli. “How Spurious Features Are Memorized: Precise Analysis for Random and NTK Features.” <i>41st International Conference on Machine Learning</i>, vol. 235, ML Research Press, 2024, pp. 4267–99.","ieee":"S. Bombari and M. 
Mondelli, “How spurious features are memorized: Precise analysis for random and NTK features,” in <i>41st International Conference on Machine Learning</i>, Vienna, Austria, 2024, vol. 235, pp. 4267–4299.","chicago":"Bombari, Simone, and Marco Mondelli. “How Spurious Features Are Memorized: Precise Analysis for Random and NTK Features.” In <i>41st International Conference on Machine Learning</i>, 235:4267–99. ML Research Press, 2024.","short":"S. Bombari, M. Mondelli, in:, 41st International Conference on Machine Learning, ML Research Press, 2024, pp. 4267–4299.","apa":"Bombari, S., &#38; Mondelli, M. (2024). How spurious features are memorized: Precise analysis for random and NTK features. In <i>41st International Conference on Machine Learning</i> (Vol. 235, pp. 4267–4299). Vienna, Austria: ML Research Press.","ama":"Bombari S, Mondelli M. How spurious features are memorized: Precise analysis for random and NTK features. In: <i>41st International Conference on Machine Learning</i>. Vol 235. 
ML Research Press; 2024:4267-4299."},"project":[{"_id":"059876FA-7A3F-11EA-A408-12923DDC885E","name":"Prix Lopez-Loretta 2019 - Marco Mondelli"}],"conference":{"end_date":"2024-07-27","location":"Vienna, Austria","name":"ICML: International Conference on Machine Learning","start_date":"2024-07-21"},"_id":"18972","oa_version":"Preprint","oa":1,"volume":235},{"year":"2024","scopus_import":"1","user_id":"2DF688A6-F248-11E8-B48F-1D18A9856A87","title":"Towards understanding the word sensitivity of attention layers: A study via random features","day":"30","page":"4300-4328","date_created":"2025-01-30T07:35:49Z","publication":"41st International Conference on Machine Learning","date_updated":"2025-04-15T07:50:12Z","date_published":"2024-07-30T00:00:00Z","month":"07","alternative_title":["PMLR"],"language":[{"iso":"eng"}],"corr_author":"1","OA_type":"green","main_file_link":[{"open_access":"1","url":"https://doi.org/10.48550/arXiv.2402.02969"}],"publication_identifier":{"eissn":["2640-3498"]},"abstract":[{"text":"Understanding the reasons behind the exceptional success of transformers requires a better analysis of why attention layers are suitable for NLP tasks. In particular, such tasks require predictive models to capture contextual meaning which often depends on one or few words, even if the sentence is long. Our work studies this key property, dubbed word sensitivity (WS), in the prototypical setting of random features. We show that attention layers enjoy high WS, namely, there exists a vector in the space of embeddings that largely perturbs the random attention features map. The argument critically exploits the role of the softmax in the attention layer, highlighting its benefit compared to other activations (e.g., ReLU). In contrast, the WS of standard random features is of order 1/√n, n being the number of words in the textual sample, and thus it decays with the length of the context. 
We then translate these results on the word sensitivity into generalization bounds: due to their low WS, random features provably cannot learn to distinguish between two sentences that differ only in a single word; in contrast, due to their high WS, random attention features have higher generalization capabilities. We validate our theoretical results with experimental evidence over the BERT-Base word embeddings of the imdb review dataset.","lang":"eng"}],"intvolume":"       235","acknowledgement":"The authors were partially supported by the 2019 Lopez-Loreta prize, and they would like to thank Mohammad Hossein Amani, Lorenzo Beretta, and Clement Rebuffel for helpful discussions.","status":"public","OA_place":"repository","quality_controlled":"1","article_processing_charge":"No","author":[{"full_name":"Bombari, Simone","id":"ca726dda-de17-11ea-bc14-f9da834f63aa","first_name":"Simone","last_name":"Bombari"},{"id":"27EB676C-8706-11E9-9510-7717E6697425","first_name":"Marco","orcid":"0000-0002-3242-7020","full_name":"Mondelli, Marco","last_name":"Mondelli"}],"publication_status":"published","type":"conference","volume":235,"conference":{"end_date":"2024-07-27","name":"ICML: International Conference on Machine Learning","location":"Vienna, Austria","start_date":"2024-07-21"},"oa":1,"_id":"18973","oa_version":"Preprint","department":[{"_id":"MaMo"}],"citation":{"ista":"Bombari S, Mondelli M. 2024. Towards understanding the word sensitivity of attention layers: A study via random features. 41st International Conference on Machine Learning. ICML: International Conference on Machine Learning, PMLR, vol. 235, 4300–4328.","mla":"Bombari, Simone, and Marco Mondelli. “Towards Understanding the Word Sensitivity of Attention Layers: A Study via Random Features.” <i>41st International Conference on Machine Learning</i>, vol. 235, ML Research Press, 2024, pp. 4300–28.","ieee":"S. Bombari and M. 
Mondelli, “Towards understanding the word sensitivity of attention layers: A study via random features,” in <i>41st International Conference on Machine Learning</i>, Vienna, Austria, 2024, vol. 235, pp. 4300–4328.","chicago":"Bombari, Simone, and Marco Mondelli. “Towards Understanding the Word Sensitivity of Attention Layers: A Study via Random Features.” In <i>41st International Conference on Machine Learning</i>, 235:4300–4328. ML Research Press, 2024.","apa":"Bombari, S., &#38; Mondelli, M. (2024). Towards understanding the word sensitivity of attention layers: A study via random features. In <i>41st International Conference on Machine Learning</i> (Vol. 235, pp. 4300–4328). Vienna, Austria: ML Research Press.","short":"S. Bombari, M. Mondelli, in:, 41st International Conference on Machine Learning, ML Research Press, 2024, pp. 4300–4328.","ama":"Bombari S, Mondelli M. Towards understanding the word sensitivity of attention layers: A study via random features. In: <i>41st International Conference on Machine Learning</i>. Vol 235. ML Research Press; 2024:4300-4328."},"project":[{"name":"Prix Lopez-Loretta 2019 - Marco Mondelli","_id":"059876FA-7A3F-11EA-A408-12923DDC885E"}],"arxiv":1,"publisher":"ML Research Press","external_id":{"arxiv":["2402.02969"]}},{"external_id":{"isi":["001230181100001"],"arxiv":["2303.07245"]},"publisher":"IEEE","arxiv":1,"project":[{"_id":"059876FA-7A3F-11EA-A408-12923DDC885E","name":"Prix Lopez-Loretta 2019 - Marco Mondelli"}],"department":[{"_id":"MaMo"}],"citation":{"ieee":"A. R. Esposito and M. Mondelli, “Concentration without independence via information measures,” <i>IEEE Transactions on Information Theory</i>, vol. 70, no. 6. IEEE, pp. 3823–3839, 2024.","mla":"Esposito, Amedeo Roberto, and Marco Mondelli. “Concentration without Independence via Information Measures.” <i>IEEE Transactions on Information Theory</i>, vol. 70, no. 6, IEEE, 2024, pp. 
3823–39, doi:<a href=\"https://doi.org/10.1109/TIT.2024.3367767\">10.1109/TIT.2024.3367767</a>.","ista":"Esposito AR, Mondelli M. 2024. Concentration without independence via information measures. IEEE Transactions on Information Theory. 70(6), 3823–3839.","ama":"Esposito AR, Mondelli M. Concentration without independence via information measures. <i>IEEE Transactions on Information Theory</i>. 2024;70(6):3823-3839. doi:<a href=\"https://doi.org/10.1109/TIT.2024.3367767\">10.1109/TIT.2024.3367767</a>","short":"A.R. Esposito, M. Mondelli, IEEE Transactions on Information Theory 70 (2024) 3823–3839.","apa":"Esposito, A. R., &#38; Mondelli, M. (2024). Concentration without independence via information measures. <i>IEEE Transactions on Information Theory</i>. IEEE. <a href=\"https://doi.org/10.1109/TIT.2024.3367767\">https://doi.org/10.1109/TIT.2024.3367767</a>","chicago":"Esposito, Amedeo Roberto, and Marco Mondelli. “Concentration without Independence via Information Measures.” <i>IEEE Transactions on Information Theory</i>. IEEE, 2024. 
<a href=\"https://doi.org/10.1109/TIT.2024.3367767\">https://doi.org/10.1109/TIT.2024.3367767</a>."},"volume":70,"_id":"15172","oa_version":"Preprint","oa":1,"publication_status":"published","author":[{"full_name":"Esposito, Amedeo Roberto","id":"9583e921-e1ad-11ec-9862-cef099626dc9","first_name":"Amedeo Roberto","last_name":"Esposito"},{"last_name":"Mondelli","orcid":"0000-0002-3242-7020","full_name":"Mondelli, Marco","id":"27EB676C-8706-11E9-9510-7717E6697425","first_name":"Marco"}],"type":"journal_article","related_material":{"record":[{"relation":"earlier_version","id":"14922","status":"public"}]},"quality_controlled":"1","article_processing_charge":"No","issue":"6","status":"public","isi":1,"doi":"10.1109/TIT.2024.3367767","intvolume":"        70","main_file_link":[{"open_access":"1","url":"https://doi.org/10.48550/arXiv.2303.07245"}],"publication_identifier":{"issn":["0018-9448"],"eissn":["1557-9654"]},"abstract":[{"lang":"eng","text":"We propose a novel approach to concentration for non-independent random variables. The main idea is to “pretend” that the random variables are independent and pay a multiplicative price measuring how far they are from actually being independent. This price is encapsulated in the Hellinger integral between the joint and the product of the marginals, which is then upper bounded leveraging tensorisation properties. Our bounds represent a natural generalisation of concentration inequalities in the presence of dependence: we recover exactly the classical bounds (McDiarmid’s inequality) when the random variables are independent. Furthermore, in a “large deviations” regime, we obtain the same decay in the probability as for the independent case, even when the random variables display non-trivial dependencies. To show this, we consider a number of applications of interest. First, we provide a bound for Markov chains with finite state space. 
Then, we consider the Simple Symmetric Random Walk, which is a non-contracting Markov chain, and a non-Markovian setting in which the stochastic process depends on its entire past. To conclude, we propose an application to Markov Chain Monte Carlo methods, where our approach leads to an improved lower bound on the minimum burn-in period required to reach a certain accuracy. In all of these settings, we provide a regime of parameters in which our bound fares better than what the state of the art can provide."}],"language":[{"iso":"eng"}],"corr_author":"1","date_published":"2024-06-01T00:00:00Z","month":"06","page":"3823-3839","publication":"IEEE Transactions on Information Theory","date_updated":"2025-09-04T13:06:53Z","date_created":"2024-03-24T23:01:00Z","title":"Concentration without independence via information measures","day":"01","article_type":"original","year":"2024","scopus_import":"1","user_id":"317138e5-6ab7-11ef-aa6d-ffef3953e345"},{"isi":1,"doi":"10.1109/ICASSP48485.2024.10447198","acknowledgement":"This work was supported by a Lopez-Loreta Prize to MM, an SNSF Eccellenza Grant to MRR (PCEGP3-181181), and core funding from ISTA. The authors thank Philip Schniter, Matthew Stephens and Pragya Sur for valuable suggestions on an early version of the work. The authors acknowledge the participants and investigators of the UK Biobank study. 
High-performance\r\ncomputing was supported by the Scientific Service Units (SSU) of IST Austria through resources provided by Scientific Computing (SciComp).","status":"public","OA_place":"repository","quality_controlled":"1","article_processing_charge":"No","publication_status":"published","author":[{"id":"0b77531d-dbcd-11ea-9d1d-a8eee0bf3830","first_name":"Al","full_name":"Depope, Al","last_name":"Depope"},{"full_name":"Mondelli, Marco","orcid":"0000-0002-3242-7020","id":"27EB676C-8706-11E9-9510-7717E6697425","first_name":"Marco","last_name":"Mondelli"},{"id":"E5D42276-F5DA-11E9-8E24-6303E6697425","first_name":"Matthew Richard","full_name":"Robinson, Matthew Richard","orcid":"0000-0001-8982-8813","last_name":"Robinson"}],"type":"conference","_id":"17147","oa_version":"Submitted Version","oa":1,"conference":{"end_date":"2024-04-19","name":"ICASSP: International Conference on Acoustics, Speech and Signal Processing","location":"Seoul, Korea","start_date":"2024-04-14"},"project":[{"_id":"059876FA-7A3F-11EA-A408-12923DDC885E","name":"Prix Lopez-Loretta 2019 - Marco Mondelli"},{"name":"Improving estimation and prediction of common complex disease risk","_id":"9B8D11D6-BA93-11EA-9121-9846C619BF3A","grant_number":"PCEGP3_181181"}],"department":[{"_id":"MaMo"},{"_id":"MaRo"}],"citation":{"mla":"Depope, Al, et al. “Inference of Genetic Effects via Approximate Message Passing.” <i>2024 IEEE International Conference on Acoustics, Speech, and Signal Processing</i>, IEEE, 2024, pp. 13151–55, doi:<a href=\"https://doi.org/10.1109/ICASSP48485.2024.10447198\">10.1109/ICASSP48485.2024.10447198</a>.","ista":"Depope A, Mondelli M, Robinson MR. 2024. Inference of genetic effects via approximate message passing. 2024 IEEE International Conference on Acoustics, Speech, and Signal Processing. ICASSP: International Conference on Acoustics, Speech and Signal Processing, 13151–13155.","ieee":"A. Depope, M. Mondelli, and M. R. 
Robinson, “Inference of genetic effects via approximate message passing,” in <i>2024 IEEE International Conference on Acoustics, Speech, and Signal Processing</i>, Seoul, Korea, 2024, pp. 13151–13155.","chicago":"Depope, Al, Marco Mondelli, and Matthew Richard Robinson. “Inference of Genetic Effects via Approximate Message Passing.” In <i>2024 IEEE International Conference on Acoustics, Speech, and Signal Processing</i>, 13151–55. IEEE, 2024. <a href=\"https://doi.org/10.1109/ICASSP48485.2024.10447198\">https://doi.org/10.1109/ICASSP48485.2024.10447198</a>.","ama":"Depope A, Mondelli M, Robinson MR. Inference of genetic effects via approximate message passing. In: <i>2024 IEEE International Conference on Acoustics, Speech, and Signal Processing</i>. IEEE; 2024:13151-13155. doi:<a href=\"https://doi.org/10.1109/ICASSP48485.2024.10447198\">10.1109/ICASSP48485.2024.10447198</a>","short":"A. Depope, M. Mondelli, M.R. Robinson, in:, 2024 IEEE International Conference on Acoustics, Speech, and Signal Processing, IEEE, 2024, pp. 13151–13155.","apa":"Depope, A., Mondelli, M., &#38; Robinson, M. R. (2024). Inference of genetic effects via approximate message passing. In <i>2024 IEEE International Conference on Acoustics, Speech, and Signal Processing</i> (pp. 13151–13155). Seoul, Korea: IEEE. 
<a href=\"https://doi.org/10.1109/ICASSP48485.2024.10447198\">https://doi.org/10.1109/ICASSP48485.2024.10447198</a>"},"external_id":{"isi":["001396233806078"]},"publisher":"IEEE","acknowledged_ssus":[{"_id":"ScienComp"}],"year":"2024","scopus_import":"1","user_id":"2DF688A6-F248-11E8-B48F-1D18A9856A87","title":"Inference of genetic effects via approximate message passing","day":"19","page":"13151-13155","publication":"2024 IEEE International Conference on Acoustics, Speech, and Signal Processing","date_updated":"2025-11-05T07:21:31Z","date_created":"2024-06-16T22:01:07Z","date_published":"2024-04-19T00:00:00Z","month":"04","language":[{"iso":"eng"}],"corr_author":"1","main_file_link":[{"url":"https://openreview.net/forum?id=aQYCDxfZV0","open_access":"1"}],"OA_type":"green","publication_identifier":{"isbn":["9798350344851"],"issn":["1520-6149"]},"abstract":[{"text":"Efficient utilization of large-scale biobank data is crucial for inferring the genetic basis of disease and predicting health outcomes from the DNA. Yet we lack efficient, accurate methods that scale to data where electronic health records are linked to whole genome sequence information. To address this issue, our paper develops a new algorithmic paradigm based on Approximate Message Passing (AMP), which is specifically tailored for genomic prediction and association testing. Our method yields comparable out-of-sample prediction accuracy to the state of the art on UK Biobank traits, whilst dramatically improving computational complexity, with an 8x speed-up in the run time. In addition, AMP theory provides a joint association testing framework, which outperforms the currently used REGENIE method, in roughly a third of the compute time. This first, truly large-scale application of the AMP framework lays the foundations for a far wider range of statistical analyses for hundreds of millions of variables measured on millions of people.","lang":"eng"}]},{"citation":{"ieee":"F. Pedrotti, J. Maas, and M. 
Mondelli, “Improved convergence of score-based diffusion models via prediction-correction,” <i>arXiv</i>. .","ista":"Pedrotti F, Maas J, Mondelli M. Improved convergence of score-based diffusion models via prediction-correction. arXiv, <a href=\"https://doi.org/10.48550/arXiv.2305.14164\">10.48550/arXiv.2305.14164</a>.","mla":"Pedrotti, Francesco, et al. “Improved Convergence of Score-Based Diffusion Models via Prediction-Correction.” <i>ArXiv</i>, doi:<a href=\"https://doi.org/10.48550/arXiv.2305.14164\">10.48550/arXiv.2305.14164</a>.","short":"F. Pedrotti, J. Maas, M. Mondelli, ArXiv (n.d.).","apa":"Pedrotti, F., Maas, J., &#38; Mondelli, M. (n.d.). Improved convergence of score-based diffusion models via prediction-correction. <i>arXiv</i>. <a href=\"https://doi.org/10.48550/arXiv.2305.14164\">https://doi.org/10.48550/arXiv.2305.14164</a>","ama":"Pedrotti F, Maas J, Mondelli M. Improved convergence of score-based diffusion models via prediction-correction. <i>arXiv</i>. doi:<a href=\"https://doi.org/10.48550/arXiv.2305.14164\">10.48550/arXiv.2305.14164</a>","chicago":"Pedrotti, Francesco, Jan Maas, and Marco Mondelli. “Improved Convergence of Score-Based Diffusion Models via Prediction-Correction.” <i>ArXiv</i>, n.d. <a href=\"https://doi.org/10.48550/arXiv.2305.14164\">https://doi.org/10.48550/arXiv.2305.14164</a>."},"department":[{"_id":"JaMa"},{"_id":"MaMo"}],"project":[{"name":"Taming Complexity in Partial Differential Systems","_id":"fc31cba2-9c52-11eb-aca3-ff467d239cd2","grant_number":"F6504"},{"_id":"059876FA-7A3F-11EA-A408-12923DDC885E","name":"Prix Lopez-Loretta 2019 - Marco Mondelli"}],"_id":"17350","oa_version":"Preprint","oa":1,"abstract":[{"text":"Score-based generative models (SGMs) are powerful tools to sample from\r\ncomplex data distributions. Their underlying idea is to (i) run a forward\r\nprocess for time $T_1$ by adding noise to the data, (ii) estimate its score\r\nfunction, and (iii) use such estimate to run a reverse process. 
As the reverse\r\nprocess is initialized with the stationary distribution of the forward one, the\r\nexisting analysis paradigm requires $T_1\\to\\infty$. This is however\r\nproblematic: from a theoretical viewpoint, for a given precision of the score\r\napproximation, the convergence guarantee fails as $T_1$ diverges; from a\r\npractical viewpoint, a large $T_1$ increases computational costs and leads to\r\nerror propagation. This paper addresses the issue by considering a version of\r\nthe popular predictor-corrector scheme: after running the forward process, we\r\nfirst estimate the final distribution via an inexact Langevin dynamics and then\r\nrevert the process. Our key technical contribution is to provide convergence\r\nguarantees which require to run the forward process only for a fixed finite\r\ntime $T_1$. Our bounds exhibit a mild logarithmic dependence on the input\r\ndimension and the subgaussian norm of the target distribution, have minimal\r\nassumptions on the data, and require only to control the $L^2$ loss on the\r\nscore approximation, which is the quantity minimized in practice.","lang":"eng"}],"external_id":{"arxiv":["2305.14164"]},"main_file_link":[{"url":"https://doi.org/10.48550/arXiv.2305.14164","open_access":"1"}],"corr_author":"1","arxiv":1,"language":[{"iso":"eng"}],"day":"06","title":"Improved convergence of score-based diffusion models via prediction-correction","status":"public","user_id":"2DF688A6-F248-11E8-B48F-1D18A9856A87","doi":"10.48550/arXiv.2305.14164","year":"2024","type":"preprint","month":"06","date_published":"2024-06-06T00:00:00Z","author":[{"last_name":"Pedrotti","id":"d3ac8ac6-dc8d-11ea-abe3-e2a9628c4c3c","first_name":"Francesco","full_name":"Pedrotti, Francesco"},{"last_name":"Maas","id":"4C5696CE-F248-11E8-B48F-1D18A9856A87","first_name":"Jan","full_name":"Maas, Jan","orcid":"0000-0002-0845-1338"},{"id":"27EB676C-8706-11E9-9510-7717E6697425","first_name":"Marco","orcid":"0000-0002-3242-7020","full_name":"Mondelli, 
Marco","last_name":"Mondelli"}],"publication_status":"draft","date_created":"2024-07-31T07:56:40Z","article_processing_charge":"No","date_updated":"2026-04-07T13:00:02Z","publication":"arXiv","related_material":{"record":[{"relation":"later_version","status":"public","id":"18897"},{"id":"17336","status":"public","relation":"dissertation_contains"}]},"OA_place":"repository"},{"related_material":{"record":[{"id":"17465","status":"public","relation":"dissertation_contains"}]},"article_processing_charge":"No","quality_controlled":"1","publication_status":"published","author":[{"last_name":"Kögler","full_name":"Kögler, Kevin","first_name":"Kevin","id":"94ec913c-dc85-11ea-9058-e5051ab2428b"},{"first_name":"Aleksandr","id":"F2B06EC2-C99E-11E9-89F0-752EE6697425","full_name":"Shevchenko, Aleksandr","last_name":"Shevchenko"},{"last_name":"Hassani","first_name":"Hamed","full_name":"Hassani, Hamed"},{"last_name":"Mondelli","first_name":"Marco","id":"27EB676C-8706-11E9-9510-7717E6697425","full_name":"Mondelli, Marco","orcid":"0000-0002-3242-7020"}],"type":"conference","intvolume":"       235","acknowledgement":"Kevin Kogler, Alexander Shevchenko and Marco Mondelli are supported by the 2019 Lopez-Loreta Prize. Hamed\r\nHassani acknowledges the support by the NSF CIF award (1910056) and the NSF Institute for CORE Emerging Methods in Data Science (EnCORE).","status":"public","arxiv":1,"publisher":"ML Research Press","external_id":{"arxiv":["2402.05013"]},"volume":235,"conference":{"end_date":"2024-07-27","start_date":"2024-07-21","location":"Vienna, Austria","name":"ICML: International Conference on Machine Learning"},"oa":1,"_id":"17469","oa_version":"Published Version","department":[{"_id":"DaAl"},{"_id":"MaMo"}],"citation":{"ama":"Kögler K, Shevchenko A, Hassani H, Mondelli M. Compression of structured data with autoencoders: Provable benefit of nonlinearities and depth. In: <i>Proceedings of the 41st International Conference on Machine Learning</i>. Vol 235. 
ML Research Press; 2024:24964-25015.","apa":"Kögler, K., Shevchenko, A., Hassani, H., &#38; Mondelli, M. (2024). Compression of structured data with autoencoders: Provable benefit of nonlinearities and depth. In <i>Proceedings of the 41st International Conference on Machine Learning</i> (Vol. 235, pp. 24964–25015). Vienna, Austria: ML Research Press.","short":"K. Kögler, A. Shevchenko, H. Hassani, M. Mondelli, in:, Proceedings of the 41st International Conference on Machine Learning, ML Research Press, 2024, pp. 24964–25015.","chicago":"Kögler, Kevin, Alexander Shevchenko, Hamed Hassani, and Marco Mondelli. “Compression of Structured Data with Autoencoders: Provable Benefit of Nonlinearities and Depth.” In <i>Proceedings of the 41st International Conference on Machine Learning</i>, 235:24964–15. ML Research Press, 2024.","ieee":"K. Kögler, A. Shevchenko, H. Hassani, and M. Mondelli, “Compression of structured data with autoencoders: Provable benefit of nonlinearities and depth,” in <i>Proceedings of the 41st International Conference on Machine Learning</i>, Vienna, Austria, 2024, vol. 235, pp. 24964–25015.","mla":"Kögler, Kevin, et al. “Compression of Structured Data with Autoencoders: Provable Benefit of Nonlinearities and Depth.” <i>Proceedings of the 41st International Conference on Machine Learning</i>, vol. 235, ML Research Press, 2024, pp. 24964–5015.","ista":"Kögler K, Shevchenko A, Hassani H, Mondelli M. 2024. Compression of structured data with autoencoders: Provable benefit of nonlinearities and depth. Proceedings of the 41st International Conference on Machine Learning. ICML: International Conference on Machine Learning, PMLR, vol. 
235, 24964–25015."},"project":[{"name":"Prix Lopez-Loretta 2019 - Marco Mondelli","_id":"059876FA-7A3F-11EA-A408-12923DDC885E"}],"page":"24964-25015","date_created":"2024-08-29T11:47:57Z","publication":"Proceedings of the 41st International Conference on Machine Learning","date_updated":"2026-06-07T22:30:05Z","date_published":"2024-07-01T00:00:00Z","month":"07","scopus_import":"1","year":"2024","user_id":"2DF688A6-F248-11E8-B48F-1D18A9856A87","title":"Compression of structured data with autoencoders: Provable benefit of nonlinearities and depth","day":"01","alternative_title":["PMLR"],"language":[{"iso":"eng"}],"corr_author":"1","main_file_link":[{"url":"https://proceedings.mlr.press/v235/kogler24a.html","open_access":"1"}],"abstract":[{"lang":"eng","text":"Autoencoders are a prominent model in many empirical branches of machine learning and lossy data compression. However, basic theoretical questions remain unanswered even in a shallow two-layer setting. In particular, to what degree does a shallow autoencoder capture the structure of the underlying data distribution? For the prototypical case of the 1-bit compression of sparse Gaussian data, we prove that gradient descent converges to a solution that completely disregards the sparse structure of the input. Namely, the performance of the algorithm is the same as if it was compressing a Gaussian source - with no sparsity. For general data distributions, we give evidence of a phase transition phenomenon in the shape of the gradient descent minimizer, as a function of the data sparsity: below the critical sparsity level, the minimizer is a rotation taken uniformly at random (just like in the compression of non-sparse data); above the critical sparsity, the minimizer is the identity (up to a permutation). 
Finally, by exploiting a connection with approximate message passing algorithms, we show how to improve upon Gaussian performance for the compression of sparse data: adding a denoising function to a shallow architecture already reduces the loss provably, and a suitable multi-layer decoder leads to a further improvement. We validate our findings on image datasets, such as CIFAR-10 and MNIST."}]}]