Average gradient outer product as a mechanism for deep neural collapse
Conference Paper | Published | English | Scopus indexed
Author
Daniel Beaglehole; Peter Súkeník; Marco Mondelli; Mikhail Belkin
Corresponding author has ISTA affiliation
Series Title
Advances in Neural Information Processing Systems
Abstract
Deep Neural Collapse (DNC) refers to the surprisingly rigid structure of the data representations in the final layers of Deep Neural Networks (DNNs). Though the phenomenon has been measured in a variety of settings, its emergence is typically explained via data-agnostic approaches, such as the unconstrained features model. In this work, we introduce a data-dependent setting where DNC forms due to feature learning through the average gradient outer product (AGOP). The AGOP is defined with respect to a learned predictor and is equal to the uncentered covariance matrix of its input-output gradients averaged over the training dataset. The Deep Recursive Feature Machine (Deep RFM) is a method that constructs a neural network by iteratively mapping the data with the AGOP and applying an untrained random feature map. We demonstrate empirically that DNC occurs in Deep RFM across standard settings as a consequence of the projection with the AGOP matrix computed at each layer. Further, we theoretically explain DNC in Deep RFM in an asymptotic setting and as a result of kernel learning. We then provide evidence that this mechanism holds for neural networks more generally. In particular, we show that the right singular vectors and singular values of the weight matrices can be responsible for the majority of within-class variability collapse for DNNs trained in the feature learning regime. As observed in recent work, this singular structure is highly correlated with that of the AGOP.
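To make the abstract's definitions concrete, here is a minimal NumPy sketch of the AGOP and of a Deep RFM-style construction, together with a simple within-class variability ratio. This is an illustration under simplifying assumptions, not the paper's implementation: linear ridge regression stands in for the kernel machine fit at each layer (for a linear predictor f(x) = Wᵀx, the AGOP reduces to the uncentered gradient covariance W Wᵀ), the ReLU random feature map and all hyperparameters are arbitrary choices, and the trace ratio below is only a crude proxy for the NC1 metric used in the literature.

```python
import numpy as np

rng = np.random.default_rng(0)

def deep_rfm(X, Y, n_layers=4, width=512, ridge=1e-3):
    """Deep RFM-style sketch: at each layer, fit a predictor on the
    current features, compute its AGOP, project the data with AGOP^{1/2},
    and apply an untrained random ReLU feature map.

    X: (n, d) features; Y: (n, C) one-hot labels.
    """
    for _ in range(n_layers):
        n, d = X.shape
        # Per-layer predictor f(x) = W^T x via ridge regression, W: (d, C).
        W = np.linalg.solve(X.T @ X + ridge * np.eye(d), X.T @ Y)
        # AGOP of this linear predictor: the gradient of output c w.r.t. x
        # is the column W[:, c], so the averaged outer product is W @ W.T.
        M = W @ W.T
        # Symmetric square root of the AGOP via eigendecomposition.
        evals, evecs = np.linalg.eigh(M)
        sqrt_M = (evecs * np.sqrt(np.clip(evals, 0.0, None))) @ evecs.T
        # Project with AGOP^{1/2}, then apply an untrained random feature map.
        W_rand = rng.standard_normal((d, width)) / np.sqrt(d)
        X = np.maximum(X @ sqrt_M @ W_rand, 0.0)
    return X

def within_class_variability(X, labels):
    """Trace ratio of within-class to between-class scatter; smaller
    values indicate stronger within-class variability collapse."""
    mu = X.mean(axis=0)
    sw = sb = 0.0
    for c in np.unique(labels):
        Xc = X[labels == c]
        mu_c = Xc.mean(axis=0)
        sw += ((Xc - mu_c) ** 2).sum()
        sb += len(Xc) * ((mu_c - mu) ** 2).sum()
    return sw / sb

# Illustrative usage on a synthetic 3-class Gaussian mixture: the ratio
# typically shrinks after the AGOP/random-feature layers are applied.
n, d, C = 300, 20, 3
labels = rng.integers(0, C, size=n)
X = 3.0 * rng.standard_normal((C, d))[labels] + rng.standard_normal((n, d))
Y = np.eye(C)[labels]
print(within_class_variability(X, labels))
print(within_class_variability(deep_rfm(X, Y), labels))
```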
Publishing Year
2024
Date Published
2024-12-01
Proceedings Title
38th Annual Conference on Neural Information Processing Systems
Publisher
Neural Information Processing Systems Foundation
Acknowledgement
We acknowledge support from the National Science Foundation (NSF) and the Simons Foundation for the Collaboration on the Theoretical Foundations of Deep Learning through awards DMS-2031883 and #814639, as well as the TILOS institute (NSF CCF-2112665). This work used (1) XSEDE (Extreme Science and Engineering Discovery Environment), which is supported by NSF grant ACI-1548562, and (2) ACCESS (Advanced Cyberinfrastructure Coordination Ecosystem: Services & Support), which is supported by NSF grants #2138259, #2138286, #2138307, #2137603, and #2138296. Specifically, we used resources from the SDSC Expanse GPU compute nodes and the NCSA Delta system, via allocation TG-CIS220009. Marco Mondelli is supported by the 2019 Lopez-Loreta prize. We also acknowledge useful feedback from anonymous reviewers.
Volume
37
Conference
NeurIPS: Neural Information Processing Systems
Conference Location
Vancouver, Canada
Conference Date
2024-12-16
Sources
arXiv: 2402.13728 (https://arxiv.org/abs/2402.13728)