Scaling laws for sparsely-connected foundation models

Frantar E, Ruiz CR, Houlsby N, Alistarh D-A, Evci U. 2024. Scaling laws for sparsely-connected foundation models. The Twelfth International Conference on Learning Representations. ICLR: International Conference on Learning Representations.

Conference Paper | Published | English

Scopus indexed
Author
Frantar, Elias (ISTA); Ruiz, Carlos Riquelme; Houlsby, Neil; Alistarh, Dan-Adrian (ISTA); Evci, Utku

Corresponding author has ISTA affiliation

Abstract
We explore the impact of parameter sparsity on the scaling behavior of Transformers trained on massive datasets (i.e., "foundation models"), in both vision and language domains. In this setting, we identify the first scaling law describing the relationship between weight sparsity, number of non-zero parameters, and amount of training data, which we validate empirically across model and data scales on ViT/JFT-4B and T5/C4. These results allow us to characterize the "optimal sparsity", the sparsity level that yields the best performance for a given effective model size and training budget. For a fixed number of non-zero parameters, we find that the optimal sparsity increases with the amount of data used for training. We also extend our study to different sparsity structures (such as the hardware-friendly n:m pattern) and strategies (such as starting from a pretrained dense model). Our findings shed light on the power and limitations of weight sparsity across various parameter and computational settings, offering both theoretical understanding and practical implications for leveraging sparsity towards computational efficiency improvements. We provide pruning and scaling law fitting code at: github.com/google-research/jaxpruner/tree/main/jaxpruner/projects/bigsparse.
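To make the n:m pattern mentioned in the abstract concrete, the following is a minimal sketch of magnitude-based n:m masking in plain Python. It is not the authors' implementation (their code is at the repository linked above). The function names `nm_sparsify` and `sparsity` are hypothetical, and the sketch assumes the convention that n weights are kept non-zero in every consecutive group of m, so that 2:4 corresponds to 50% sparsity.

```python
def nm_sparsify(weights, n=2, m=4):
    """Keep the n largest-magnitude entries in each consecutive group of m.

    Assumption: "n:m" means n non-zeros per group of m weights.
    Hypothetical helper for illustration only.
    """
    if len(weights) % m != 0:
        raise ValueError("length of weights must be a multiple of m")
    pruned = []
    for start in range(0, len(weights), m):
        group = weights[start:start + m]
        # Positions of the n largest |w| in this group (ties go to the earlier position).
        keep = set(sorted(range(m), key=lambda i: -abs(group[i]))[:n])
        pruned.extend(w if i in keep else 0.0 for i, w in enumerate(group))
    return pruned


def sparsity(weights):
    """Fraction of entries that are exactly zero."""
    return sum(1 for w in weights if w == 0) / len(weights)


# Toy weight row with two groups of four.
row = [0.9, -0.1, 0.05, -0.7, 0.2, 0.3, -0.4, 0.01]
pruned_row = nm_sparsify(row, n=2, m=4)
print(pruned_row, sparsity(pruned_row))
```

Under this convention the number of non-zero parameters is the dense parameter count multiplied by (1 - sparsity), which is the quantity the abstract holds fixed when comparing sparsity levels.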
Publishing Year
2024
Date Published
2024-01-16
Proceedings Title
The Twelfth International Conference on Learning Representations
Conference
ICLR: International Conference on Learning Representations
Conference Location
Vienna, Austria
Conference Date
2024-05-07 – 2024-05-07

Cite this

Frantar E, Ruiz CR, Houlsby N, Alistarh D-A, Evci U. Scaling laws for sparsely-connected foundation models. In: The Twelfth International Conference on Learning Representations. 2024.
Frantar, E., Ruiz, C. R., Houlsby, N., Alistarh, D.-A., & Evci, U. (2024). Scaling laws for sparsely-connected foundation models. In The Twelfth International Conference on Learning Representations. Vienna, Austria.
Frantar, Elias, Carlos Riquelme Ruiz, Neil Houlsby, Dan-Adrian Alistarh, and Utku Evci. “Scaling Laws for Sparsely-Connected Foundation Models.” In The Twelfth International Conference on Learning Representations, 2024.
E. Frantar, C. R. Ruiz, N. Houlsby, D.-A. Alistarh, and U. Evci, “Scaling laws for sparsely-connected foundation models,” in The Twelfth International Conference on Learning Representations, Vienna, Austria, 2024.
Frantar, Elias, et al. “Scaling Laws for Sparsely-Connected Foundation Models.” The Twelfth International Conference on Learning Representations, 2024.
All files available under the following license(s):
Copyright Statement:
This Item is protected by copyright and/or related rights. [...]

Access Level
Open Access
Material in ISTA:
Dissertation containing ISTA record

Sources

arXiv 2309.08520
