Does SGD implicitly optimize for smoothness?
Volhejn V, Lampert C. 2021. Does SGD implicitly optimize for smoothness? 42nd German Conference on Pattern Recognition. DAGM GCPR: German Conference on Pattern Recognition LNCS vol. 12544, 246–259.
Conference Paper | Published | English
Scopus indexed
Abstract
Modern neural networks can easily fit their training set perfectly. Surprisingly, despite being “overfit” in this way, they tend to generalize well to future data, thereby defying the classic bias–variance trade-off of machine learning theory. Of the many possible explanations, a prevalent one is that training by stochastic gradient descent (SGD) imposes an implicit bias that leads it to learn simple functions, and these simple functions generalize well. However, the specifics of this implicit bias are not well understood.
In this work, we explore the smoothness conjecture which states that SGD is implicitly biased towards learning functions that are smooth. We propose several measures to formalize the intuitive notion of smoothness, and we conduct experiments to determine whether SGD indeed implicitly optimizes for these measures. Our findings rule out the possibility that smoothness measures based on first-order derivatives are being implicitly enforced. They are supportive, though, of the smoothness conjecture for measures based on second-order derivatives.
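The abstract does not spell out the proposed measures, so the following is only an illustrative sketch of how a first-order and a second-order smoothness measure can disagree. It assumes a scalar function of one variable and uses finite differences on a fixed grid. The names `first_order_measure` and `second_order_measure`, the step sizes, and the two test functions are hypothetical choices for this illustration, not the definitions used in the paper.

```python
import math

def first_order_measure(f, xs, h=1e-4):
    """Mean absolute slope of f over the points xs (central differences).

    Hypothetical stand-in for a first-derivative-based smoothness measure.
    """
    slopes = [abs(f(x + h) - f(x - h)) / (2 * h) for x in xs]
    return sum(slopes) / len(slopes)

def second_order_measure(f, xs, h=1e-3):
    """Mean absolute second derivative of f over xs (central differences).

    Hypothetical stand-in for a second-derivative-based smoothness measure.
    """
    curvatures = [abs(f(x + h) - 2 * f(x) + f(x - h)) / (h * h) for x in xs]
    return sum(curvatures) / len(curvatures)

# Evaluation grid on [-1, 1] (an arbitrary choice for this sketch).
xs = [i / 100 for i in range(-100, 101)]

def linear(x):
    # Steep everywhere, but with zero curvature.
    return 10.0 * x

def wiggly(x):
    # Smaller average slope than `linear`, but strongly curved.
    return math.sin(10 * x)

for name, f in [("linear", linear), ("wiggly", wiggly)]:
    print(name, first_order_measure(f, xs), second_order_measure(f, xs))
```

In this toy setting the steep linear function scores worse on the first-order measure but better on the second-order one, which shows why the two families of measures have to be tested separately.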
Publishing Year
2021
Date Published
2021-03-17
Proceedings Title
42nd German Conference on Pattern Recognition
Publisher
Springer
Volume
12544
Page
246-259
Conference
DAGM GCPR: German Conference on Pattern Recognition
Conference Location
Tübingen, Germany
Conference Date
2020-09-28 – 2020-10-01
Cite this
Volhejn V, Lampert C. Does SGD implicitly optimize for smoothness? In: 42nd German Conference on Pattern Recognition. Vol 12544. LNCS. Springer; 2021:246-259. doi:10.1007/978-3-030-71278-5_18
Volhejn, V., & Lampert, C. (2021). Does SGD implicitly optimize for smoothness? In 42nd German Conference on Pattern Recognition (Vol. 12544, pp. 246–259). Tübingen, Germany: Springer. https://doi.org/10.1007/978-3-030-71278-5_18
Volhejn, Vaclav, and Christoph Lampert. “Does SGD Implicitly Optimize for Smoothness?” In 42nd German Conference on Pattern Recognition, 12544:246–59. LNCS. Springer, 2021. https://doi.org/10.1007/978-3-030-71278-5_18.
V. Volhejn and C. Lampert, “Does SGD implicitly optimize for smoothness?,” in 42nd German Conference on Pattern Recognition, Tübingen, Germany, 2021, vol. 12544, pp. 246–259.
Volhejn, Vaclav, and Christoph Lampert. “Does SGD Implicitly Optimize for Smoothness?” 42nd German Conference on Pattern Recognition, vol. 12544, Springer, 2021, pp. 246–59, doi:10.1007/978-3-030-71278-5_18.
All files available under the following license(s):
Copyright Statement:
This Item is protected by copyright and/or related rights. [...]
Main File(s)
File Name
2020_GCPR_submitted_Volhejn.pdf
420.23 KB
Access Level
Open Access
Date Uploaded
2022-08-12
MD5 Checksum
3e3628ab1cf658d82524963f808004ea