Does SGD implicitly optimize for smoothness?

Volhejn V, Lampert C. 2021. Does SGD implicitly optimize for smoothness? 42nd German Conference on Pattern Recognition. DAGM GCPR: German Conference on Pattern Recognition LNCS vol. 12544, 246–259.

Download
OA 2020_GCPR_submitted_Volhejn.pdf 420.23 KB [Submitted Version]

Conference Paper | Published | English

Scopus indexed
Abstract
Modern neural networks can easily fit their training set perfectly. Surprisingly, despite being “overfit” in this way, they tend to generalize well to future data, thereby defying the classic bias–variance trade-off of machine learning theory. Of the many possible explanations, a prevalent one is that training by stochastic gradient descent (SGD) imposes an implicit bias that leads it to learn simple functions, and these simple functions generalize well. However, the specifics of this implicit bias are not well understood. In this work, we explore the smoothness conjecture which states that SGD is implicitly biased towards learning functions that are smooth. We propose several measures to formalize the intuitive notion of smoothness, and we conduct experiments to determine whether SGD indeed implicitly optimizes for these measures. Our findings rule out the possibility that smoothness measures based on first-order derivatives are being implicitly enforced. They are supportive, though, of the smoothness conjecture for measures based on second-order derivatives.
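The paper's own smoothness measures are defined in the full text; purely as an illustration of the distinction the abstract draws, the following sketch (my own assumptions: PyTorch, a toy 1-D regression task, and simple finite-difference penalties, mean |f'| for first order and mean |f''| for second order, which are not necessarily the paper's measures) contrasts a first-order and a second-order smoothness measure of a small network trained by SGD.

# Minimal sketch, not the paper's exact experimental setup.
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy 1-D regression task: a few noisy samples of sin(x).
x_train = torch.linspace(-3.0, 3.0, 16).unsqueeze(1)
y_train = torch.sin(x_train) + 0.1 * torch.randn_like(x_train)

model = nn.Sequential(nn.Linear(1, 64), nn.ReLU(), nn.Linear(64, 1))
opt = torch.optim.SGD(model.parameters(), lr=0.05)

for _ in range(2000):                        # (full-batch) SGD on the MSE loss
    opt.zero_grad()
    loss = ((model(x_train) - y_train) ** 2).mean()
    loss.backward()
    opt.step()

# Evaluate the learned function on a dense grid and approximate its first and
# second derivatives by central finite differences.
grid = torch.linspace(-3.0, 3.0, 1001).unsqueeze(1)
with torch.no_grad():
    f = model(grid).squeeze(1)
h = grid[1, 0] - grid[0, 0]
f1 = (f[2:] - f[:-2]) / (2 * h)              # f'
f2 = (f[2:] - 2 * f[1:-1] + f[:-2]) / h**2   # f''

print("first-order measure  (mean |f'| ):", f1.abs().mean().item())
print("second-order measure (mean |f''|):", f2.abs().mean().item())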
Publishing Year
2021
Date Published
2021-03-17
Proceedings Title
42nd German Conference on Pattern Recognition
Publisher
Springer
Volume
12544
Page
246-259
Conference
DAGM GCPR: German Conference on Pattern Recognition
Conference Location
Tübingen, Germany
Conference Date
2020-09-28 – 2020-10-01

Cite this

Volhejn V, Lampert C. Does SGD implicitly optimize for smoothness? In: 42nd German Conference on Pattern Recognition. Vol 12544. LNCS. Springer; 2021:246-259. doi:10.1007/978-3-030-71278-5_18
Volhejn, V., & Lampert, C. (2021). Does SGD implicitly optimize for smoothness? In 42nd German Conference on Pattern Recognition (Vol. 12544, pp. 246–259). Tübingen, Germany: Springer. https://doi.org/10.1007/978-3-030-71278-5_18
Volhejn, Vaclav, and Christoph Lampert. “Does SGD Implicitly Optimize for Smoothness?” In 42nd German Conference on Pattern Recognition, 12544:246–59. LNCS. Springer, 2021. https://doi.org/10.1007/978-3-030-71278-5_18.
V. Volhejn and C. Lampert, “Does SGD implicitly optimize for smoothness?,” in 42nd German Conference on Pattern Recognition, Tübingen, Germany, 2021, vol. 12544, pp. 246–259.
Volhejn V, Lampert C. 2021. Does SGD implicitly optimize for smoothness? 42nd German Conference on Pattern Recognition. DAGM GCPR: German Conference on Pattern Recognition LNCS vol. 12544, 246–259.
Volhejn, Vaclav, and Christoph Lampert. “Does SGD Implicitly Optimize for Smoothness?” 42nd German Conference on Pattern Recognition, vol. 12544, Springer, 2021, pp. 246–59, doi:10.1007/978-3-030-71278-5_18.
All files available under the following license(s):
Copyright Statement:
This Item is protected by copyright and/or related rights. [...]
Main File(s)
Access Level
OA Open Access
Date Uploaded
2022-08-12
MD5 Checksum
3e3628ab1cf658d82524963f808004ea

