Does SGD implicitly optimize for smoothness?

Volhejn V, Lampert C. 2021. Does SGD implicitly optimize for smoothness? 42nd German Conference on Pattern Recognition. DAGM GCPR: German Conference on Pattern Recognition LNCS vol. 12544, 246–259.

Download
OA 2020_GCPR_submitted_Volhejn.pdf 420.23 KB [Submitted Version]

Conference Paper | Published | English

Scopus indexed
Abstract
Modern neural networks can easily fit their training set perfectly. Surprisingly, despite being “overfit” in this way, they tend to generalize well to future data, thereby defying the classic bias–variance trade-off of machine learning theory. Of the many possible explanations, a prevalent one is that training by stochastic gradient descent (SGD) imposes an implicit bias that leads it to learn simple functions, and these simple functions generalize well. However, the specifics of this implicit bias are not well understood. In this work, we explore the smoothness conjecture which states that SGD is implicitly biased towards learning functions that are smooth. We propose several measures to formalize the intuitive notion of smoothness, and we conduct experiments to determine whether SGD indeed implicitly optimizes for these measures. Our findings rule out the possibility that smoothness measures based on first-order derivatives are being implicitly enforced. They are supportive, though, of the smoothness conjecture for measures based on second-order derivatives.
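The paper's own smoothness measures are defined in the full text; purely as an illustration of the distinction the abstract draws, the following sketch (my own assumptions: PyTorch, a toy 1-D regression task, and simple finite-difference penalties, mean |f'| for first order and mean |f''| for second order, which are not necessarily the paper's measures) contrasts a first-order and a second-order smoothness measure of a small network trained by SGD.

# Minimal sketch, not the paper's exact experimental setup.
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy 1-D regression task: a few noisy samples of sin(x).
x_train = torch.linspace(-3.0, 3.0, 16).unsqueeze(1)
y_train = torch.sin(x_train) + 0.1 * torch.randn_like(x_train)

model = nn.Sequential(nn.Linear(1, 64), nn.ReLU(), nn.Linear(64, 1))
opt = torch.optim.SGD(model.parameters(), lr=0.05)

for _ in range(2000):                        # (full-batch) SGD on the MSE loss
    opt.zero_grad()
    loss = ((model(x_train) - y_train) ** 2).mean()
    loss.backward()
    opt.step()

# Evaluate the learned function on a dense grid and approximate its first and
# second derivatives by central finite differences.
grid = torch.linspace(-3.0, 3.0, 1001).unsqueeze(1)
with torch.no_grad():
    f = model(grid).squeeze(1)
h = grid[1, 0] - grid[0, 0]
f1 = (f[2:] - f[:-2]) / (2 * h)              # f'
f2 = (f[2:] - 2 * f[1:-1] + f[:-2]) / h**2   # f''

print("first-order measure  (mean |f'| ):", f1.abs().mean().item())
print("second-order measure (mean |f''|):", f2.abs().mean().item())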
Publishing Year
2021
Date Published
2021-03-17
Proceedings Title
42nd German Conference on Pattern Recognition
Publisher
Springer
Volume
12544
Page
246-259
Conference
DAGM GCPR: German Conference on Pattern Recognition
Conference Location
Tübingen, Germany
Conference Date
2020-09-28 – 2020-10-01

Cite this

Volhejn V, Lampert C. Does SGD implicitly optimize for smoothness? In: 42nd German Conference on Pattern Recognition. Vol 12544. LNCS. Springer; 2021:246-259. doi:10.1007/978-3-030-71278-5_18
Volhejn, V., & Lampert, C. (2021). Does SGD implicitly optimize for smoothness? In 42nd German Conference on Pattern Recognition (Vol. 12544, pp. 246–259). Tübingen, Germany: Springer. https://doi.org/10.1007/978-3-030-71278-5_18
Volhejn, Vaclav, and Christoph Lampert. “Does SGD Implicitly Optimize for Smoothness?” In 42nd German Conference on Pattern Recognition, 12544:246–59. LNCS. Springer, 2021. https://doi.org/10.1007/978-3-030-71278-5_18.
V. Volhejn and C. Lampert, “Does SGD implicitly optimize for smoothness?,” in 42nd German Conference on Pattern Recognition, Tübingen, Germany, 2021, vol. 12544, pp. 246–259.
Volhejn V, Lampert C. 2021. Does SGD implicitly optimize for smoothness? 42nd German Conference on Pattern Recognition. DAGM GCPR: German Conference on Pattern Recognition LNCS vol. 12544, 246–259.
Volhejn, Vaclav, and Christoph Lampert. “Does SGD Implicitly Optimize for Smoothness?” 42nd German Conference on Pattern Recognition, vol. 12544, Springer, 2021, pp. 246–59, doi:10.1007/978-3-030-71278-5_18.
All files available under the following license(s):
Copyright Statement:
This Item is protected by copyright and/or related rights. [...]
Main File(s)
Access Level
OA Open Access
Date Uploaded
2022-08-12
MD5 Checksum
3e3628ab1cf658d82524963f808004ea

