Compression of structured data with autoencoders: Provable benefit of nonlinearities and depth
Kögler K, Shevchenko A, Hassani H, Mondelli M. 2024. Compression of structured data with autoencoders: Provable benefit of nonlinearities and depth. Proceedings of the 41st International Conference on Machine Learning. ICML: International Conference on Machine Learning, PMLR, vol. 235, 24964–25015.
Conference Paper | Published | English
        Scopus indexed
Author
        Kögler, Kevin; Shevchenko, Alexander; Hassani, Hamed; Mondelli, Marco
        Corresponding author has ISTA affiliation
Department
    Series Title
    
    PMLR
Abstract
    Autoencoders are a prominent model in many empirical branches of machine learning and lossy data compression. However, basic theoretical questions remain unanswered even in a shallow two-layer setting. In particular, to what degree does a shallow autoencoder capture the structure of the underlying data distribution? For the prototypical case of the 1-bit compression of sparse Gaussian data, we prove that gradient descent converges to a solution that completely disregards the sparse structure of the input. Namely, the performance of the algorithm is the same as if it were compressing a Gaussian source with no sparsity. For general data distributions, we give evidence of a phase transition phenomenon in the shape of the gradient descent minimizer, as a function of the data sparsity: below the critical sparsity level, the minimizer is a rotation taken uniformly at random (just like in the compression of non-sparse data); above the critical sparsity, the minimizer is the identity (up to a permutation). Finally, by exploiting a connection with approximate message passing algorithms, we show how to improve upon Gaussian performance for the compression of sparse data: adding a denoising function to a shallow architecture already provably reduces the loss, and a suitable multi-layer decoder leads to a further improvement. We validate our findings on image datasets, such as CIFAR-10 and MNIST.
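    The claim that 1-bit compression of sparse Gaussian data performs like compression of a plain Gaussian source can be illustrated with a small numerical sketch. This is a toy check under stated assumptions, not the authors' code or experimental setup: the latent dimension equals the input dimension, the encoder is a fixed random orthogonal matrix followed by a sign function (nothing is trained by gradient descent), the decoder is the transposed encoder with a single least-squares gain, and the names `sample_sparse_gaussian`, `one_bit_autoencoder_loss` and `run_demo` are hypothetical. For Gaussian inputs this scheme has per-coordinate mean squared error close to 1 - 2/pi in high dimension, and the sparse inputs should land near the same value.

```python
import numpy as np


def sample_sparse_gaussian(rng, num_samples, dim, p):
    """Bernoulli-Gaussian rows: each entry is N(0, 1/p) with probability p,
    and 0 otherwise, so every coordinate has unit variance."""
    mask = rng.random((num_samples, dim)) < p
    return mask * rng.standard_normal((num_samples, dim)) / np.sqrt(p)


def one_bit_autoencoder_loss(x, encoder):
    """Per-coordinate MSE of x_hat = c * B^T sign(B x), with the scalar gain c
    chosen by least squares. Rows of x are samples; encoder is B."""
    bits = np.sign(x @ encoder.T)      # 1-bit code: one sign per latent unit
    back = bits @ encoder              # tied linear decoder B^T applied to the code
    c = np.sum(x * back) / np.sum(back * back)
    return float(np.mean((x - c * back) ** 2))


def run_demo(dim=256, num_samples=2000, p=0.2, seed=0):
    """Return (loss on dense Gaussian data, loss on sparse Gaussian data)."""
    rng = np.random.default_rng(seed)
    # Random orthogonal matrix from a QR decomposition (assumed encoder).
    q, r = np.linalg.qr(rng.standard_normal((dim, dim)))
    encoder = q * np.sign(np.diag(r))
    x_gauss = rng.standard_normal((num_samples, dim))
    x_sparse = sample_sparse_gaussian(rng, num_samples, dim, p)
    return (one_bit_autoencoder_loss(x_gauss, encoder),
            one_bit_autoencoder_loss(x_sparse, encoder))


if __name__ == "__main__":
    loss_gauss, loss_sparse = run_demo()
    print("dense Gaussian loss :", loss_gauss)
    print("sparse Gaussian loss:", loss_sparse)
    print("reference 1 - 2/pi  :", 1 - 2 / np.pi)
```

    The dimension, sample count and sparsity level `p` are arbitrary illustrative choices. At finite dimension the sparse loss sits slightly above the dense one, and the gap shrinks as `dim` grows.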
    
  Publishing Year
    2024
  Date Published
    2024-07-01
  Proceedings Title
    Proceedings of the 41st International Conference on Machine Learning
  Publisher
    ML Research Press
  Acknowledgement
    Kevin Kögler, Alexander Shevchenko and Marco Mondelli are supported by the 2019 Lopez-Loreta Prize. Hamed Hassani acknowledges support from the NSF CIF award (1910056) and the NSF Institute for CORE Emerging Methods in Data Science (EnCORE).
  Volume
      235
    Page
      24964-25015
    Conference
    
      ICML: International Conference on Machine Learning
    
  Conference Location
    
      Vienna, Austria
    
  Conference Date
    
      2024-07-21 – 2024-07-27
    
  IST-REx-ID
    
  Cite this
Kögler K, Shevchenko A, Hassani H, Mondelli M. Compression of structured data with autoencoders: Provable benefit of nonlinearities and depth. In: Proceedings of the 41st International Conference on Machine Learning. Vol 235. ML Research Press; 2024:24964-25015.
    Kögler, K., Shevchenko, A., Hassani, H., & Mondelli, M. (2024). Compression of structured data with autoencoders: Provable benefit of nonlinearities and depth. In Proceedings of the 41st International Conference on Machine Learning (Vol. 235, pp. 24964–25015). Vienna, Austria: ML Research Press.
    Kögler, Kevin, Alexander Shevchenko, Hamed Hassani, and Marco Mondelli. “Compression of Structured Data with Autoencoders: Provable Benefit of Nonlinearities and Depth.” In Proceedings of the 41st International Conference on Machine Learning, 235:24964–25015. ML Research Press, 2024.
    K. Kögler, A. Shevchenko, H. Hassani, and M. Mondelli, “Compression of structured data with autoencoders: Provable benefit of nonlinearities and depth,” in Proceedings of the 41st International Conference on Machine Learning, Vienna, Austria, 2024, vol. 235, pp. 24964–25015.
    Kögler K, Shevchenko A, Hassani H, Mondelli M. 2024. Compression of structured data with autoencoders: Provable benefit of nonlinearities and depth. Proceedings of the 41st International Conference on Machine Learning. ICML: International Conference on Machine Learning, PMLR, vol. 235, 24964–25015.
    Kögler, Kevin, et al. “Compression of Structured Data with Autoencoders: Provable Benefit of Nonlinearities and Depth.” Proceedings of the 41st International Conference on Machine Learning, vol. 235, ML Research Press, 2024, pp. 24964–5015.
  
      All files available under the following license(s):
            Copyright Statement:
            This Item is protected by copyright and/or related rights. [...]
  Access Level
     Open Access
    
      Material in ISTA:
    
  
      Dissertation containing ISTA record
    
Sources
 arXiv 2402.05013