Wasserstein distances, neuronal entanglement, and sparsity
Sawmya S, Kong L, Markov I, Alistarh D-A, Shavit N. 2025. Wasserstein distances, neuronal entanglement, and sparsity. 13th International Conference on Learning Representations. ICLR: International Conference on Learning Representations, 26244–26274.
Conference Paper | Published | English
Scopus indexed
Corresponding author has ISTA affiliation
Abstract
Disentangling polysemantic neurons is at the core of many current approaches to interpretability of large language models. Here we attempt to study how disentanglement can be used to understand performance, particularly under weight sparsity, a leading post-training optimization technique. We suggest a novel measure for estimating neuronal entanglement: the Wasserstein distance of a neuron's output distribution to a Gaussian. Moreover, we show the existence of a small number of highly entangled "Wasserstein Neurons" in each linear layer of an LLM, characterized by their highly non-Gaussian output distributions, their role in mapping similar inputs to dissimilar outputs, and their significant impact on model accuracy. To study these phenomena, we propose a new experimental framework for disentangling polysemantic neurons. Our framework separates each layer's inputs to create a mixture of experts where each neuron's output is computed by a mixture of neurons of lower Wasserstein distance, each better at maintaining accuracy when sparsified without retraining. We provide strong evidence that this is because the mixture of sparse experts is effectively disentangling the input-output relationship of individual neurons, in particular the difficult Wasserstein neurons.
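The entanglement measure described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that a neuron's outputs are standardized to zero mean and unit variance before comparison, that the order-1 Wasserstein distance is used, and that the reference is the standard normal. The function name `wasserstein_to_gaussian` is hypothetical. The sketch approximates the one-dimensional distance by comparing sorted samples against Gaussian quantiles.

```python
import random
from statistics import NormalDist, fmean, pstdev


def wasserstein_to_gaussian(outputs):
    """Approximate W1 between standardized outputs and N(0, 1).

    Assumption (not from the source): outputs are standardized first,
    so the score reflects the shape of the distribution, not its scale.
    """
    n = len(outputs)
    if n < 2:
        raise ValueError("need at least two outputs")
    mu = fmean(outputs)
    sigma = pstdev(outputs)
    if sigma == 0.0:
        raise ValueError("outputs are constant")

    # Sorted standardized samples are the empirical quantiles.
    z = sorted((x - mu) / sigma for x in outputs)

    # Midpoint probabilities (i + 0.5) / n stay strictly inside (0, 1),
    # where the inverse CDF of the normal distribution is defined.
    std_normal = NormalDist()
    gauss_q = [std_normal.inv_cdf((i + 0.5) / n) for i in range(n)]

    # In one dimension, W1 is the mean absolute gap between matched quantiles.
    return fmean(abs(a - b) for a, b in zip(z, gauss_q))


if __name__ == "__main__":
    random.seed(0)
    # Synthetic stand-ins for neuron outputs (illustrative data only).
    gaussian_like = [random.gauss(0.0, 1.0) for _ in range(5000)]
    bimodal = [
        random.gauss(-3.0, 0.3) if random.random() < 0.5 else random.gauss(3.0, 0.3)
        for _ in range(5000)
    ]
    print("gaussian-like:", wasserstein_to_gaussian(gaussian_like))
    print("bimodal:      ", wasserstein_to_gaussian(bimodal))
```

Under these assumptions, a neuron with near-Gaussian outputs scores close to zero, while a strongly non-Gaussian one (here, a bimodal mixture) scores noticeably higher, which is the property the abstract attributes to "Wasserstein Neurons".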
Publishing Year
2025
Date Published
2025-04-01
Proceedings Title
13th International Conference on Learning Representations
Publisher
ICLR
Acknowledgement
The authors would like to extend their gratitude to Lori Leu for her insightful comments on the application of the Wasserstein distance metric. We also wish to thank Elias Frantar for his help in working with the SparseGPT implementation and his advice for the project. Additionally, we would like to thank Tony Tong Wang and Thomas Athey for their valuable feedback and constructive discussions.
This work was supported by an NIH Brains CONNECTS U01 grant and AMD’s AI & HPC Fund.
Page
26244-26274
Conference
ICLR: International Conference on Learning Representations
Conference Location
Singapore, Singapore
Conference Date
2025-04-24 – 2025-04-28
Cite this
Sawmya S, Kong L, Markov I, Alistarh D-A, Shavit N. Wasserstein distances, neuronal entanglement, and sparsity. In: 13th International Conference on Learning Representations. ICLR; 2025:26244-26274.
Sawmya, S., Kong, L., Markov, I., Alistarh, D.-A., & Shavit, N. (2025). Wasserstein distances, neuronal entanglement, and sparsity. In 13th International Conference on Learning Representations (pp. 26244–26274). Singapore, Singapore: ICLR.
Sawmya, Shashata, Linghao Kong, Ilia Markov, Dan-Adrian Alistarh, and Nir Shavit. “Wasserstein Distances, Neuronal Entanglement, and Sparsity.” In 13th International Conference on Learning Representations, 26244–74. ICLR, 2025.
S. Sawmya, L. Kong, I. Markov, D.-A. Alistarh, and N. Shavit, “Wasserstein distances, neuronal entanglement, and sparsity,” in 13th International Conference on Learning Representations, Singapore, Singapore, 2025, pp. 26244–26274.
Sawmya S, Kong L, Markov I, Alistarh D-A, Shavit N. 2025. Wasserstein distances, neuronal entanglement, and sparsity. 13th International Conference on Learning Representations. ICLR: International Conference on Learning Representations, 26244–26274.
Sawmya, Shashata, et al. “Wasserstein Distances, Neuronal Entanglement, and Sparsity.” 13th International Conference on Learning Representations, ICLR, 2025, pp. 26244–74.
All files available under the following license(s):
Creative Commons Attribution 4.0 International Public License (CC-BY 4.0)
Main File(s)
File Name
2025_ICLR_Sawmya.pdf
5.45 MB
Date Uploaded
2025-08-04
MD5 Checksum
39a8fa7dbdd7029859e156f53f20f6bc
Sources
arXiv 2405.15756