7 Publications
2025 | Published | Conference Paper | IST-REx-ID: 20038 |
Jin, T., Humayun, A. I., Evci, U., Subramanian, S., Yazdanbakhsh, A., Alistarh, D.-A., & Dziugaite, G. K. (2025). The journey matters: Average parameter count over pre-training unifies sparse and dense scaling laws. In 13th International Conference on Learning Representations (pp. 85165–85181). Singapore, Singapore: ICLR.
2025 | Published | Conference Paper | IST-REx-ID: 20033 |
Ildiz, M. E., Gozeten, H. A., Taga, E. O., Mondelli, M., & Oymak, S. (2025). High-dimensional analysis of knowledge distillation: Weak-to-strong generalization and scaling laws. In 13th International Conference on Learning Representations (pp. 2967–3006). Singapore, Singapore: ICLR.
2025 | Published | Conference Paper | IST-REx-ID: 20037 |
Sawmya, S., Kong, L., Markov, I., Alistarh, D.-A., & Shavit, N. (2025). Wasserstein distances, neuronal entanglement, and sparsity. In 13th International Conference on Learning Representations (pp. 26244–26274). Singapore, Singapore: ICLR.
2025 | Published | Conference Paper | IST-REx-ID: 20036 |
Pariza, V., Salehi, M., Burghouts, G., Locatello, F., & Asano, Y. M. (2025). Near, far: Patch-ordering enhances vision foundation models’ scene understanding. In 13th International Conference on Learning Representations (pp. 72303–72330). Singapore, Singapore: ICLR.
2025 | Published | Conference Paper | IST-REx-ID: 20032 |
Chen, J., Yao, D., Pervez, A. A., Alistarh, D.-A., & Locatello, F. (2025). Scalable mechanistic neural networks. In 13th International Conference on Learning Representations (pp. 63716–63737). Singapore, Singapore: ICLR.
2025 | Published | Conference Paper | IST-REx-ID: 20035 |
Jacot, A., Súkeník, P., Wang, Z., & Mondelli, M. (2025). Wide neural networks trained with weight decay provably exhibit neural collapse. In 13th International Conference on Learning Representations (pp. 1905–1931). Singapore, Singapore: ICLR.
2025 | Published | Conference Paper | IST-REx-ID: 20034 |
Robert, T., Safaryan, M., Modoranu, I.-V., & Alistarh, D.-A. (2025). LDAdam: Adaptive optimization from low-dimensional gradient statistics. In 13th International Conference on Learning Representations (pp. 101877–101913). Singapore, Singapore: ICLR.