7 Publications
2025 | Published | Conference Paper | IST-REx-ID: 20038
Jin T, Humayun AI, Evci U, et al. The journey matters: Average parameter count over pre-training unifies sparse and dense scaling laws. In: 13th International Conference on Learning Representations. ICLR; 2025:85165-85181.
2025 | Published | Conference Paper | IST-REx-ID: 20033
Ildiz ME, Gozeten HA, Taga EO, Mondelli M, Oymak S. High-dimensional analysis of knowledge distillation: Weak-to-strong generalization and scaling laws. In: 13th International Conference on Learning Representations. ICLR; 2025:2967-3006.
2025 | Published | Conference Paper | IST-REx-ID: 20037
Sawmya S, Kong L, Markov I, Alistarh D-A, Shavit N. Wasserstein distances, neuronal entanglement, and sparsity. In: 13th International Conference on Learning Representations. ICLR; 2025:26244-26274.
2025 | Published | Conference Paper | IST-REx-ID: 20036
Pariza V, Salehi M, Burghouts G, Locatello F, Asano YM. Near, far: Patch-ordering enhances vision foundation models’ scene understanding. In: 13th International Conference on Learning Representations. ICLR; 2025:72303-72330.
2025 | Published | Conference Paper | IST-REx-ID: 20032
Chen J, Yao D, Pervez AA, Alistarh D-A, Locatello F. Scalable mechanistic neural networks. In: 13th International Conference on Learning Representations. ICLR; 2025:63716-63737.
2025 | Published | Conference Paper | IST-REx-ID: 20035
Jacot A, Súkeník P, Wang Z, Mondelli M. Wide neural networks trained with weight decay provably exhibit neural collapse. In: 13th International Conference on Learning Representations. ICLR; 2025:1905-1931.
2025 | Published | Conference Paper | IST-REx-ID: 20034
Robert T, Safaryan M, Modoranu I-V, Alistarh D-A. LDAdam: Adaptive optimization from low-dimensional gradient statistics. In: 13th International Conference on Learning Representations. ICLR; 2025:101877-101913.