7 Publications
2025 | Published | Conference Paper | IST-REx-ID: 20038 |

The journey matters: Average parameter count over pre-training unifies sparse and dense scaling laws
T. Jin, A.I. Humayun, U. Evci, S. Subramanian, A. Yazdanbakhsh, D.-A. Alistarh, G.K. Dziugaite, in: 13th International Conference on Learning Representations, ICLR, 2025, pp. 85165–85181.
2025 | Published | Conference Paper | IST-REx-ID: 20033 |

High-dimensional analysis of knowledge distillation: Weak-to-Strong generalization and scaling laws
M. Emrullah Ildiz, H.A. Gozeten, E.O. Taga, M. Mondelli, S. Oymak, in: 13th International Conference on Learning Representations, ICLR, 2025, pp. 2967–3006.
2025 | Published | Conference Paper | IST-REx-ID: 20037 |

Wasserstein distances, neuronal entanglement, and sparsity
S. Sawmya, L. Kong, I. Markov, D.-A. Alistarh, N. Shavit, in: 13th International Conference on Learning Representations, ICLR, 2025, pp. 26244–26274.
2025 | Published | Conference Paper | IST-REx-ID: 20036 |

Near, far: Patch-ordering enhances vision foundation models' scene understanding
V. Pariza, M. Salehi, G. Burghouts, F. Locatello, Y.M. Asano, in: 13th International Conference on Learning Representations, ICLR, 2025, pp. 72303–72330.
2025 | Published | Conference Paper | IST-REx-ID: 20032 |

Scalable mechanistic neural networks
J. Chen, D. Yao, A.A. Pervez, D.-A. Alistarh, F. Locatello, in: 13th International Conference on Learning Representations, ICLR, 2025, pp. 63716–63737.
2025 | Published | Conference Paper | IST-REx-ID: 20035 |

Wide neural networks trained with weight decay provably exhibit neural collapse
A. Jacot, P. Súkeník, Z. Wang, M. Mondelli, in: 13th International Conference on Learning Representations, ICLR, 2025, pp. 1905–1931.
2025 | Published | Conference Paper | IST-REx-ID: 20034 |

LDAdam: Adaptive optimization from low-dimensional gradient statistics
T. Robert, M. Safaryan, I.-V. Modoranu, D.-A. Alistarh, in: 13th International Conference on Learning Representations, ICLR, 2025, pp. 101877–101913.