ISTA Research Explorer

7 Publications

2025 | Published | Conference Paper | IST-REx-ID: 20032

J. Chen, D. Yao, A. A. Pervez, D.-A. Alistarh, and F. Locatello, “Scalable mechanistic neural networks,” in 13th International Conference on Learning Representations, Singapore, Singapore, 2025, pp. 63716–63737.

2025 | Published | Conference Paper | IST-REx-ID: 20033

M. E. Ildiz, H. A. Gozeten, E. O. Taga, M. Mondelli, and S. Oymak, “High-dimensional analysis of knowledge distillation: Weak-to-strong generalization and scaling laws,” in 13th International Conference on Learning Representations, Singapore, Singapore, 2025, pp. 2967–3006.

2025 | Published | Conference Paper | IST-REx-ID: 20034

T. Robert, M. Safaryan, I.-V. Modoranu, and D.-A. Alistarh, “LDAdam: Adaptive optimization from low-dimensional gradient statistics,” in 13th International Conference on Learning Representations, Singapore, Singapore, 2025, pp. 101877–101913.

2025 | Published | Conference Paper | IST-REx-ID: 20035

A. Jacot, P. Súkeník, Z. Wang, and M. Mondelli, “Wide neural networks trained with weight decay provably exhibit neural collapse,” in 13th International Conference on Learning Representations, Singapore, Singapore, 2025, pp. 1905–1931.

2025 | Published | Conference Paper | IST-REx-ID: 20036

V. Pariza, M. Salehi, G. Burghouts, F. Locatello, and Y. M. Asano, “Near, far: Patch-ordering enhances vision foundation models’ scene understanding,” in 13th International Conference on Learning Representations, Singapore, Singapore, 2025, pp. 72303–72330.

2025 | Published | Conference Paper | IST-REx-ID: 20037

S. Sawmya, L. Kong, I. Markov, D.-A. Alistarh, and N. Shavit, “Wasserstein distances, neuronal entanglement, and sparsity,” in 13th International Conference on Learning Representations, Singapore, Singapore, 2025, pp. 26244–26274.

2025 | Published | Conference Paper | IST-REx-ID: 20038

T. Jin et al., “The journey matters: Average parameter count over pre-training unifies sparse and dense scaling laws,” in 13th International Conference on Learning Representations, Singapore, Singapore, 2025, pp. 85165–85181.
