C2M3: Cycle-consistent multi-model merging

Conference Paper | Published | English

Scopus indexed
Author
Crisostomi, Donato; Fumero, Marco (ISTA); Baieri, Daniele; Bernard, Florian; Rodolà, Emanuele

Corresponding author has ISTA affiliation

Series Title
Advances in Neural Information Processing Systems
Abstract
In this paper, we present a novel data-free method for merging neural networks in weight space. Unlike most existing work, our method optimizes the permutations of network neurons globally across all layers. This lets us enforce cycle consistency of the permutations when merging n ≥ 3 models, so that circular compositions of permutations can be computed without accumulating error along the path. We motivate the need for such a constraint both qualitatively and quantitatively, showing its benefits when merging sets of models in scenarios spanning varying architectures and datasets. Finally, we show that, when coupled with activation renormalization, our approach yields the best results on the task.
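As a rough illustration of the cycle-consistency property the abstract describes, the sketch below builds pairwise permutations for three models by routing each one through a shared reference ordering, so that the cycle A → B → C → A composes to the identity by construction. This is a toy NumPy example under stated assumptions, not the paper's implementation: the shared reference ordering, the names `to_ref` and `pairwise`, and the random permutations are all hypothetical choices made for illustration.

```python
import numpy as np


def random_permutation_matrix(n, rng):
    """Return an n x n permutation matrix drawn at random."""
    return np.eye(n, dtype=int)[rng.permutation(n)]


rng = np.random.default_rng(0)
n_units = 5  # hypothetical number of neurons in one layer

# Assumption: each model has one permutation that maps its neuron
# ordering into a shared reference ordering.
to_ref = {name: random_permutation_matrix(n_units, rng) for name in "ABC"}


def pairwise(src, dst):
    """Permutation from src to dst, obtained by going src -> reference -> dst."""
    return to_ref[dst].T @ to_ref[src]


# Compose the cycle A -> B -> C -> A (rightmost factor is applied first).
cycle = pairwise("C", "A") @ pairwise("B", "C") @ pairwise("A", "B")
cycle_is_identity = np.array_equal(cycle, np.eye(n_units, dtype=int))
print("cycle through reference is identity:", cycle_is_identity)

# For contrast: three pairwise permutations chosen independently carry no
# such guarantee, so their composition is generally not the identity.
independent = [random_permutation_matrix(n_units, rng) for _ in range(3)]
independent_cycle = independent[2] @ independent[1] @ independent[0]
print("independent cycle is identity:",
      np.array_equal(independent_cycle, np.eye(n_units, dtype=int)))
```

Because permutation matrices are orthogonal, every intermediate factor cancels with its transpose, which is why the first cycle returns exactly to its starting ordering.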
Publishing Year
2024
Date Published
2024-12-20
Proceedings Title
38th Conference on Neural Information Processing Systems
Publisher
Neural Information Processing Systems Foundation
Acknowledgement
This work is supported by the ERC grant no. 802554 (SPECGEO), PRIN 2020 project no. 2020TA3K9N (LEGO.AI), and PNRR MUR project PE0000013-FAIR. Marco Fumero is supported by the MSCA IST-Bridge fellowship, which has received funding from the European Union’s Horizon 2020 research and innovation program under the Marie Skłodowska-Curie grant agreement No. 101034413. We thank Simone Scardapane for the helpful feedback on the paper.
Volume
37
Conference
NeurIPS: Neural Information Processing Systems
Conference Location
Vancouver, Canada
Conference Date
2024-12-09 – 2024-12-15
All files available under the following license(s):
Copyright Statement:
This Item is protected by copyright and/or related rights. [...]

Access Level
Open Access

Sources

arXiv 2405.17897