Model Merging via Averaged Representational Similarity

Christopher Wang; Vighnesh Subramaniam; Dan Gutfreund; Boris Katz; Phillip Isola; Brian Cheung

ICML 2026

Workshop

06 Jul 2026

Abstract

If multiple artists are asked to draw a circle by hand, each one will produce something slightly imperfect. Yet the average of their sketches can look strikingly close to ideal. We investigate whether knowledge from different models can be combined in the same way. We propose to average models based on their kernel: the matrix of all pairwise dot products between a model's embeddings of the data samples. Compared to techniques such as weight-averaging, this has the advantage of allowing merging between models that have been trained separately, i.e., from different initializations. We take models that have been trained on disjoint, skewed sets of data and show that simple averaging produces a kernel that trends representationally towards that of a more accurate model. Empirically, we even find the similarity landscape with respect to teacher kernels to be convex. We then use a differentiable version of Mutual k-Nearest Neighbors (MKNN) to directly optimize a student network for representational similarity with the average kernel. We find that this provides consistent gains in performance. These findings open the door for a new type of model-merging that does not rely on weight-averaging, and is thus able to accommodate models that are trained from scratch independently. Going further, they hint at a more general framing for model-merging techniques, in which models can be thought to lie in the same loss basin with respect to their representations.
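
The kernel-averaging idea in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names (`kernel`, `average_kernel`, `knn_indices`, `mutual_knn`) and the random toy embeddings are hypothetical, and the similarity score here is the ordinary, non-differentiable mutual k-nearest-neighbor overlap rather than the differentiable variant the abstract mentions. The sketch assumes every model embeds the same n samples, so each kernel is n x n even when the embedding widths differ.

```python
import numpy as np


def kernel(embeddings):
    """Matrix of all pairwise dot products between sample embeddings (n x n)."""
    return embeddings @ embeddings.T


def average_kernel(kernels):
    """Element-wise mean of several n x n kernels (assumed: same samples, same order)."""
    return np.mean(np.stack(kernels), axis=0)


def knn_indices(K, k):
    """Indices of the k most similar other samples for each row of kernel K."""
    K = K.astype(float)  # astype returns a copy, so the caller's kernel is untouched
    np.fill_diagonal(K, -np.inf)  # a sample is not its own neighbor
    return np.argsort(-K, axis=1)[:, :k]


def mutual_knn(K_a, K_b, k):
    """Mean fraction of k-nearest neighbors shared between two kernels (hard version)."""
    nn_a = knn_indices(K_a, k)
    nn_b = knn_indices(K_b, k)
    overlaps = [len(set(a) & set(b)) / k for a, b in zip(nn_a, nn_b)]
    return float(np.mean(overlaps))


# Toy stand-ins for two independently trained models with different widths.
rng = np.random.default_rng(0)
emb_a = rng.normal(size=(8, 16))  # hypothetical teacher A: 8 samples, 16 dims
emb_b = rng.normal(size=(8, 32))  # hypothetical teacher B: 8 samples, 32 dims

K_avg = average_kernel([kernel(emb_a), kernel(emb_b)])
score = mutual_knn(K_avg, kernel(emb_a), k=3)  # value in [0, 1]
```

Because the averaging happens on n x n kernels rather than on weights, the two models need neither a shared initialization nor a shared architecture, which is the property the abstract contrasts with weight-averaging.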
