Shyam Marjit, Harshit Singh, et al.
WACV 2025
Data sets with multiple, heterogeneous feature spaces occur frequently. We present an abstract framework for integrating multiple feature spaces in the k-means clustering algorithm. Our main ideas are (i) to represent each data object as a tuple of multiple feature vectors, (ii) to assign a suitable (and possibly different) distortion measure to each feature space, (iii) to combine distortions on different feature spaces, in a convex fashion, by assigning (possibly) different relative weights to each, (iv) for a fixed weighting, to cluster using the proposed convex k-means algorithm, and (v) to determine the optimal feature weighting to be the one that yields the clustering that simultaneously minimizes the average within-cluster dispersion and maximizes the average between-cluster dispersion along all the feature spaces. Using precision/recall evaluations and known ground truth classifications, we empirically demonstrate the effectiveness of feature weighting in clustering on several different application domains.
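Steps (i) through (iv) can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes squared Euclidean distortion in every feature space, so that the mean is the minimizing centroid, and it uses a naive initialization from the first k objects. The names `convex_kmeans`, `weighted_distortion`, and `mean_vector` are hypothetical. Step (v), the search for the optimal weighting, is not shown.

```python
from typing import List, Sequence, Tuple

Vector = List[float]
DataObject = Tuple[Vector, ...]  # (i) one feature vector per feature space


def sq_euclidean(x: Sequence[float], c: Sequence[float]) -> float:
    """(ii) Distortion for one feature space (assumed: squared Euclidean)."""
    return sum((a - b) ** 2 for a, b in zip(x, c))


def weighted_distortion(obj: DataObject, centroid: DataObject,
                        weights: Sequence[float]) -> float:
    """(iii) Convex combination of the per-feature-space distortions."""
    return sum(w * sq_euclidean(x, c)
               for w, x, c in zip(weights, obj, centroid))


def mean_vector(vectors: List[Vector]) -> Vector:
    """Component-wise mean, the optimal centroid for squared Euclidean."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def convex_kmeans(data: List[DataObject], k: int,
                  weights: Sequence[float], n_iter: int = 20):
    """(iv) k-means under a fixed convex weighting of feature spaces."""
    # Weights must be non-negative and sum to one (convex combination).
    assert all(w >= 0 for w in weights) and abs(sum(weights) - 1.0) < 1e-9
    # Assumption: naive initialization with the first k objects.
    centroids = [tuple(list(v) for v in obj) for obj in data[:k]]
    labels = [0] * len(data)
    for _ in range(n_iter):
        # Assignment step: nearest centroid under the weighted distortion.
        labels = [
            min(range(k),
                key=lambda j: weighted_distortion(obj, centroids[j], weights))
            for obj in data
        ]
        # Update step: recompute each centroid separately per feature space.
        new_centroids = []
        for j in range(k):
            members = [obj for obj, lab in zip(data, labels) if lab == j]
            if not members:  # keep the old centroid for an empty cluster
                new_centroids.append(centroids[j])
                continue
            new_centroids.append(tuple(
                mean_vector([m[s] for m in members])
                for s in range(len(weights))
            ))
        if new_centroids == centroids:  # converged
            break
        centroids = new_centroids
    return labels, centroids
```

Calling `convex_kmeans(data, k=2, weights=(0.5, 0.5))` on a list of two-space tuples returns the cluster labels and the per-space centroids. Changing `weights` shifts how much each feature space influences the partition, which is the quantity step (v) would tune.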
Daniel Karl I. Weidele, Hendrik Strobelt, et al.
SysML 2019
Harsha Kokel, Aamod Khatiwada, et al.
VLDB 2025
Seung Gu Kang, Jeff Weber, et al.
ACS Fall 2023