Can a single tabular embedding model service different tasks?

Niharika DSouza; Liane Vogel; Kavitha Srinivas; Sola Shirai; Oktie Hassanzadeh; Horst Samulowitz

VLDB 2026

Workshop paper

31 Aug 2026

Abstract

Universal text embedding models demonstrate that a single pre-trained model can produce representations effective across tasks such as classification, clustering, and retrieval. In contrast, existing tabular foundation models are largely task-specific. In this work, we investigate whether a single tabular embedding model can generalize effectively across tasks. We propose an initial approach that first aligns heterogeneous table-cell representations into a shared space using Hirschfeld–Gebelein–Rényi (HGR) maximal correlation, enabling numerical and non-numerical cells co-occurring in the same row to be mapped consistently. We then perform message passing with All-Set Transformer modules within a Hypergraph Transformer architecture to preserve row and column permutation invariance. Both stages are trained using self-supervised objectives to learn consistent representations at multiple granularities, including the cell, row, column, and table levels. Without additional training on new datasets, the model produces table embeddings that generalize across tasks. We evaluate the approach on row similarity, column similarity, and predictive machine learning tasks, and find that it generalizes competitively compared to models specifically designed for individual tasks.
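To make the HGR quantity mentioned in the abstract concrete, the following is a minimal sketch that estimates the maximal correlation between two discrete columns by alternating conditional expectations. It is not the authors' training procedure, which the abstract does not detail; the function name `hgr_discrete`, the helper `_standardize`, and the toy rows are hypothetical and chosen only for illustration. The sketch assumes both columns take a small number of distinct values, so numeric cells are treated as discrete symbols.

```python
import math
from collections import defaultdict


def _standardize(scores, weights):
    """Rescale a score table to zero mean and unit variance under `weights`.

    Returns None when the scores are (numerically) constant.
    """
    mean = sum(weights[k] * v for k, v in scores.items())
    var = sum(weights[k] * (v - mean) ** 2 for k, v in scores.items())
    if var <= 1e-18:
        return None
    sd = math.sqrt(var)
    return {k: (v - mean) / sd for k, v in scores.items()}


def hgr_discrete(pairs, iters=200):
    """Estimate HGR maximal correlation from (x, y) samples of two columns.

    Alternates f(x) <- E[g(Y) | X = x] and g(y) <- E[f(X) | Y = y],
    standardizing after each step (a power iteration on the conditional
    expectation operator). Assumption: the deterministic starting g is not
    orthogonal to the leading non-constant direction.
    """
    n = len(pairs)
    joint = defaultdict(float)
    px = defaultdict(float)
    py = defaultdict(float)
    for x, y in pairs:
        joint[(x, y)] += 1.0 / n
        px[x] += 1.0 / n
        py[y] += 1.0 / n

    # Deterministic, non-constant starting scores for the y values.
    start = {y: float(i) for i, y in enumerate(sorted(py, key=repr))}
    g = _standardize(start, py)
    if g is None:  # y is constant, so nothing can correlate with it
        return 0.0

    rho = 0.0
    for _ in range(iters):
        f_raw = defaultdict(float)
        for (x, y), p in joint.items():
            f_raw[x] += p * g[y] / px[x]
        f = _standardize(dict(f_raw), px)
        if f is None:  # conditional mean is flat: no dependence found
            return 0.0

        g_raw = defaultdict(float)
        for (x, y), p in joint.items():
            g_raw[y] += p * f[x] / py[y]
        g_new = _standardize(dict(g_raw), py)
        if g_new is None:
            return 0.0
        g = g_new

        # Correlation between the current transformed columns.
        rho = sum(p * f[x] * g[y] for (x, y), p in joint.items())
    return rho


# A numeric column paired with a categorical column in the same rows.
rows = [(-2.0, "neg"), (-1.0, "neg"), (1.0, "pos"), (2.0, "pos")]
print(hgr_discrete(rows))
```

Because the category here is a function of the number, the estimate should come out close to 1, even though a Pearson correlation is not defined for a string-valued column. This is the property that makes a maximal-correlation objective a natural fit for relating numerical and non-numerical cells that co-occur in a row.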

Conference paper