Erik Altman, Jovan Blanusa, et al.
NeurIPS 2023
Universal text embedding models demonstrate that a single pre-trained model can produce representations effective across tasks such as classification, clustering, and retrieval. In contrast, existing tabular foundation models are largely task-specific. In this work, we investigate whether a single tabular embedding model can generalize effectively across tasks. We propose an initial approach that first aligns heterogeneous table-cell representations into a shared space using Hirschfeld–Gebelein–Rényi (HGR) maximal correlation, enabling numerical and non-numerical cells co-occurring in the same row to be mapped consistently. We then perform message passing with All-Set Transformer modules within a Hypergraph Transformer architecture to preserve row and column permutation invariance. Both stages are trained using self-supervised objectives to learn consistent representations at multiple granularities, including the cell, row, column, and table levels. Without additional training on new datasets, the model produces table embeddings that generalize across tasks. We evaluate the approach on row similarity, column similarity, and predictive machine learning tasks, and find that it performs competitively with models specifically designed for individual tasks.
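To make the HGR maximal correlation mentioned above more concrete, the sketch below computes it for two discrete variables from their joint probability table. It relies on the classical result that, in the finite discrete case, the HGR maximal correlation equals the second-largest singular value of the matrix with entries P(x, y) / sqrt(P(x) P(y)). This is only an illustration of the quantity itself, not the alignment procedure used in this work, which the abstract does not spell out; the function name `hgr_maximal_correlation` and the toy tables are assumptions made for the example.

```python
import numpy as np


def hgr_maximal_correlation(joint):
    """HGR maximal correlation of two discrete variables.

    `joint` is a 2-D table of co-occurrence counts or probabilities
    (rows index values of X, columns index values of Y).
    Hypothetical helper for illustration only.
    """
    joint = np.asarray(joint, dtype=float)
    joint = joint / joint.sum()            # normalise to a probability table
    px = joint.sum(axis=1)                 # marginal distribution of X
    py = joint.sum(axis=0)                 # marginal distribution of Y

    # Drop values that never occur to avoid dividing by zero.
    keep_x, keep_y = px > 0, py > 0
    joint = joint[np.ix_(keep_x, keep_y)]
    px, py = px[keep_x], py[keep_y]

    # Q[x, y] = P(x, y) / sqrt(P(x) * P(y))
    q = joint / np.sqrt(np.outer(px, py))
    singular_values = np.linalg.svd(q, compute_uv=False)

    # The largest singular value is always 1 (constant functions);
    # the second-largest is the HGR maximal correlation.
    return float(singular_values[1]) if singular_values.size > 1 else 0.0


# Toy example: a categorical column against a binned numerical column.
independent = np.outer([0.3, 0.7], [0.5, 0.5])   # no dependence
dependent = np.diag([0.2, 0.3, 0.5])             # one value determines the other
noisy = np.array([[0.4, 0.1], [0.1, 0.4]])       # partial dependence

for name, table in [("independent", independent),
                    ("dependent", dependent),
                    ("noisy", noisy)]:
    print(name, round(hgr_maximal_correlation(table), 6))
```

The value ranges from 0 for independent variables to 1 when one variable is a deterministic function of the other, which is what makes it a natural dependence measure between cells of different types once numerical values are discretized or embedded.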
Gaetano Rossiello, Shankar Subramaniam
ACM CAIS 2026
Conrad Albrecht, Jannik Schneider, et al.
CVPR 2025
Yidi Wu, Thomas Bohnstingl, et al.
ICML 2025