The rapid growth of Earth Observation (EO) data introduces major challenges in data transfer and compute costs for downstream applications. At first glance, large-scale foundation models (FMs) only accentuate the inference cost problem. However, these models generate expressive embeddings that are applicable to diverse downstream tasks. FMs can therefore be applied to generate embeddings once, store them, and share only these compact representations with users. Lightweight decoders can then operate on the embeddings, drastically reducing compute demands and enabling fast, scalable inference over large spatiotemporal regions.
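As a minimal sketch of this embed-once, decode-many paradigm (illustrative only, not TerraTorch's actual API), a frozen pretrained encoder is run once over the raw tiles, the resulting embeddings are persisted, and a small task head operates on those embeddings alone:

```python
# Illustrative sketch of the embed-once, decode-many paradigm (hypothetical
# shapes and names, not TerraTorch's actual API).
import torch
import torch.nn as nn


class LightweightDecoder(nn.Module):
    """Small task head that operates only on precomputed embeddings."""

    def __init__(self, embed_dim: int = 768, num_classes: int = 10):
        super().__init__()
        self.head = nn.Sequential(
            nn.Linear(embed_dim, 256),
            nn.ReLU(),
            nn.Linear(256, num_classes),
        )

    def forward(self, emb: torch.Tensor) -> torch.Tensor:
        return self.head(emb)


@torch.no_grad()
def export_embeddings(encoder: nn.Module, tiles: torch.Tensor, path: str) -> None:
    """Run the frozen foundation-model encoder once and persist the embeddings."""
    encoder.eval()
    emb = encoder(tiles)   # e.g. (N, embed_dim) pooled tile embeddings
    torch.save(emb, path)  # compact artifact shared with downstream users


# Downstream users never touch the raw EO data again:
# emb = torch.load("tile_embeddings.pt")
# logits = LightweightDecoder()(emb)
```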
To support this paradigm, we extend the TerraTorch foundation model toolkit (https://ibm.github.io/terratorch/) with new functionality for embedding-centric workflows. Users can now encode EO data and store embeddings using simple configuration files that specify the pretrained foundation model. We also introduce quantized foundation models optimized for satellite data compression, further reducing storage and transmission requirements.
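A configuration-driven embedding export could look roughly like the following; the YAML keys, the model name, and the build_encoder() factory are illustrative assumptions rather than TerraTorch's actual schema or API:

```python
# Hypothetical configuration-driven workflow in the spirit of the extension
# described above; keys, model name, and factory are illustrative stand-ins.
import yaml
import torch.nn as nn

CONFIG = """
encoder:
  name: prithvi_vit_base        # pretrained EO foundation model (illustrative name)
  pretrained: true
embedding:
  output_dir: embeddings/       # where the compact representations are written
  quantize: int8                # optional quantized variant to shrink artifacts further
"""


def build_encoder(spec: dict) -> nn.Module:
    # Stand-in for a model factory / registry lookup; a real workflow would
    # resolve spec["name"] to a pretrained foundation-model encoder.
    return nn.Identity()


if __name__ == "__main__":
    cfg = yaml.safe_load(CONFIG)
    encoder = build_encoder(cfg["encoder"]).eval()
    print(f"Encoder '{cfg['encoder']['name']}' ready; "
          f"embeddings will be written to {cfg['embedding']['output_dir']}")
```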
Precomputed embeddings are over 100 times smaller than raw EO data, and depending on the downstream task, the performance drop compared to fully fine-tuned models can be as small as 5%. This enables country-scale assessments to run in minutes instead of hours—and at a fraction of the cost. With this extension, TerraTorch empowers the EO community to scale foundation models efficiently and make embedding-based applications accessible and practical for real-world use.
This research is carried out as part of the Embed2Scale project and is co-funded by the EU Horizon Europe program under Grant Agreement No. 101131841.