The rapid growth of Earth Observation (EO) data introduces major challenges in data transfer and compute costs for downstream applications. At first glance, large-scale foundation models (FMs) only accentuate the inference cost problem. However, these models generate expressive embeddings that are applicable to diverse downstream tasks. FMs can therefore be applied to generate embeddings once, store them, and share only these compact representations with users. Lightweight decoders can then operate on the embeddings, drastically reducing compute demands and enabling fast, scalable inference over large spatiotemporal regions.
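As a minimal sketch of this embed-once, decode-many paradigm (illustrative only, not TerraTorch's actual API), a frozen pretrained encoder is run once over the raw tiles, the resulting embeddings are persisted, and a small task head operates on those embeddings alone:

```python
# Illustrative sketch of the embed-once, decode-many paradigm (hypothetical
# shapes and names, not TerraTorch's actual API).
import torch
import torch.nn as nn


class LightweightDecoder(nn.Module):
    """Small task head that operates only on precomputed embeddings."""

    def __init__(self, embed_dim: int = 768, num_classes: int = 10):
        super().__init__()
        self.head = nn.Sequential(
            nn.Linear(embed_dim, 256),
            nn.ReLU(),
            nn.Linear(256, num_classes),
        )

    def forward(self, emb: torch.Tensor) -> torch.Tensor:
        return self.head(emb)


@torch.no_grad()
def export_embeddings(encoder: nn.Module, tiles: torch.Tensor, path: str) -> None:
    """Run the frozen foundation-model encoder once and persist the embeddings."""
    encoder.eval()
    emb = encoder(tiles)   # e.g. (N, embed_dim) pooled tile embeddings
    torch.save(emb, path)  # compact artifact shared with downstream users


# Downstream users never touch the raw EO data again:
# emb = torch.load("tile_embeddings.pt")
# logits = LightweightDecoder()(emb)
```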
To support this paradigm, we extend the TerraTorch foundation model toolkit (https://ibm.github.io/terratorch/) with new functionality for embedding-centric workflows. Users can now encode EO data and store embeddings using simple configuration files that specify the pretrained foundation model. We also introduce quantized foundation models optimized for satellite data compression, further reducing storage and transmission requirements.
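A configuration-driven embedding export could look roughly like the following; the YAML keys, the model name, and the build_encoder() factory are illustrative assumptions rather than TerraTorch's actual schema or API:

```python
# Hypothetical configuration-driven workflow in the spirit of the extension
# described above; keys, model name, and factory are illustrative stand-ins.
import yaml
import torch.nn as nn

CONFIG = """
encoder:
  name: prithvi_vit_base        # pretrained EO foundation model (illustrative name)
  pretrained: true
embedding:
  output_dir: embeddings/       # where the compact representations are written
  quantize: int8                # optional quantized variant to shrink artifacts further
"""


def build_encoder(spec: dict) -> nn.Module:
    # Stand-in for a model factory / registry lookup; a real workflow would
    # resolve spec["name"] to a pretrained foundation-model encoder.
    return nn.Identity()


if __name__ == "__main__":
    cfg = yaml.safe_load(CONFIG)
    encoder = build_encoder(cfg["encoder"]).eval()
    print(f"Encoder '{cfg['encoder']['name']}' ready; "
          f"embeddings will be written to {cfg['embedding']['output_dir']}")
```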
Precomputed embeddings are over 100 times smaller than raw EO data, and depending on the downstream task, the performance drop compared to fully fine-tuned models can be as small as 5%. This enables country-scale assessments to run in minutes instead of hours—and at a fraction of the cost. With this extension, TerraTorch empowers the EO community to scale foundation models efficiently and make embedding-based applications accessible and practical for real-world use.
This research is carried out as part of the Embed2Scale project and is co-funded by the EU Horizon Europe program under Grant Agreement No. 101131841.