TerraStackAI: Bringing Earth and space AI to Red Hat and the world
This is a companion piece to the Red Hat blog announcing enterprises can now serve Earth and space AI models on Red Hat AI Inference Server.
Across disciplines, advances in computational mathematics are transforming how large-scale scientific data is analyzed, interpreted, and translated into deployable workflows. In the Earth and space monitoring domains, extreme weather risk, disaster response, precision agriculture, and solar activity forecasting rely on turning petabytes of satellite and sensor data into actionable insights.
Foundation models like IBM-NASA's Prithvi-EO and IBM-ESA's TerraMind enable unprecedented multimodal representations of the Earth system. However, a critical gap remains: the tooling required to effectively use these models is fragmented, complex, and inaccessible to many who need to use them. This led our team to develop TerraStackAI.
The TerraStackAI ecosystem
TerraStackAI is an integrated, open-source technology stack that spans the entire Earth and space geospatial AI workflow. There are two new components in the TerraStackAI ecosystem: TerraKit, for creating AI-ready data, and the Geospatial Studio, for deploying production-ready services. These join TerraTorch and Iterate, which we introduced previously and are now part of the TerraStackAI family.
TerraStackAI's architecture reflects a layered approach that mirrors typical geospatial workflows for Earth and space AI:
- TerraKit for Data: The foundation of any machine learning project is high-quality, properly formatted data. TerraKit queries, aligns, and prepares data for machine learning, handling multi-source ingestion, spatiotemporal alignment, and labeling while abstracting away formats, projections, and preprocessing complexity.
- TerraTorch for Models: Fine-tune and evaluate foundation models with a modular, config-driven framework built on PyTorch Lightning and TorchGeo. Mix and match backbones and task heads, including pretrained models. More information is available here.
- Iterate for Optimization: Automate hyperparameter search with Bayesian optimization. Integrated with MLFlow and Ray, it parallelizes experiments and replaces weeks of manual tuning. More information is available here.
- Geospatial Studio for Production: This top layer brings everything together in an accessible platform. Operationalize models through guided workflows for data curation, fine-tuning, deployment, and visualization. Supports both no-code interfaces and programmatic APIs for scalable AI services.
TerraKit: AI-ready geospatial dataset generation
Creating high-quality training datasets is often the most time-consuming aspect of Earth and space AI projects. TerraKit addresses this challenge by providing a unified interface for accessing, processing, and preparing geospatial data from multiple sources.
TerraKit serves as the data foundation of the TerraStackAI ecosystem. It bridges the gap between raw Earth observation data — distributed across various archives, in diverse formats, requiring complex preprocessing — and the standardized, machine learning-ready datasets that training frameworks expect. While tools exist for accessing individual data sources, TerraKit provides a consistent API that abstracts away source-specific details, while handling the spatial-temporal alignment and preprocessing challenges unique to geospatial data.
TerraKit's capabilities span the entire data preparation pipeline. It provides connectors to major Earth observation data sources, including the Copernicus Sentinel missions (Sentinel-1 radar and Sentinel-2 optical imagery) and NASA's Harmonized Landsat Sentinel-2 (HLS) archives. These connectors handle authentication, query construction, and data download, shielding users from provider-specific APIs.
The library excels at multi-source data integration. A typical geospatial AI application might combine optical imagery with radar data and elevation information. TerraKit handles the complexities of unifying these different modalities into coherent multi-modal samples.
Automated preprocessing pipelines handle common transformations, including cloud masking for optical imagery, normalization and standardization, and gap filling for missing data. These pipelines are configurable and extensible, allowing users to implement custom preprocessing logic while benefiting from the framework's orchestration capabilities.
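To make these transformations concrete, the sketch below illustrates cloud masking, gap filling, and standardization on a NumPy array. This is an illustration of the kind of operations such a pipeline orchestrates, not TerraKit's internal implementation:

```python
import numpy as np

def preprocess(chip: np.ndarray, cloud_mask: np.ndarray) -> np.ndarray:
    """Illustrative preprocessing: mask clouds, fill gaps, standardize.

    chip:       (bands, H, W) array of reflectance values
    cloud_mask: (H, W) boolean array, True where a pixel is cloudy
    """
    chip = chip.astype(np.float32).copy()
    # Mark cloudy pixels as missing in every band
    chip[:, cloud_mask] = np.nan
    # Gap-fill missing pixels with each band's mean of valid pixels
    for b in range(chip.shape[0]):
        band = chip[b]
        band[np.isnan(band)] = np.nanmean(band)
    # Standardize each band to zero mean and unit variance
    mean = chip.mean(axis=(1, 2), keepdims=True)
    std = chip.std(axis=(1, 2), keepdims=True)
    return (chip - mean) / (std + 1e-8)
```

In a real pipeline these steps would be configured per data source (e.g., a Sentinel-2 scene classification layer as the cloud mask) rather than hand-coded.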
To get started, you can simply install TerraKit from PyPI:
pip install terrakit
Then find Sentinel-2 data and download it to your local machine:
import terrakit
# Start from a set of raster or vector labels
terrakit.process_labels("./my_labels")
# Download EO data corresponding to the temporal and spatial
# information processed from those labels. Easily extend this
# to multiple data sources.
terrakit.download_data()
# Process downloaded data into data/label pairs
terrakit.chip_and_label_data()
# Store your dataset in standardized geospatial formats, such as
# the TACO standard.
terrakit.taco_store_data()
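The chipping step above tiles large rasters into fixed-size patches that models can consume. A minimal sketch of the idea in NumPy (illustrative only, not TerraKit's internal implementation):

```python
import numpy as np

def chip_raster(raster: np.ndarray, size: int) -> list:
    """Split a (bands, H, W) raster into non-overlapping (bands, size, size)
    chips, dropping partial tiles at the right and bottom edges."""
    _, h, w = raster.shape
    chips = []
    for row in range(0, h - size + 1, size):
        for col in range(0, w - size + 1, size):
            chips.append(raster[:, row:row + size, col:col + size])
    return chips

# A 3-band 256x256 raster yields four 128x128 chips
chips = chip_raster(np.zeros((3, 256, 256)), 128)
```

TerraKit additionally pairs each image chip with the corresponding slice of the label raster, which is what makes the output directly trainable.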
For more complex workflows, users can construct custom pipelines that chain data access, preprocessing, and augmentation steps:
from terrakit import DataConnector
# Initialize the TerraKit DataConnector to connect to the Sentinel AWS archive
dc = DataConnector(connector_type="sentinel_aws")
# Search for available data
unique_dates, results = dc.connector.find_data(
data_collection_name="sentinel-2-l2a",
date_start="2024-01-01",
date_end="2024-01-31",
bands=["blue", "green", "red"],
bbox=[34.671440, -0.090887, 34.706448, -0.087678],
)
print(unique_dates)  # List of dates where data is available
# Download the available data
da = dc.connector.get_data(
data_collection_name="sentinel-2-l2a",
date_start="2024-01-01",
date_end="2024-01-31",
bbox=[34.671440, -0.090887, 34.706448, -0.087678],
bands=["blue", "green", "red"],
save_file="output.tif",
)
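The `bbox` argument in the calls above appears to follow the common `[min_lon, min_lat, max_lon, max_lat]` convention used by STAC-style APIs. A small helper (hypothetical, not part of TerraKit) can build such a box around a point of interest:

```python
def bbox_around(lon: float, lat: float, half_width_deg: float) -> list:
    """Build a [min_lon, min_lat, max_lon, max_lat] box centered on a point.

    Hypothetical convenience helper; TerraKit itself takes the list directly.
    """
    return [lon - half_width_deg, lat - half_width_deg,
            lon + half_width_deg, lat + half_width_deg]

# Roughly the same area of interest as the example above
bbox = bbox_around(34.689, -0.089, 0.018)
```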
By handling the complexities of geospatial data acquisition and preprocessing, TerraKit allows researchers and practitioners to focus on the unique aspects of their applications rather than spending time wrangling with data. It serves as the essential first step in the TerraStackAI workflow, producing the standardized datasets that TerraTorch and other tools downstream consume.
Geospatial Studio: An end-to-end platform for fine-tuning and inference
Geospatial Studio represents the culmination of the TerraStackAI vision: an accessible, end-to-end platform that brings together data curation, model fine-tuning, deployment, and inference in a unified environment. While TerraKit, TerraTorch, and Iterate can be used independently via command-line interfaces and Python APIs, Geospatial Studio provides both visual no-code interfaces and programmatic access that make the entire workflow accessible.
At its core, Geospatial Studio orchestrates the complete lifecycle of geospatial AI applications. It guides users through each stage of the workflow with appropriate interfaces for their level of expertise. Domain experts can use visual interfaces to prepare data, configure and train models, and deploy models without writing code. Data scientists can access the same functionality through Python SDKs and Jupyter notebooks. Developers can integrate via RESTful APIs and deploy custom applications using the platform's inference infrastructure.
The platform's data management layer extends TerraKit with orchestration and persistent storage across the full workflow. This feeds directly into the fine-tuning interface that translates dropdown selections and sliders, or user selections through the Python SDK, into TerraTorch configurations, while monitoring training progress through real-time dashboards.
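To make that translation concrete, a fine-tuning job configured in the Studio might reduce to a structure of roughly this shape. The field names below are illustrative assumptions, not TerraTorch's exact schema; consult the TerraTorch documentation for the real configuration format:

```python
# Illustrative fine-tuning configuration assembled by the Studio UI or SDK
# (field names are assumptions, not TerraTorch's exact schema)
config = {
    "task": "semantic_segmentation",
    "model": {
        "backbone": "prithvi_eo_v2_300",  # pretrained foundation model
        "decoder": "fcn",                 # task-specific head
        "num_classes": 2,                 # e.g., flood / no-flood
    },
    "data": {
        "dataset": "flood_chips",
        "bands": ["blue", "green", "red", "nir"],
        "batch_size": 8,
    },
    "trainer": {"max_epochs": 20, "lr": 1e-4},
}
```

The point is that a dropdown selection in the UI and a keyword argument in the SDK both land in the same declarative configuration, which is then handed to TerraTorch for training.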
Once models are trained, the inference infrastructure supports both batch processing for large-scale analysis and real-time endpoints for interactive applications, managing multiple model versions with configurable scaling policies. Interactive maps overlay predictions on satellite imagery, temporal visualizations reveal trends over time, and exportable performance metrics enable validation — all accessible through the web UI, Python SDK, or QGIS plugin integration.
The Geospatial Studio leverages a cloud-native microservices design with a web-based frontend, RESTful backend services for authentication, data management, and job orchestration, and a container-based execution layer. This architecture supports deployment from local workstations (via Lima VM) to institutional compute clusters or cloud infrastructure, with horizontal scaling to serve both individual researchers and multi-team organizational deployments. The platform integrates with standard container orchestration systems for efficient GPU resource utilization across training and inference workloads.
Getting started with Geospatial Studio
Getting started with Geospatial Studio begins with deployment, which can be done either locally on your workstation using Lima VM or on a Kubernetes/OpenShift cluster for production environments. Local deployment using Lima VM is ideal for learning, testing, development, and workshop participation, automatically provisioning all required services (PostgreSQL, MinIO, Keycloak, Redis, MLflow, and GeoServer) within a local Kubernetes environment.
Cluster deployment offers production-grade scalability with support for external cloud services like IBM Cloud Databases, object storage, and enterprise authentication providers. The deployment process takes 10 to 20 minutes for local setups and varies for cluster deployments based on your infrastructure. For detailed instructions, visit the Local Deployment Guide or Cluster Deployment Guide.
After deployment, you can immediately start exploring Geospatial Studio through three flexible interaction methods: the browser-based UI for visual exploration, the REST API for automation, and Python SDK for data science workflows. Begin by accessing the UI, generating your API key, and installing the Python SDK. Running inference is straightforward with just a few lines of code:
from geostudio import Client
# Initialize client
client = Client(geostudio_config_file=".geostudio_config_file")
# Run inference on satellite imagery
inference = client.run_inference(
model_id="your-model-id",
bbox=[-122.5, 37.7, -122.3, 37.9],
start_date="2024-01-01",
end_date="2024-01-31"
)
print(f"Inference status: {inference['status']}")
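Since `run_inference` may return before the job finishes, a polling loop can wait for a terminal state. The `get_inference` method and the status values below are assumptions for illustration; check the Studio SDK documentation for the actual names:

```python
import time

# Assumed terminal status values; the SDK's actual names may differ
TERMINAL_STATES = {"succeeded", "failed"}

def is_terminal(status: str) -> bool:
    """True once a job has finished, successfully or not."""
    return status.lower() in TERMINAL_STATES

def wait_for_inference(client, inference_id, poll_seconds=10, timeout=600):
    """Poll a hypothetical client.get_inference() until the job reaches a
    terminal state or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        job = client.get_inference(inference_id)  # hypothetical SDK call
        if is_terminal(job["status"]):
            return job
        time.sleep(poll_seconds)
    raise TimeoutError(f"Inference {inference_id} did not finish in {timeout}s")
```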
Our comprehensive hands-on workshop guides you from basic navigation to advanced workflows including dataset onboarding, model checkpoint uploads, and custom model training for real-world applications, such as flood detection and wildfire burn scar mapping. Complete workshop materials are available on the TerraStackAI GitHub page.
Deploying TerraStackAI models through Red Hat AI Inference Server
Models developed and fine-tuned with TerraStackAI can now be deployed in production using Red Hat AI Inference Server (RHAIIS) 3.3. We contributed a TerraTorch backend to vLLM, the engine underlying RHAIIS, and extended its capabilities so that TerraTorch-compatible segmentation and pixelwise regression models, including Prithvi-EO-2.0 and its fine-tuned variants, can be served via RHAIIS 3.3.
This delivers enterprise-grade inference purpose-built for bursty, event-driven Earth and space AI workloads. OpenShift AI autoscaling capabilities ensure the serving infrastructure can scale up rapidly during extreme events and scale back down when idle, to help manage GPU costs.
The integration is fully aligned with TerraStackAI. Models fine-tuned in TerraTorch or Geospatial Studio can be served via RHAIIS, as a bring-your-own model, and the existing Studio APIs and visualization layers will continue to operate seamlessly with the RHAIIS endpoint. The result is a unified path from research innovation to hardened, scalable production deployment, without leaving the TerraStackAI ecosystem.
What’s next
- Try TerraStackAI via the command line: ingest data with TerraKit, fine-tune with TerraTorch, and optimize with Iterate on GitHub.
- Explore TerraStackAI through Geospatial Studio: use guided workflows to curate data, fine-tune models, and deploy scalable inference services on GitHub.
- Read Red Hat’s overview on how Red Hat AI Inference Server accelerates AI inference and enterprise adoption of scientific models.
- Check the official Red Hat documentation for step-by-step instructions on serving Earth and space models with Red Hat AI Inference Server.