Workshop paper

GeoServe: Leveraging Disaggregated Data Processing for Scalable Geospatial Model Serving

Abstract

Geospatial foundation models (GFMs) operate on large, multi-band raster products (e.g., GeoTIFF) that require expensive data access and preprocessing (reprojection, decoding, normalization, and tiling) before GPU inference. In our measurements, reading and preprocessing geospatial inputs can be orders of magnitude slower than tokenization or standard image preprocessing, and constitute 31–43% of end-to-end request time for a representative GFM. Existing inference frameworks such as vLLM execute this preprocessing inline with request handling, which under load serializes CPU I/O work, increases queueing delay, and leaves GPUs underutilized. We present GeoServe, a serving system based on Ray that disaggregates the geospatial data pipeline from GPU inference by offloading I/O- and CPU-heavy preprocessing to a scalable pool of CPU workers while keeping GPU nodes dedicated to model forward passes. We show experimentally that GeoServe reduces the p90 request latency by up to 262.8× at high load and improves throughput by up to 4.89× compared to vanilla vLLM, while increasing the achieved model forward-pass rate from ∼16 inf./sec to ∼74 inf./sec via better batching opportunities.
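The disaggregation described above can be illustrated with a minimal sketch. It is not GeoServe's implementation: it swaps Ray for Python's standard-library thread pool so that it runs anywhere, and the names preprocess, forward_batch, and serve, along with the toy normalization and averaging, are hypothetical stand-ins for the real geospatial pipeline and model. In a Ray-based system the pool would presumably be replaced by remote tasks or actors on CPU nodes.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List


def preprocess(raw: List[float]) -> List[float]:
    """Hypothetical stand-in for CPU-heavy work (decode, reproject, normalize, tile)."""
    lo, hi = min(raw), max(raw)
    span = (hi - lo) or 1.0  # avoid division by zero for constant inputs
    return [(x - lo) / span for x in raw]


def forward_batch(batch: List[List[float]]) -> List[float]:
    """Hypothetical stand-in for one batched model forward pass."""
    return [sum(tile) / len(tile) for tile in batch]


def serve(requests: List[List[float]], cpu_workers: int = 4,
          batch_size: int = 2) -> List[float]:
    # Preprocessing runs in a separate worker pool instead of inline
    # with request handling, so the inference stage only sees ready inputs.
    with ThreadPoolExecutor(max_workers=cpu_workers) as pool:
        tiles = list(pool.map(preprocess, requests))

    # The inference stage consumes preprocessed inputs in batches.
    outputs: List[float] = []
    for start in range(0, len(tiles), batch_size):
        outputs.extend(forward_batch(tiles[start:start + batch_size]))
    return outputs
```

Because the worker pool is sized independently of the inference stage, preprocessing capacity can be scaled without touching the batch loop, which is the property the abstract attributes to the CPU worker pool.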