Erik Altman, Jovan Blanusa, et al.
NeurIPS 2023
Geospatial foundation models (GFMs) operate on large, multi-band raster products (e.g., GeoTIFF) that require expensive data access and preprocessing (reprojection, decoding, normalization, and tiling) before GPU inference. In our measurements, reading and preprocessing geospatial inputs can be orders of magnitude slower than tokenization or standard image preprocessing, and account for 31–43% of end-to-end request time for a representative GFM. Existing inference frameworks such as vLLM execute this preprocessing inline with request handling, which under load serializes CPU and I/O work, increases queueing delay, and leaves GPUs underutilized. We present GeoServe, a serving system based on Ray that disaggregates the geospatial data pipeline from GPU inference: I/O- and CPU-heavy preprocessing is offloaded to a scalable pool of CPU workers, while GPU nodes stay dedicated to model forward passes. We show experimentally that, compared to vanilla vLLM, GeoServe reduces p90 request latency by up to 262.8× at high load and improves throughput by up to 4.89×, while raising the achieved model forward-pass rate from about 16 to about 74 inferences per second through better batching opportunities.
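The disaggregation idea can be illustrated with a minimal sketch. This is not GeoServe's implementation: it uses Python's standard `concurrent.futures` thread pool as a stand-in for the Ray CPU worker pool, and the names `preprocess`, `forward_batch`, and `serve` are hypothetical. Preprocessing runs in the pool, and the serving loop only groups finished tiles into batches for a simulated forward pass.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import List

Tile = List[float]


def preprocess(raw: List[int]) -> Tile:
    """Stand-in for decode/reproject/normalize/tile: scale values to [0, 1]."""
    peak = max(raw) or 1  # avoid division by zero for all-zero inputs
    return [v / peak for v in raw]


def forward_batch(batch: List[Tile]) -> List[float]:
    """Stand-in for a GPU forward pass: returns one score (the mean) per tile."""
    return [sum(tile) / len(tile) for tile in batch]


def serve(requests: List[List[int]], cpu_workers: int = 4,
          batch_size: int = 2) -> List[float]:
    """Preprocess in a worker pool; batch whatever is ready for the model."""
    results = [0.0] * len(requests)
    pending_ids: List[int] = []
    pending_tiles: List[Tile] = []

    def flush() -> None:
        # Run one simulated forward pass over the accumulated batch.
        for idx, score in zip(pending_ids, forward_batch(pending_tiles)):
            results[idx] = score
        pending_ids.clear()
        pending_tiles.clear()

    # Assumption: a thread pool stands in for remote CPU workers (e.g. Ray tasks).
    with ThreadPoolExecutor(max_workers=cpu_workers) as pool:
        futures = {pool.submit(preprocess, raw): i
                   for i, raw in enumerate(requests)}
        for fut in as_completed(futures):
            pending_ids.append(futures[fut])
            pending_tiles.append(fut.result())
            if len(pending_tiles) == batch_size:
                flush()
        if pending_tiles:  # final partial batch
            flush()
    return results


if __name__ == "__main__":
    print(serve([[0, 5, 10], [2, 2], [4, 0]]))
```

Because the serving loop never performs preprocessing itself, it can form a batch as soon as enough tiles are ready, which is the batching opportunity the abstract refers to. A real deployment would replace the thread pool with remote CPU workers and the mean with an actual model call.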