Fabio Lorenzi, Abigail Langbridge, et al.
AAAI 2026
Monitoring oil and gas wells is essential for assessing environmental degradation in natural and semi‑natural landscapes and for mitigating long‑term impacts such as methane emissions from abandoned and orphaned wells. While satellite imagery and machine learning provide scalable monitoring capabilities, progress remains limited by the absence of multimodal, multiple‑choice (MCQ) vision–language datasets; most resources are visual‑only, which prevents grounded evaluation and post‑training of vision–language models (VLMs) for well interpretation. To address this gap, we introduce SatWellMCQ, to the best of our knowledge the first MCQ‑based vision–language dataset designed for image‑grounded identification and localization of oil and gas wells. SatWellMCQ pairs satellite imagery with structured textual descriptions and MCQ options for grounded reasoning and systematic evaluation of VLMs, with all samples expert‑verified to ensure that the image, labels, and text are correctly matched. We evaluate SatWellMCQ as a post‑training resource for VLMs and find that while the best zero‑shot large model achieves 0.670 accuracy, a compact model fine‑tuned on our dataset reaches 0.730. This improvement demonstrates that structured MCQ supervision enables effective domain adaptation, allowing smaller VLMs to match or surpass much larger models in oil and gas well interpretation.
Dzung Phan, Lam Nguyen, et al.
SDM 2024
Amadou Ba, Christopher Lohse, et al.
INFORMS 2022
Bingsheng Yao, Dakuo Wang, et al.
ACL 2022