Language Agnostic Code Embeddings
Saiteja Utpala, Alex Gu, et al.
NAACL 2024
As part of an effort to develop Foundation Models (FMs) for each NASA Science Mission Directorate division, members of the Lunar FM First-Look Science Team (FLST) for the Planetary Science Division (PSD) are developing lunar science benchmark datasets relevant to downstream science applications. Benchmark datasets are publicly available, standardized collections of data formatted for use by machine learning models; they provide a means of comparing model architectures and performance on specific tasks, such as object detection, image classification, and image segmentation. To date, however, few benchmark datasets use data from PSD missions or directly address downstream science applications relevant to decadal-level science questions. Here, we report progress by the FLST in developing benchmark datasets for common tasks relevant to lunar science applications, and to lunar scientists in particular, in areas such as surface processes, lunar volcanism, and polar volatiles. These benchmark datasets are being constructed from datasets in the literature and from new datasets built in this effort, including crater mapping and counting results in various modalities, surface geologic maps, and digital terrain models. By working with the scientific community, we aim to develop and include additional benchmark datasets relevant to unanticipated downstream applications, in order to motivate the development of future planetary science FMs that better serve the needs of users in the scientific community.