SemCLIP: A Semantic Memory-Aligned Vision Language Model
Tanveer Syeda-Mahmood, Niharika DSouza, et al.
NeurIPS 2025
We present 3DGrid-LLM, a multimodal foundation model designed to integrate natural language with three-dimensional electron density grids for applications in molecular and materials science. The architecture extends a large decoder-only language model by incorporating discrete volumetric representations obtained through a 3D VQGAN, enabling joint token-level processing of spatial and textual modalities within a unified framework. Pre-trained on a diverse corpus of molecular and materials datasets, 3DGrid-LLM supports bidirectional text–grid generation, multimodal question answering, and retrieval-augmented 3D reconstruction. Comprehensive evaluations demonstrate consistent improvements over baseline methods in multimodal VQA, chemically informed text generation, and property-aligned retrieval tasks, yielding outputs that are both accurate and physically consistent.
Tanveer Syeda-Mahmood, Niharika DSouza, et al.
NeurIPS 2025
Giovanni De Felice, Arianna Casanova Flores, et al.
NeurIPS 2025
Sarath Swaminathan, Nathaniel Park, et al.
NeurIPS 2025
Ramon Nartallo-kaluarachchi, Robert Manson Sawko, et al.
NeurIPS 2025