BenchmarkCards: Standardized Documentation for Large Language Model Benchmarks. Anna Sokol, Elizabeth Daly, et al. 2025. NeurIPS 2025.
Forging Time Series with Language: A Large Language Model Approach to Synthetic Data Generation. Cécile Rousseau, Tobia Boschi, et al. 2025. NeurIPS 2025.
Musings on AI Muses: Support for Human Creativity. John Richards, Jacquelyn Martino, et al. 2025. NeurIPS 2025.
Foundation Models Enabling Multi-Scale Battery Materials Discovery: From Molecules To Devices. Vidushi Sharma, Andy Tek, et al. 2025. NeurIPS 2025.
Uncertainty-Aware Prediction of Climate Extremes Using Fine-Tuned Time-Series Foundation Models. Imran Nasim, Joao Lucas de Sousa Almeida. 2025. NeurIPS 2025.
Verifiable Chemical Reasoning through Tool-Calling Agentic Workflow. Gabrielle Gaudeau, Shinnosuke Tanaka, et al. 2025. NeurIPS 2025.
Carbon-m1: a Massive, Multi-Modal Synthetic Dataset for Complex Polymeric Materials. Nathaniel Park. 2025. NeurIPS 2025.
SafeCOMM: Investigating Safety Degradation in Fine-Tuned Telecom Large Language Models. Aladin Djuhera, Swanand Ravindra Kadhe, et al. 2025. NeurIPS 2025.