APILOT: Improving the Security and Usability of LLM Code Suggestions via Outdated API Mitigation. Weiheng Bai, Keyang Xuan, et al. ACSAC 2025.
CoP: Agentic Red-teaming for Large Language Models using Composition of Principles. Chen Xiong, Pin-Yu Chen, et al. NeurIPS 2025.
Scaling LLM Planning: NL2FLOW for Parametric Problem Generation and Rigorous Evaluation. Jung koo Kang. NeurIPS 2025.
Vintage Code, Modern Judges: Meta-Validation in Low Data Regimes. Gal Amram, Ora Nova Fandina, et al. ASE 2025.
Debatable Intelligence: Benchmarking LLM Judges via Debate Speech Evaluation. Noy Sternlicht, Ariel Gera, et al. EMNLP 2025.
Agentic Process Observability: Discovering Behavioral Variability. Fabiana Fournier, Lior Limonad, et al. ECAI 2025.
Exposing AI Bias by Crowdsourcing: Democratizing Critique of Large Language Models. Hangzhi Guo, Pranav Venkit, et al. AIES 2025.
The NorthPole Validator: A Cycle-Accurate Simulator for HW/SW Codesign of a Prescheduled Neural Inference Accelerator. Alexander Andreopoulos, Michael Debole, et al. HPEC 2025.
Can Large Reasoning Models do Analogical Reasoning under Perceptual Uncertainty? Giacomo Camposampiero, Michael Hersche, et al. NeSy 2025.