Retention Score: Quantifying Jailbreak Risks for Vision Language Models. Zhaitang Li, Pin-Yu Chen, et al. AAAI 2025.
Token Highlighter: Inspecting and Mitigating Jailbreak Prompts for Large Language Models. Xiaomeng Xu, Pin-Yu Chen, et al. AAAI 2025.
Neural Reasoning Networks: Efficient Interpretable Neural Networks with Automatic Textual Explanations. Steve Carrow, Kyle Harper Erwin, et al. AAAI 2025.
Agent Trajectory Explorer: Visualizing and Providing Feedback on Agent Trajectories. Michael Desmond, Ja Young Lee, et al. AAAI 2025.
Leveraging Interpretability in the Transformer to Automate the Proactive Scaling of Cloud Resources. Amadou Ba, Pavithra Harsha, et al. AAAI 2025.
Foundation Models at Work: Fine-Tuning for Fairness in Algorithmic Hiring. Buse Korkmaz, Rahul Nair, et al. AAAI 2025.
Adaptive PII Mitigation Framework for Large Language Models. Shubhi Asthana, Ruchi Mahindru, et al. AAAI 2025.
Epistemic Bias as a Means for the Automated Detection of Injustices in Text. Kenya Andrews, Lamogha Chiazor. AAAI 2025.