Retention Score: Quantifying Jailbreak Risks for Vision Language Models. Zhaitang Li, Pin-Yu Chen, et al. AAAI 2025.
Token Highlighter: Inspecting and Mitigating Jailbreak Prompts for Large Language Models. Xiaomeng Xu, Pin-Yu Chen, et al. AAAI 2025.
Neural Reasoning Networks: Efficient Interpretable Neural Networks with Automatic Textual Explanations. Steve Carrow, Kyle Harper Erwin, et al. AAAI 2025.
Agent Trajectory Explorer: Visualizing and Providing Feedback on Agent Trajectories. Michael Desmond, Ja Young Lee, et al. AAAI 2025.
Leveraging Interpretability in the Transformer to Automate the Proactive Scaling of Cloud Resources. Amadou Ba, Pavithra Harsha, et al. AAAI 2025.
Foundation Models at Work: Fine-Tuning for Fairness in Algorithmic Hiring. Buse Korkmaz, Rahul Nair, et al. AAAI 2025.
Adaptive PII Mitigation Framework for Large Language Models. Shubhi Asthana, Ruchi Mahindru, et al. AAAI 2025.
Epistemic Bias as a Means for the Automated Detection of Injustices in Text. Kenya Andrews, Lamogha Chiazor. AAAI 2025.