Workshop paper

PRIGUARDAGENT: CONTEXT-AWARE PRIVACY GUARDRAILS FOR AGENTIC SYSTEMS

Abstract

The transition from Large Language Models (LLMs) to autonomous agents capable of tool execution has introduced complex, dynamic privacy risks that traditional safeguards fail to address. Existing defenses rely on static PII filters or rigid guardrail models, and thus often lack the contextual reasoning required to detect subtle privacy violations in agentic workflows. To bridge this gap, we introduce PriGuardAgent, an agentic privacy guardrail framework designed to proactively detect privacy risks in autonomous systems. PriGuardAgent leverages the Model Context Protocol (MCP) to unify diverse analysis tools—such as PII detection, data minimization, and compliance checking—into a plug-and-play architecture, enabling a dynamic planner to orchestrate specialized tools tailored to the interaction context. Furthermore, we incorporate a retrieval-augmented memory module that grounds decision-making in successful past analysis trajectories, effectively balancing precision and recall. Comprehensive evaluations on the PrivacyLens benchmark demonstrate that PriGuardAgent significantly outperforms existing guard models and single-turn detection models. Specifically, PriGuardAgent achieves an average F1 score of 0.715 across Llama3, Mistral, and Zephyr agents, surpassing prompt-engineered privacy analysis models (average F1 of 0.629) and specialized guardrails such as WildGuard (F1 0.284) and Qwen3Guard (F1 0.162). These results showcase the potential of dynamic, reasoning-equipped agentic workflows for safeguarding privacy in next-generation agentic applications.