Workshop paper

Persona-Conditioned Adversarial Prompting (PCAP): Multi-Identity Red-Teaming for Enhanced Adversarial Prompt Discovery

Abstract

Existing automated red‑teaming pipelines often miss attacks that depend on attacker identity, framing, or multi‑turn tactics. This under‑coverage leads to underestimates of real‑world risk. We introduce Persona‑Conditioned Adversarial Prompting (PCAP), which conditions adversarial search on attacker personas and strategy cards and runs parallel persona‑conditioned beam searches to discover diverse, transferable jailbreaks. PCAP is orthogonal to the underlying search algorithm and substantially increases both attack success rate (ASR) and prompt diversity (e.g., ASR on GPT‑OSS 120B rises from $\approx 60\%$ to $\approx 98\%$), broadening attack‑strategy coverage.
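To make the control flow concrete, the following is a minimal toy sketch of running one beam search per persona in parallel and computing ASR as a fraction of successful attempts. It is not the paper's implementation: the persona names, the strategy-card names, and `stub_score` are hypothetical placeholders, candidates are abstract sequences of card names rather than prompts, and a real pipeline would presumably replace the stub with queries to a target model and a judge.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical placeholders; the actual personas and strategy cards are not given here.
PERSONAS = ["persona_a", "persona_b", "persona_c"]
CARDS = ["card_1", "card_2", "card_3", "card_4"]


def stub_score(persona, candidate):
    """Deterministic toy score in [0, 1]; stands in for a target-model + judge call."""
    weight = PERSONAS.index(persona) + 1
    total = sum(CARDS.index(card) + 1 for card in candidate)
    return ((weight * total) % 7) / 6.0


def beam_search(persona, beam_width=2, depth=3):
    """Plain beam search over sequences of strategy cards, conditioned on one persona."""
    beam = [()]
    for _ in range(depth):
        # Extend every kept candidate by every card, then keep the top-scoring ones.
        expanded = [cand + (card,) for cand in beam for card in CARDS]
        expanded.sort(key=lambda cand: stub_score(persona, cand), reverse=True)
        beam = expanded[:beam_width]
    return [(stub_score(persona, cand), cand) for cand in beam]


def run_all_personas(beam_width=2, depth=3):
    """Run one independent beam search per persona in parallel."""
    with ThreadPoolExecutor(max_workers=len(PERSONAS)) as pool:
        futures = {p: pool.submit(beam_search, p, beam_width, depth) for p in PERSONAS}
        return {p: f.result() for p, f in futures.items()}


def attack_success_rate(outcomes):
    """ASR as the fraction of attempts judged successful (0.0 for an empty list)."""
    outcomes = list(outcomes)
    return sum(outcomes) / len(outcomes) if outcomes else 0.0
```

Because each persona's search is independent in this sketch, the inner `beam_search` could be swapped for another search procedure without changing `run_all_personas`, which is one way to read the claim that the approach is orthogonal to the underlying search algorithm.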