Byungchul Tak, Shu Tao, et al.
IC2E 2016
Existing automated red‑teaming pipelines often miss attacks that depend on attacker identity, framing, or multi‑turn tactics, and this under‑coverage leads them to underestimate real‑world risk. We introduce Persona‑Conditioned Adversarial Prompting (PCAP), which conditions adversarial search on attacker personas and strategy cards and runs parallel persona‑conditioned beam searches to discover diverse, transferable jailbreaks. PCAP is orthogonal to the underlying search algorithm and substantially increases attack success rate (ASR) and prompt diversity (e.g., on GPT‑OSS 120B), improving attack strategy coverage.
Yannis Belkhiter, Seshu Tirupathi, et al.
ICML 2026
Vidushi Sharma, Andy Tek, et al.
NeurIPS 2025
Kristjan Greenewald, Yuancheng Yu, et al.
NeurIPS 2024