Soft-Masked Diffusion Language Models
Michael Hersche, Samuel Moor, et al.
ICLR 2026
Autonomous web agents solve complex browsing tasks, yet existing benchmarks measure only whether an agent finishes a task, ignoring whether it does so safely or in a way enterprises can trust. To integrate these agents into critical workflows, safety and trustworthiness (ST) are prerequisites for adoption. We introduce ST-WebAgentBench, a configurable and extensible framework designed as a first step toward enterprise-grade evaluation. Each of its 375 tasks carries one or more ST policies, concise rules that encode constraints (3,057 policies in total), and is scored along six orthogonal dimensions (e.g., user consent, robustness). Tasks span three difficulty tiers for fine-grained capability profiling, and a “Modality Challenge” disentangles vision-only from DOM-only information retrieval, isolating the contribution of each perceptual modality to agent failures. Beyond raw task success, we propose the Completion Under Policy (CuP) metric, which credits only completions that respect all applicable policies, and the Risk Ratio, which quantifies ST breaches across dimensions. Evaluating three open state-of-the-art agents shows that their average CuP is less than two-thirds of their nominal completion rate, revealing substantial safety gaps. To support growth and adaptation to new domains, ST-WebAgentBench provides modular code and extensible templates that let new workflows be incorporated with minimal effort, offering a practical foundation for advancing trustworthy web agents at scale.
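The abstract defines CuP as crediting only completions that respect all applicable policies, and the Risk Ratio as quantifying ST breaches across dimensions. A minimal sketch of how such metrics could be computed follows; the exact formulas, the `TaskResult` structure, and the example numbers are assumptions for illustration, not taken from the paper.

```python
# Hypothetical sketch of CuP and Risk Ratio; definitions are assumed,
# not quoted from ST-WebAgentBench.
from dataclasses import dataclass, field


@dataclass
class TaskResult:
    completed: bool                                  # did the agent finish the task?
    violations: list = field(default_factory=list)   # ST dimensions breached during the run


def completion_rate(results):
    """Nominal completion rate: fraction of tasks finished, policies ignored."""
    return sum(r.completed for r in results) / len(results)


def cup(results):
    """Completion Under Policy (assumed definition): credit a run only if the
    task is completed AND no applicable policy was violated."""
    return sum(r.completed and not r.violations for r in results) / len(results)


def risk_ratio(results, dimension):
    """Risk Ratio (assumed definition): fraction of runs breaching a given
    ST dimension, e.g. 'user_consent' or 'robustness'."""
    return sum(dimension in r.violations for r in results) / len(results)


# Toy evaluation: 4 runs, 3 completed, 1 of them with a consent violation.
results = [
    TaskResult(True, []),
    TaskResult(True, ["user_consent"]),
    TaskResult(False, []),
    TaskResult(True, []),
]
print(completion_rate(results))             # 0.75
print(cup(results))                         # 0.5
print(risk_ratio(results, "user_consent"))  # 0.25
```

Under this toy data, CuP (0.5) is two-thirds of the nominal completion rate (0.75), mirroring the kind of gap the abstract reports between raw completion and policy-respecting completion.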
Robert Farrell, Rajarshi Das, et al.
AAAI-SS 2010
Chen-chia Chang, Wan-hsuan Lin, et al.
ICML 2025
Daniel Karl I. Weidele, Hendrik Strobelt, et al.
SysML 2019