Reducing the Scope of Language Models
David Yunis, Siyu Huo, et al.
AAAI 2026
Transformers based on the self-attention mechanism have become foundational models across a wide range of domains, creating an urgent need for effective formal verification techniques to understand their behavior and provide safety guarantees. In this paper, we propose two parameterized linear abstract domains for the inner products in the self-attention module, aiming to improve verification precision. The first constructs symbolic quadratic upper and lower bounds on the product of two scalars and then derives parameterized affine bounds via tangents. The second constructs parameterized bounds by interpolating the affine bounds proposed in prior work. We evaluate both parameterizations and show that each outperforms the state-of-the-art approach, which is regarded as optimal with respect to a certain mean-gap criterion. Experimental results show that, for robustness verification, our approach verifies many instances that existing methods cannot. For interval analysis, our method yields tighter bounds than the state of the art, with the advantage becoming more pronounced as network depth increases.
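As a rough illustration of the kind of construction the abstract describes, the sketch below shows one standard way to turn quadratic envelopes of a scalar product into tangent-parameterized affine bounds. It is only a sketch under assumed notation: the interval $[l_-, u_-]$ for $x-y$ and the tangent point $t$ are introduced here for illustration, and the paper's symbolic quadratic bounds and choice of parameterization may differ.

```latex
% Illustrative sketch (not necessarily the paper's construction):
% write the product via squares, then bound each square by a tangent
% line (lower bound) or a chord (upper bound).
\[
  xy \;=\; \tfrac{1}{4}\bigl((x+y)^2 - (x-y)^2\bigr),
  \qquad
  s^2 \;\ge\; 2ts - t^2 \ \ (\text{tangent at any } t),
  \qquad
  s^2 \;\le\; (l+u)\,s - lu \ \ (s \in [l,u]).
\]
% Applying the tangent bound to (x+y)^2 and the chord bound to (x-y)^2,
% with x - y \in [l_-, u_-], gives an affine lower bound on xy whose
% free parameter is the tangent point t:
\[
  xy \;\ge\; \tfrac{1}{4}\Bigl(2t(x+y) - t^2 - (l_- + u_-)(x-y) + l_-\,u_-\Bigr).
\]
```

An affine upper bound follows symmetrically by swapping the roles of the two squares, and the parameter $t$ can then be tuned to tighten the resulting linear relaxation.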