Gaetano Rossiello, Shankar Subramaniam
ACM CAIS 2026
Proactive large language model (LLM) agents aim to actively plan, query, and interact over multiple turns, enabling efficient task completion beyond passive instruction following and making them essential for real-world, user-centric applications. Agentic reinforcement learning (RL) has recently emerged as a promising solution for training such agents in multi-turn settings, allowing interaction strategies to be learned from feedback. However, existing pipelines face a critical challenge in balancing task performance with user engagement: passive agents cannot efficiently adapt to users' intentions, while overuse of human feedback reduces users' satisfaction. To address this trade-off, we propose BAO, an agentic RL framework that combines behavior enhancement, which enriches proactive reasoning and information-gathering capabilities, with behavior regularization, which suppresses inefficient or redundant interactions and aligns agent behavior with user expectations. We evaluate BAO on multiple tasks from the UserRL benchmark suite and demonstrate that it substantially outperforms proactive agentic RL baselines while achieving performance comparable or even superior to that of commercial LLM agents, highlighting its effectiveness for training proactive, user-aligned LLM agents in complex multi-turn scenarios. Our website: https://proactive-agentic-rl.github.io/.
Yidi Wu, Thomas Bohnstingl, et al.
ICML 2025
Gosia Lazuka, Andreea Simona Anghel, et al.
SC 2024
Yannis Belkhiter, Seshu Tirupathi, et al.
ICML 2026