BOAD: Discovering Hierarchical Software Engineering Agents via Bandit Optimization. Iris Xu, Guangtao Zeng, et al. 2026. ICLR 2026. Poster.
Satori: Reinforcement Learning with Chain-of-Action-Thought Enhances LLM Reasoning via Autoregressive Search. Maohao Shen, Guangtao Zeng, et al. 2025. ICML 2025. Conference paper.