From Benchmarks to Business Impact: Deploying IBM Generalist Agent in Enterprise Production
- Segev Shlomov
- Alon Oved
- et al.
- 2026
- IAAI 2026
Ido Levy is an AI Research Scientist at IBM Research–Haifa, where he designs and builds generalist computer-use agents that reason, plan, and act autonomously. He co-created IBM CUGA, the first enterprise-ready agent to outperform OpenAI Operator on standard web-navigation benchmarks, and created ST-WebAgentBench, the field’s reference suite for safety and trust evaluation.
Before IBM, he was an NLP data scientist at GE Healthcare, developing drift-detection models and MLOps pipelines for clinical text. Ido is also a graduate student in Data Science (M.Sc., Technion, advisers Yonatan Belinkov & Ron Meir) and holds a fast-track B.Sc. in Data Science & Engineering.
Research interests: generative AI · multi-agent orchestration · emergent communication · trustworthy AI · large-language-model tooling.