Workshop

A Case for a Simulation-Driven Exploration of Distributed GenAI Platforms

Abstract

The rapid adoption of Generative AI (GenAI) workloads has driven the emergence of inference serving platforms such as llm-d and NVIDIA Dynamo. However, exploring the design space of these platforms, especially for large-scale, multi-layer optimizations, remains prohibitively expensive and slow because of limited hardware access and high engineering overhead. Current evaluation methods often focus on isolated components and fail to capture the complex interplay between hardware, scheduling policies, and dynamic GenAI workloads.

We argue that design space exploration of GenAI platforms can be accelerated by a simulation-based approach, which offers a fast, cheap, and scalable way to prototype and validate new ideas. To this end, we present Opal, an open-source, discrete-event simulation framework. Unlike prior simulators, Opal models interactions across multiple layers of the inference stack, from hardware to workloads, enabling a holistic analysis of system-level behaviors and trade-offs. Opal is designed to be simple, extensible, reproducible, and fast, so that researchers can explore a wide range of deployment scenarios and optimization strategies. In this paper, we present our motivation and Opal's design, and seek feedback from the community on open challenges.