Towards a common environment for learning scheduling algorithms
Renato Luiz Cunha, Luiz Chaimowicz
MASCOTS 2020
Kubernetes has become the de facto standard for orchestrating cloud workloads, but its traditional device plugin model struggles to keep pace with the growing diversity of hardware accelerators such as GPUs, DPUs, high-speed networking devices, and emerging AI chips. Static allocation limits flexibility, resource efficiency, and multi-tenancy. This talk introduces Dynamic Resource Allocation (DRA), an approach that enables fine-grained, on-demand allocation and sharing of devices across workloads, with topology-aware scheduling that optimizes performance on complex hardware interconnects. We will examine the architecture and design principles behind DRA, present real-world use cases, and discuss its implications for telco, HPC, and AI workloads. Attendees will learn how DRA can improve utilization, scalability, and sustainability in cloud-native environments.
Pritish Parida, Shurong Tian, et al.
ITherm 2024
S. Hung, S. Mochizuki, et al.
VLSI Technology and Circuits 2025
Ilias Iliadis
CTRQ 2022