VFIO: Very Frightening I/O? Taming Wild Guests and their PCIe Config-Space Abuse. Chathura Rajapaksha, Sandhya Koteshwara, et al. LPC 2025.
Cost-Efficient LLM Training with Lifetime-Aware Tensor Offloading via GPUDirect Storage. Ziqi Yuan, Haoyang Zhang, et al. NeurIPS 2025.
To Virtualize or Not to Virtualize: Experiences from Building Two Generations of Virtualized Infrastructure for LLM Training. Apoorve Mohan, Ming-Hung Chen, et al. SC 2025.
From Device Passthrough to Host Passout: Exploring RAS Risks in {High-Performance, Virtualized} AI-Systems. Chathura Rajapaksha, Sandhya Koteshwara, et al. OSDI 2025.
STRonG: System Topology Risk Analysis on Graphs. Lars Schneidenbach, Sandhya Koteshwara, et al. CCGrid 2024.
To virtualize or not to virtualize AI Infrastructure: A perspective. Seetharami Seelam, Apoorve Mohan, et al. ISCA 2023.
How to Deploy a High-performance Distributed AI Training Cluster with NVIDIA GPUs and KVM. Apoorve Mohan, Matthew Sheard. NVIDIA GTC 2022.