From Device Passthrough to Host Passout: Exploring RAS Risks in High-Performance, Virtualized AI-Systems. Chathura Rajapaksha, Sandhya Koteshwara, et al. OSDI 2025.
Vela: A Virtualized LLM Training System with GPU Direct and RoCE. Apoorve Mohan, Robert Walkup, et al. ASPLOS 2025.
STRonG: System Topology Risk Analysis on Graphs. Lars Schneidenbach, Sandhya Koteshwara, et al. CCGrid 2024.
To virtualize or not to virtualize AI Infrastructure: A perspective. Seetharami Seelam, Apoorve Mohan, et al. ISCA 2023.