GPU Optimizations for Efficient and Cost-Effective Access to Diverse Large Language Models in Research Cluster. Chen Wang, Yue Zhu, et al. 2024. MLSys 2024.
DEFT: SLO-Driven Preemptive Scheduling for Containerized DNN Serving. Yitian Hao, Wenqing Wu, et al. 2023. NSDI 2023.
A Chaos Recommendation Tool for Reliability Testing in Large-Scale Cloud-Native Systems. Mudit Verma, Sandeep Hans, et al. 2024. COMSNETS 2024.
Dynamic- X-Y: A Tool for Learning Dynamic Alert Suppression Policies in AIOps. Karan Bhukar, Harshit Kumar, et al. 2024. COMSNETS 2024.
A Reliability Assurance Framework for Cloud-Native Telco Workloads. Mudit Verma, Dushyant Behl, et al. 2023. COMSNETS 2023.
Reinforcement learning for resource management in multi-tenant serverless platforms. Haoran Qiu, Weichao Mao, et al. 2022. EuroMLSys 2022.
WARDEN: Warranting Robustness Against Deception in Next-Generation Systems. Hazar Yueksel, Ramon Bertran, et al. 2020. MLSys 2020.