ClusterLink: Redefining Application Connectivity for the Multi-cloud Era. Kfir Toledo, Pravein Govindan Kannan, et al. 2025. CLOUD 2025.
Mind the Memory Gap: Unveiling GPU Bottlenecks in Large-Batch LLM Inference. Pol G. Recasens, Ferran Agullo, et al. 2025. CLOUD 2025.
Towards Efficient Key-Value Cache Management for Prefix Prefilling in LLM Inference. Yue Zhu, Hao Yu, et al. 2025. CLOUD 2025.