IBM at KubeCon + CloudNativeCon NA 2025

About

IBM is proud to sponsor KubeCon + CloudNativeCon North America 2025.

The Cloud Native Computing Foundation’s flagship conference brings together adopters and technologists from leading open source and cloud native communities.

Be a part of the conversation as CNCF Graduated, Incubating, and Sandbox Projects unite for four days of collaboration, learning, and innovation to drive the future of cloud native computing.

Agenda

  • Description:

    Steve Bek, IBM 

    As cloud-native architectures grow more complex, developers are under pressure to resolve incidents faster, yet root cause analysis remains a time-consuming, expertise-heavy task. In this keynote, we’ll unveil how Instana’s new Intelligent Incident Investigation, powered by agentic AI, empowers developers to ask natural language questions and instantly surface the insights they need. This breakthrough capability accelerates incident resolution by up to 80%, helping teams cut through operational noise and reduce costly downtime. Join us to explore how AI-driven observability is reshaping the developer experience and redefining what’s possible in modern incident response.

  • Description:

    James Mellinger, IBM 

    Cloud-native development continues to evolve, bringing unprecedented power but also increasing complexity. In this keynote, we’ll explore emerging patterns and best practices for building resilient, secure applications in hybrid and regulated environments. You’ll learn how AI-driven automation is transforming operational workflows, with a spotlight on IBM Concert’s latest innovations: the CISO Agent and Resilience Agent. These tools leverage architecture diagrams, runbooks, and app artifacts to generate resilience profiles and monitoring strategies, illustrating how intelligent agents can enhance reliability and compliance. Whether you're using IBM tools or other platforms, this session offers actionable insights for navigating today’s cloud-native landscape with confidence.

  • Description:

    Alessandro Pomponio, IBM 

    When you let researchers loose on your Kubernetes clusters, it doesn’t take long before the whole place turns into the Wild West: interactive GPU pods left running for days, large CPU-only jobs stampeding onto GPU nodes, and resources vanishing like water in the desert. So we did what any good admin team would: we brought in the sheriffs - Kyverno, Kueue, and Argo CD - to lay down the law and bring order to the frontier.

    In this talk, we’ll share how we used these tools to enforce fine-grained policies, implement fair-share GPU scheduling, and automate governance across our Accelerated Discovery bare-metal clusters. No custom code, no cowboy hacks - just smart policy design and GitOps discipline.

    Whether you’re managing research workloads or just trying to keep your clusters from descending into chaos, this session will show you how policy-as-code can save you a thousand headaches - and maybe a few GPUs too.
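
    For a taste of what policy-as-code looks like in practice, here is a minimal Python sketch of one rule in the spirit of the talk: keeping CPU-only jobs off GPU nodes. The real enforcement is a Kyverno ClusterPolicy in YAML; the label and field names below are simplified assumptions, not a production policy.

        # Sketch of a "keep CPU-only jobs off GPU nodes" admission rule.
        # Kyverno expresses this as declarative YAML; this Python predicate
        # only illustrates the logic. "node-role/gpu" is an assumed label.

        def requests_gpu(pod: dict) -> bool:
            """True if any container asks for an NVIDIA GPU resource."""
            for c in pod.get("spec", {}).get("containers", []):
                limits = c.get("resources", {}).get("limits", {})
                if int(limits.get("nvidia.com/gpu", 0)) > 0:
                    return True
            return False

        def violates_gpu_node_policy(pod: dict) -> bool:
            """Deny pods that target GPU nodes without requesting a GPU."""
            selector = pod.get("spec", {}).get("nodeSelector", {})
            return selector.get("node-role/gpu") == "true" and not requests_gpu(pod)

        # A large CPU-only job pinned to GPU nodes would be rejected:
        pod = {"spec": {"nodeSelector": {"node-role/gpu": "true"},
                        "containers": [{"resources": {"limits": {"cpu": "32"}}}]}}
        assert violates_gpu_node_policy(pod)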

  • Description:

    Sunyanan Choochotkaew & Tatsuhiro Chiba, IBM Research 

    llm-d is a community-driven effort to modernize large language model serving at scale, natively within Kubernetes. At its core is a modular architecture that decouples the prefill and decode operations. This disaggregated design unlocks precise tuning of compute and network resources, tailored to the unique demands of each phase.

    But here’s the twist: how finely can those resources be defined? A GPU unit? A MIG slice? Maybe even something finer? With a new capability proposed for the Dynamic Resource Allocation (DRA) framework, resource capacities for compute and network devices can now be requested and adjusted on the fly. At the same time, the core DRA capability enables device selection based on fine-grained attributes, including topology awareness, eliminating the need for clunky hacks or rigid resource pools.

    In this talk, we will demonstrate how this new DRA capability makes the llm-d framework more feasible and cost-effective, explore the remaining challenges, and share practical insights.
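
    To make the "fine-grained attributes" point concrete, here is a sketch of a DRA ResourceClaim that selects a device by attribute rather than by a coarse device count, written as a Python dict for brevity. The driver name, attribute key, and API version are illustrative assumptions to verify against your cluster's DRA support.

        # Sketch of a DRA ResourceClaim using a CEL selector to pick, say, a
        # specific MIG profile instead of "any 1 GPU". Names are hypothetical.
        resource_claim = {
            "apiVersion": "resource.k8s.io/v1beta1",  # check your cluster's version
            "kind": "ResourceClaim",
            "metadata": {"name": "decode-gpu"},
            "spec": {
                "devices": {
                    "requests": [{
                        "name": "gpu",
                        "deviceClassName": "gpu.example.com",  # hypothetical class
                        "selectors": [{
                            "cel": {
                                # Match a device advertising a 3g.40gb MIG slice.
                                "expression": 'device.attributes["gpu.example.com"].profile == "mig-3g.40gb"'
                            }
                        }],
                    }],
                },
            },
        }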

  • Description:

    Carlos Sanchez, Adobe & Kevin Dubois, IBM 

    Your software rollouts to production are probably always flawless, right? For the rest of us, releasing code occasionally runs into issues. Argo Rollouts is a great tool for mitigating them: it delivers software to production progressively and automatically rolls back new features if anything goes wrong.

    Wouldn’t it be nice if we could take this functionality to the next level? We can take advantage of advances in agentic AI and instruct a model to analyze the logs when a rollout fails. Then, through agents, it can take action on our behalf, such as fixing the code or the deployment manifests on the fly, creating new PRs, and sending notifications. The sky really is the limit.

    Come to this session to learn how to combine Argo Rollouts with Agentic AI to achieve the most seamless release experience yet.
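
    For a feel of the wiring involved, here is a deliberately hand-wavy Python sketch of such a loop. Every function is a stub standing in for a real integration (rollout status, an LLM agent, your SCM's PR API); none of these are Argo Rollouts APIs.

        # Hypothetical skeleton: when a rollout aborts, have an agent read the
        # failure, propose a fix, and open a PR. All stubs return canned data.

        def fetch_rollout_logs(rollout: str) -> str:
            """Stub: would gather analysis-run results and pod logs."""
            return "ImagePullBackOff: tag v2.3.1 not found"

        def ask_agent(logs: str) -> dict:
            """Stub: would call an LLM with tool access and parse its plan."""
            return {"patch": {"image": "registry.example.com/app:v2.3.0"},
                    "summary": "Roll back to the last known-good image tag."}

        def open_pull_request(patch: dict, summary: str) -> str:
            """Stub: would push a branch and open a PR via the SCM API."""
            return "https://git.example.com/app/pulls/1234"

        def on_rollout_aborted(rollout: str) -> None:
            proposal = ask_agent(fetch_rollout_logs(rollout))
            url = open_pull_request(proposal["patch"], proposal["summary"])
            print(f"{rollout}: {proposal['summary']} -> {url}")

        on_rollout_aborted("checkout-service")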

  • Description:

    Martin Hickey, IBM & Junchen Jiang, University of Chicago 

    LLMs are powering copilots, search engines, document understanding, and chatbots. Most real-world AI apps route their workloads through GPU clusters running high-throughput inference engines. For enterprises, however, the key concerns are still cost and return on investment (ROI). Welcome to LMCache, an open source LLM serving engine extension that reduces Time to First Token (TTFT) and increases throughput.

    In this talk, we’ll demonstrate how you can reduce GPU costs and token latency using LMCache. We’ll walk through LMCache's high-performance KV cache management layer and its integration with well-known production inference engines like vLLM and KServe, deployed on a Kubernetes cluster, using real-world examples like document analysis and high-speed RAG. You’ll also get a glimpse into the growing community behind the OSS KV caching layer that is impacting ROI for companies like Red Hat, IBM, Google, NVIDIA, and CoreWeave.
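
    The intuition behind the TTFT win is simple: prefill cost scales with the number of prompt tokens that still need processing, and a cached prefix removes most of them. A back-of-envelope Python sketch (the throughput constant and token counts are made up, not LMCache benchmarks):

        # TTFT is dominated by prefill. With a KV-cached shared prefix, only
        # the uncached suffix must be prefilled. Illustrative numbers only.
        PREFILL_TOKENS_PER_SEC = 8_000.0  # assumed prefill throughput

        def ttft_seconds(prompt_tokens: int, cached_prefix_tokens: int = 0) -> float:
            return max(prompt_tokens - cached_prefix_tokens, 0) / PREFILL_TOKENS_PER_SEC

        # RAG prompt: 7,000 tokens of shared context + 200 tokens of user query.
        cold = ttft_seconds(7_200)          # full prefill: ~0.90 s
        warm = ttft_seconds(7_200, 7_000)   # only the query: ~0.025 s
        print(f"cold={cold:.3f}s warm={warm:.3f}s speedup={cold / warm:.0f}x")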

  • Description:

    Mariusz Sabath & Maia Iyer, IBM Research

    Agentic workflows in cloud-native environments demand robust identity and authorization. This session explores how to move beyond hard-coded credentials by assigning trusted, granular identities to agents acting on behalf of users. We'll dive into strategies for establishing traceability, enforcing least privilege, and enabling auditable decision-making within a zero-trust architecture.

    Focusing on shared agents and tool-calling patterns, we'll demonstrate how SPIRE’s workload identity integrates with user identity to support secure delegation and dynamic, context-aware authorization. You’ll learn how to safeguard agent interactions with external tools and data sources through identity propagation and policy enforcement.

    Through a real-world case study using Llama Stack and the extended Model Context Protocol (MCP), attendees will gain actionable insights to build secure, identity-aware agentic platforms ready for production use.
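
    As a minimal illustration of the authorization model, consider a decision that requires both the agent's workload identity and the user it is acting for. The SPIFFE IDs, tool names, and policy table below are illustrative assumptions, not SPIRE or MCP APIs.

        # Context-aware, least-privilege tool authorization for a shared agent:
        # the (workload identity, tool, user) triple must be explicitly allowed.
        POLICY = {
            ("spiffe://example.org/agents/research", "web_search"): {"alice", "bob"},
            ("spiffe://example.org/agents/research", "payroll_db"): set(),  # never
        }

        def authorize(workload_id: str, user: str, tool: str) -> bool:
            """Deny by default; the agent's identity alone is not enough."""
            return user in POLICY.get((workload_id, tool), set())

        agent = "spiffe://example.org/agents/research"
        assert authorize(agent, "alice", "web_search")
        assert not authorize(agent, "alice", "payroll_db")  # least privilege holds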

  • Description:

    Alex Scammon, G-Research; Abhishek Malvankar, IBM Research; Marlow Warnicke, SchedMD; Dan Desjardins, Distributive 

    Data transfer is slow -- so in AI and HPC, data locality matters. As workloads scale, optimizing where and how to run data-heavy workloads in Kubernetes becomes critical. Yet this area remains underexplored. The CNCF Batch Subproject shares findings from our work on data-locality-aware scheduling across clusters. Should we move compute to the data or the data to compute? What are the trade-offs in latency, cost, and efficiency?

    We present methods to test potential policies: splitting jobs, exposing location-aware metadata from compute/storage, and basing scheduling on historical data and pricing. We share early discoveries from real-world tests across regions with limited bandwidth, storage, and power.

    If your workloads are bottlenecked by data gravity -- or you’re chasing GPU efficiency across sites -- join us to explore emerging patterns for intelligent, cost-aware data placement in Kubernetes.
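
    The central trade-off can be framed as a toy cost model: pay the WAN transfer to bring data to idle compute, or pay queueing plus slower hardware to run where the data lives. All parameters below are illustrative, not measurements from our tests.

        # Toy model: move the data, or move the compute?
        def move_data_seconds(data_gb: float, wan_gbps: float) -> float:
            """Transfer time over the inter-site link (protocol overhead ignored)."""
            return data_gb * 8 / wan_gbps

        def move_compute_seconds(queue_wait_s: float, slowdown: float,
                                 runtime_s: float) -> float:
            """Run remotely at the data's site: queueing plus a hardware penalty."""
            return queue_wait_s + runtime_s * slowdown

        # 2 TB of shards over a 10 Gbps WAN vs. a 30-minute job queued behind
        # the data site's 1.4x-slower GPUs.
        transfer = move_data_seconds(2_000, 10)           # ~1,600 s
        remote = move_compute_seconds(600, 1.4, 1_800)    # ~3,120 s
        print("move the data" if transfer < remote else "move the compute")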
