IBM at Open Source Summit NA 2025

About

Open Source Summit is the premier event for open source developers, technologists, and community leaders to collaborate, share information, solve problems, and gain knowledge, furthering open source innovation and ensuring a sustainable open source ecosystem. It is the gathering place for open source code and community contributors.

Visit the conference website

Agenda

  • In the rapidly evolving landscape of Generative AI, it is becoming clearer every day that open strategies are key drivers of innovation and widespread adoption. This session will present the Generative AI Commons, its activities, and its deliverables to date, including the Model Openness Framework (MOF) and the Responsible Generative AI Framework.

    The Generative AI Commons is an LF AI & Data initiative dedicated to fostering the democratization, advancement, and adoption of efficient, secure, reliable, and ethical Generative AI open source innovations through neutral governance and open, transparent collaboration and education.

    Speakers: Arnaud Le Hors, IBM Research & Ofer Hermoni, iForAI

  • Software supply chain attacks are an emerging threat for today’s enterprises. An attacker first gains access to the target enterprise’s internal network, typically through social engineering. Next, the attacker obtains administrator access to one of the enterprise’s software supply chains. Finally, the attacker injects a backdoor into a built artifact and steals confidential information or digital assets from the enterprise, or worse, from its customers.

    A critical attack surface here is the administrator of the software supply chain. Confidential Containers is an open source project that protects containers from administrators by using trusted execution environments (TEEs). It protects a Kubernetes pod from a cluster administrator by running the pod inside a TEE and validating the pod through remote attestation.

    This talk presents a use case of Confidential Containers to protect a Tekton task. You will learn how Confidential Containers protects a task and its artifacts even when the cluster administrator is compromised.

    Speaker: Tatsushi Inagaki, IBM Research
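
    The core idea behind the remote attestation step mentioned above can be sketched in a few lines. This is a hypothetical, simplified illustration only: a verifier compares a measurement (a digest of the workload) reported from inside the TEE against a known-good reference value before trusting the pod. Real Confidential Containers attestation involves signed hardware evidence and a key broker service, neither of which is modeled here; all names below are illustrative.

    ```python
    import hashlib
    import hmac

    def measure(workload: bytes) -> str:
        """Compute a measurement (digest) of the workload, as a TEE would report it."""
        return hashlib.sha256(workload).hexdigest()

    def verify_attestation(reported: str, expected: str) -> bool:
        """Verifier side: trust the workload only if its measurement matches
        the known-good reference value (constant-time comparison)."""
        return hmac.compare_digest(reported, expected)

    # Hypothetical task image; the reference measurement is recorded in advance.
    trusted_image = b"tekton-task:v1.0"
    expected = measure(trusted_image)

    # An untampered workload passes attestation...
    assert verify_attestation(measure(trusted_image), expected)
    # ...while a backdoored artifact yields a different measurement and fails.
    assert not verify_attestation(measure(b"tekton-task:v1.0+backdoor"), expected)
    ```

    The point of the comparison is that a compromised administrator cannot substitute a tampered artifact without changing its measurement, which the verifier detects before any secrets are released.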

  • Large Language Models (LLMs) require preprocessing vast amounts of data, often petabytes, a process that can span days due to its complexity and scale. This talk demonstrates how Kubeflow Pipelines (KFP) simplifies LLM data processing with flexibility, repeatability, and scalability. These pipelines are used daily at IBM Research to build indemnified LLMs tailored for enterprise applications.

    Different data preparation toolkits are built on Kubernetes, Rust, Slurm, or Spark. How would you choose one for your own LLM experiments or enterprise use cases, and why should you consider Kubernetes and KFP?

    This talk describes how the open source Data Prep Toolkit leverages KFP and KubeRay for scalable orchestration of pipeline steps such as deduplication, content classification, and tokenization.

    We share challenges, lessons, and insights from our experience with KFP, highlighting its applicability to diverse LLM tasks such as data preprocessing, RAG retrieval, and model fine-tuning.

    Speakers: Anish Asthana, Red Hat & Alexey Roytman, IBM Research
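
    One of the pipeline steps named above, deduplication, can be sketched in plain Python. This is not the Data Prep Toolkit's actual transform, just a minimal, hypothetical illustration of exact deduplication by content hash, the simplest form of the step such pipelines run at scale.

    ```python
    import hashlib

    def deduplicate(docs):
        """Exact deduplication: keep the first occurrence of each distinct text,
        identified by a SHA-256 digest of its content."""
        seen = set()
        out = []
        for doc in docs:
            digest = hashlib.sha256(doc.encode("utf-8")).hexdigest()
            if digest not in seen:
                seen.add(digest)
                out.append(doc)
        return out

    corpus = ["hello world", "foo bar", "hello world"]
    print(deduplicate(corpus))  # prints ['hello world', 'foo bar']
    ```

    At petabyte scale the same idea is distributed across workers (for example with KubeRay), and fuzzy variants such as MinHash replace the exact digest, but the step's contract, one representative per duplicate group, is the same.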

Upcoming events

View all events