
How IBM Research helped prepare the IBM z17 for tomorrow’s workloads

At the heart of IBM z17 lies the IBM Telum II processor and the IBM Spyre Accelerator. The culmination of years of work by dozens of researchers, these devices are designed to execute AI workloads at unprecedented speed and scale.

IBM Telum II (left) and Spyre (right) chips come out of IBM Research's efforts to build AI-centric hardware.

Today, IBM announced the release of z17, the latest version of the mainframe that powers enterprises and handles 70% of the world’s financial transactions.1 The brains of the new system’s AI accelerators were conceived and co-developed at IBM Research: the Telum II processor’s embedded AI accelerator core, and the Spyre Accelerator, available in Q4 2025.

With these powerful components, z17 will unlock an era of AI applications for enterprise. In industries where data needs to stay secure yet accessible with the lowest possible latency, z17 with Spyre will enable businesses to run generative AI models and agentic AI on premises.

For z17, the 32-core Spyre accelerator will be made available as an optional PCIe card, and additional cards can be added as needed. Spyre and Telum II build on the success of Telum, which powers the z16 system and features an industry-first on-chip AI accelerator. That accelerator formed the basis for Spyre, an AI accelerator that uses low-precision computing and AI-centric architecture for low-latency inferencing.

In designing these powerful devices, the IBM Research team behind the Telum II on-chip and Spyre accelerators helped overcome a major challenge: bringing the power of AI to the point of transaction for IBM's infrastructure clients, at unprecedented speed and scale. All the while, AI kept evolving, so the team designed hardware to handle tomorrow's workloads, not just today's, regardless of how specific AI models rise or fall in relevance.

“It’s always like throwing a dart, but I think we have the best dart throwers we can imagine here,” says Leland Chang, principal research staff member and senior manager in AI Hardware Design.

A complete solution

“We built a complete accelerator,” says Jeff Burns, director of the IBM Research AI Hardware Center. “It’s a system-on-a-chip, and a PCIe card, and a compiler, and a runtime, and a device driver — and so on.” These features, Burns points out, mean data scientists will be able to use Spyre without doing anything special.

Telum II and Spyre achieve their AI inference capabilities through software and hardware co-design, the result of collaboration across IBM teams and direct input from IBM Z clients. A whole software stack sits behind the heart of z17, and central to the development of Telum II and Spyre was striking the right balance between hardware and software innovation. “Based on priorities and objectives, we determine whether a layer of software can be enhanced to exercise the compute capabilities in the hardware,” adds distinguished research scientist Viji Srinivasan.

And the results speak for themselves. In early tests, a prototype Spyre processed more than three times as many images per second per watt of electricity as high-end GPUs did. Given the enormous projections of the energy AI workloads will require, the team knew they were onto a possible solution.

The Telum II processor, mounted on a dual-chip module, in the new IBM z17.

An AI-specific chip

AI requires much more processing power than everyday applications. IBM Research has been hard at work on AI-specific chips for close to a decade and launched the IBM Research AI Hardware Center in 2019 to concentrate on fulfilling the impending energy demands of AI with more efficient technology.

Part of this AI hardware strategy involves low-precision computing, which can greatly improve the power efficiency of systems running AI computations. The road to Spyre began in 2015, before the AI Hardware Center even existed, with a two-page white paper by Burns and his group. On the strength of that write-up, the team got the green light to investigate the feasibility of approximate computing for deep learning.

Their hypothesis was that designing low-precision hardware from scratch would deliver better performance per watt for deep learning than trying to do approximate computing on the GPUs or CPUs available at the time.
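To make the low-precision idea concrete, here is a minimal sketch of symmetric int8 quantization, the general kind of reduced-precision arithmetic such accelerators exploit. This is an illustrative example only, not IBM's actual scheme; the function names and the per-tensor scaling choice are assumptions for the sake of the demo.

```python
import numpy as np

def quantize(x: np.ndarray) -> tuple[np.ndarray, float]:
    """Map float32 values onto int8 with a single per-tensor scale.
    (Illustrative symmetric quantization, not IBM's implementation.)"""
    scale = float(np.abs(x).max()) / 127.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

rng = np.random.default_rng(0)
weights = rng.standard_normal((64, 64)).astype(np.float32)
activations = rng.standard_normal(64).astype(np.float32)

# Full-precision reference result.
ref = weights @ activations

# Low-precision path: int8 storage, int32 accumulation,
# then a single dequantization step at the end.
qw, sw = quantize(weights)
qa, sa = quantize(activations)
approx = (qw.astype(np.int32) @ qa.astype(np.int32)) * (sw * sa)

# The int8 result tracks the float32 result closely while using
# a quarter of the memory for weights and cheaper integer math.
rel_err = np.linalg.norm(approx - ref) / np.linalg.norm(ref)
print(f"relative error: {rel_err:.4f}")
```

The payoff that motivates dedicated hardware is that the inner loop becomes pure integer multiply-accumulate, which costs far less silicon area and energy per operation than floating-point arithmetic, at the price of a small, controllable accuracy loss.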

GPUs, well suited to running many processes in parallel, have been popular for running AI workloads. But to Burns and his colleagues, something else was clearly needed. Specialization had already proved itself with GPUs, which are less general-purpose than CPUs yet still more general than AI workloads require. So why not a compute core designed specifically for deep learning?

This idea meshed with what IBM Infrastructure clients wanted from IBM Z — AI inferencing, enhanced security, and virtualization features. The team optimized a chip with those things in mind. Researchers called it an AIU, or artificial intelligence unit. After Burns and his team built their first working version, additional members of the IBM Infrastructure team got involved, including many on the software side, to help build out the total package.

Telum II, shown here on a dual-chip module, builds on the success of the first Telum, which introduced an on-chip AI accelerator in z16.

Designed for tomorrow’s workloads

Ever since the AI Hardware Center was founded in 2019, it was clear that the tides could shift quickly, says program director John Rozen. “We saw a change coming in the compute workloads, even before ChatGPT,” he recalls. “And even though a system-on-chip wasn’t what we originally set out to do, we listened to our partners, and it paid off with the Spyre Accelerator and its 32 cores.”

Timing is a major challenge in AI chip design, because workloads change quickly while chips take years to develop. For this reason, watsonx has served as a guiding light amid the shifting winds of AI, Chang says. When the team was designing Spyre to optimize for a specific AI inference benchmark, that goal could change completely in as little as two months. “This has been the biggest roller coaster ride of my career,” says Chang.

The watsonx team’s AI Roadmap, developed years earlier, provided essential guidance: In 2025, it postulated that purpose-built hardware would help generative AI scale in new ways, potentially beyond transformers. And in 2026, it predicted the prominence of robust, strategic reasoning models. The IBM Infrastructure team has also been invaluable because they know their clients’ needs so well — and continue to anticipate what they’ll need years in the future.

The Spyre accelerator is designed to handle the emerging AI workloads that z17 clients will bring to the platform. It is optimized for generative and agentic AI, for example, rather than for models the field is cooling on, such as classification models.

And while AI models have mostly grown larger over the past decade, the world is now also moving toward smaller, fit-for-purpose models. At the same time, the industry is seeing a rise in mixture-of-experts models and state space models, whose ideal uses and full capabilities are still being explored. All of these developments are reflected in the road map that helped guide the development of Spyre, which in turn has these capabilities baked in.

Today’s mainframes are essential for industries including financial services and healthcare, and the set of AI use cases is constantly expanding: IBM Z now has more than 250, including advanced financial fraud detection, money laundering prevention, and credit risk scoring, to name a few.

In the case of Spyre, Rozen credits teams like Red Hat and watsonx for helping develop the accelerator into what it is today in the new IBM z17. “We didn't know it was going to be picked up as part of our product,” he says. “We just knew it was the right thing to do.”

Statements regarding IBM’s future direction and intent are subject to change or withdrawal without notice, and represent goals and objectives only.

References

  1. "Operationalizing Fraud Prevention on IBM Z," an IBM commissioned report by Celent. March 2022.