
Introducing the IBM Granite 4.1 family of models

IBM’s most expansive model release to date covers new language, vision, speech, embedding, and guardian models — tailored for enterprise workloads.

AI is increasingly at the heart of enterprise applications and software workflows. But even today’s most powerful AI systems rarely rely on a single model or capability. Instead, they tend to combine many technologies and abilities: language understanding, perception, retrieval, forecasting, and rigorous safety mechanisms such as guardrails for harm detection, all working together in tightly integrated AI workflows.

That’s why today IBM released the Granite 4.1 collection, the latest versions of its family of Granite models, which reflect this reality. The release covers small language models (SLMs), as well as Granite speech, vision, embedding, and Guardian models. The aim is for developers to easily incorporate these models into real-world, enterprise-grade AI systems. And despite their size, these models pack a punch.

Across the collection, Granite 4.1 features impressive language model performance in tool calling and instruction following; state-of-the-art transcription accuracy for the Granite speech models; harm detection capabilities delivered via Granite Guardian; and high leaderboard performance for Granite vision in table and chart extraction.

Language models with impressive instruction following and tool calling capabilities

At the heart of Granite 4.1 is a new generation of dense, decoder‑only language models, offered in 3B, 8B, and 30B parameter base and instruct variants. Across weight classes, the models significantly outperform similarly sized Granite 4.0 language models. The team found, for example, that the new Granite 4.1 8B instruct model consistently matches or outperforms the Granite 4.0 32B Mixture‑of‑Experts model, while using a simpler — and therefore more flexible — architecture for fine-tuning on downstream tasks.

These models also perform competitively with other open-source, dense, decoder-only models on the market today, including the most recent Gemma and Qwen models (with thinking disabled), on two metrics important for enterprise use: instruction following and tool calling.

While reasoning models have grown in popularity in recent years, long chains of thought aren’t always the most efficient way to get a result. In enterprise settings, token cost and speed often matter as much as benchmark scores. That is why turning to less expensive, non-reasoning models with similar benchmark performance on select tasks like instruction following and tool calling makes practical sense.
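To make the tool-calling pattern concrete, here is a minimal sketch of the loop an application runs around a tool-calling model: define a tool with a JSON schema, let the model emit a structured call instead of prose, then execute it and return the result. The `get_weather` function, its schema, and the model output string are illustrative stand-ins, not part of the Granite release.

```python
import json

# Hypothetical tool the model is allowed to call; name and return
# value are illustrative only.
def get_weather(city: str) -> dict:
    return {"city": city, "temp_c": 21}

# JSON-schema tool definition, the common shape passed to a
# tool-calling model's chat template.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

# A tool-calling model replies with a structured call rather than
# prose; this string stands in for the model's generated output.
model_output = '{"name": "get_weather", "arguments": {"city": "Zurich"}}'

call = json.loads(model_output)
registry = {"get_weather": get_weather}
result = registry[call["name"]](**call["arguments"])
print(result["temp_c"])  # the tool result would be fed back to the model
```

In a real pipeline the tool result is appended to the conversation and the model generates a final natural-language answer from it.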


The performance breakthrough in the Granite 4.1 language models was driven by IBM’s training philosophy. The team prioritized data quality and staged refinement over the raw amount of data used. The Granite 4.1 models are trained on approximately 15 trillion tokens across multiple phases, beginning with broad pre-training and progressively annealing toward higher-quality technical, scientific, and mathematical data focused on instruction following. The last few training stages extend the models’ context length to as much as 512K tokens, so the models can work through long documents without any performance hit on shorter-context tasks.

After pre-training, the models are refined through carefully curated supervised fine-tuning and a multi‑stage reinforcement learning (RL) pipeline. Each RL phase targets a distinct capability — such as instruction adherence, conversational quality, factual accuracy, or mathematical reasoning — which helps avoid the trade‑offs often introduced by single‑stage optimization. The result is a model family designed not just to answer questions, but to behave reliably across a wide range of enterprise workloads.

“Granite 4.1 delivers competitive instruction‑following and tool‑calling performance without relying on long chains of thought, offering predictable latency, stable token usage, and lower operational cost,” said Rameswar Panda, a distinguished engineer at IBM Research and the key architect of the Granite language models. “This makes it a strong, production‑ready choice for enterprise workloads, where efficiency and reliability matter most.”  

Enterprise AI workflows handle more than just text

Alongside the language models, IBM is releasing updated models across several modalities that commonly appear in end‑to‑end AI systems. These models are also more than capable of handling tasks on their own.

Granite Vision 4.1

This generation of Granite Vision is a vision-language model (VLM) designed specifically for document understanding tasks, in particular understanding information in tables and charts, and key-value pair (KVP) extraction, which pulls structured business information, such as invoice numbers, dates, and names, out of documents.

“These tasks are essential for automated enterprise pipelines,” said Eli Schwartz, a research manager with the IBM Research multimodal AI group. “Granite Vision can serve as an alternative to frontier models to perform these tasks at scale and at a fraction of the cost.”
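In an automated pipeline, a VLM’s KVP output is typically requested as JSON and validated before it flows downstream. The sketch below shows that validation step; the field names and the raw response are invented for illustration and are not a documented Granite Vision output format.

```python
import json

# Illustrative raw output from a document-understanding VLM asked to
# extract key-value pairs from an invoice; fields are assumptions.
vlm_response = '''{"invoice_number": "INV-0042",
                   "date": "2026-03-01",
                   "billed_to": "Acme Corp"}'''

REQUIRED_FIELDS = {"invoice_number", "date", "billed_to"}

def parse_kvp(raw: str) -> dict:
    """Parse and sanity-check KVP output before it enters a pipeline."""
    record = json.loads(raw)
    missing = REQUIRED_FIELDS - record.keys()
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    return record

record = parse_kvp(vlm_response)
print(record["invoice_number"])
```

Rejecting incomplete extractions at this boundary keeps malformed model output from silently corrupting downstream business records.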


There are two main components driving Granite Vision 4.1’s performance. The first is a feature injection scheme, inspired by DeepStack, that distributes visual information across multiple LLM layers, combining semantic grounding with fine-grained spatial detail. The second is the dataset used to train the model. Using real examples alongside synthetically generated KVP, table, and chart data, the team trained Granite Vision 4.1 specifically with enterprise use cases in mind. The training approach is similar to that of the previous versions, albeit with a large increase in training data. The result is models that outpace other similarly sized models available today.

Along with Granite Vision 4.1, the team also recently released ChartNet, a million-scale chart dataset that was used to train the new models.

Granite Speech 4.1

Alongside vision, IBM Research is releasing a set of Granite Speech 4.1 models. The new generation introduces multilingual speech recognition and translation models tuned for edge use cases, offering different tradeoffs between throughput, latency, and transcription richness.

Granite Speech 4.1 2B achieves a 5.33% word-error rate (WER), placing it among the top models on the OpenASR Leaderboard. Two additional variants are being released alongside it: Granite Speech 4.1 2B Plus, which adds richer transcription features, and Granite Speech 4.1 2B NAR, which trades some of those features for substantially higher throughput. Most transformer models today are autoregressive, meaning they generate one token at a time, but Granite Speech 4.1 2B NAR is non-autoregressive: it generates entire sequences at once. The team at IBM Research found that this structure yields considerably better GPU utilization and much higher throughput, and it plans to use the format for more models in the future.
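For readers unfamiliar with the metric, WER is the word-level edit distance between a reference transcript and the model’s hypothesis, divided by the number of reference words. A minimal, self-contained implementation:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance / reference word count."""
    ref, hyp = reference.split(), hypothesis.split()
    # Standard dynamic-programming edit distance over words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution
    return d[len(ref)][len(hyp)] / len(ref)

ref = "the quick brown fox jumps over the lazy dog"
hyp = "the quick brown fox jumped over a lazy dog"
print(round(wer(ref, hyp), 3))  # 2 errors over 9 words -> 0.222
```

A 5.33% WER means roughly one word in 19 is substituted, inserted, or deleted relative to the reference.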

The new speech models build on a pedigree of models that punch above their weight. Recently, a team at IBM and Australia’s Royal Flying Doctor Service used an earlier version of Granite Speech to build a transcription engine for clinicians working in the noisy environment found on airplanes. The team chose Granite Speech because, in testing, it proved far better at handling the background noise than any other commercial model available.

Granite Guardian 4.1

Another key element of this release is Granite Guardian 4.1. This new model is a direct replacement for Granite Guardian 3.3 8B, and was fine-tuned on top of Granite 4.1 8B. It expands on its predecessor with additional risk definitions, giving developers a more nuanced signal when evaluating model inputs and outputs.

Like previous Guardian versions, it's designed to act as a moderator model within an AI system, evaluating LLM inputs and outputs for safety, quality, and correctness. This makes it well-suited for use cases like monitoring a customer-facing chatbot for harmful or 'off-policy' responses, or flagging risky outputs before they reach the end user. This approach reflects a broader shift toward treating safety, quality, and correctness as model‑driven problems that can be integrated directly into AI pipelines, rather than bolted on as afterthoughts.
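The moderator pattern described above reduces to a simple control flow: run the guardian over a draft response and suppress it if any risk is flagged. In this sketch, `guardian_flags` is a stub standing in for an actual guardian model call (which would prompt the model with the text and a risk definition and parse its verdict); the risk labels and fallback message are illustrative.

```python
# Risk dimensions the moderator screens for (illustrative subset).
RISKS = {"harm", "profanity", "jailbreak", "social_bias"}

def guardian_flags(text: str) -> set:
    # Stub: a real implementation would call a guardian model here and
    # parse the risks it detects in the text.
    return {"profanity"} if "darn" in text else set()

def moderated_reply(draft: str, fallback: str = "I can't help with that.") -> str:
    """Return the draft only if the guardian raises no risk flags."""
    flags = guardian_flags(draft)
    return fallback if flags & RISKS else draft

print(moderated_reply("Here is your refund status."))
print(moderated_reply("That's a darn bad idea."))
```

The same gate can run on inputs before they reach the main model, which is how jailbreak attempts are typically screened.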

Granite Guardian is designed to run with any language model, regardless of whether its weights are open or proprietary. The guardian model is trained to flag socially biased content; hateful, abusive, or profane language; hallucinations; agentic risks; attempts by users to break through an LLM’s safety controls; and several other dimensions catalogued in IBM’s AI Risk Atlas. Earlier versions of the model have topped independent benchmarks for guardrail models.

Granite Embedding Multilingual R2

Granite Embedding Multilingual R2 scales retrieval support to more than 200 languages while dramatically increasing context length, enabling efficient semantic search across large, multilingual document collections. At the smaller end, the 97M‑parameter embedding model shows that careful pruning and training can deliver state‑of‑the‑art retrieval performance even under tight resource constraints. Both models are expected to land at or near the top of the MTEB leaderboard for their respective sizes.
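Semantic search with an embedding model boils down to ranking documents by vector similarity to a query embedding. The sketch below shows that core step with hand-made three-dimensional stand-in vectors; a real system would obtain the vectors from the embedding model’s encode step, and the document titles are invented.

```python
import math

# Toy corpus of precomputed document embeddings (illustrative values).
docs = {
    "refund policy": [0.9, 0.1, 0.0],
    "shipping times": [0.1, 0.9, 0.1],
    "warranty terms": [0.2, 0.2, 0.9],
}

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def search(query_vec, top_k=1):
    """Return the top_k document titles ranked by similarity."""
    ranked = sorted(docs, key=lambda d: cosine(query_vec, docs[d]), reverse=True)
    return ranked[:top_k]

# A query embedding close to the "refund policy" document.
print(search([0.8, 0.2, 0.05]))
```

At scale, the exhaustive scan is replaced by an approximate nearest-neighbor index, but the similarity ranking is the same.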

A comprehensive approach to enterprise AI

Taken together, this expansive Granite 4.1 release represents a system‑level perspective on the role of foundation models, and shows how small, fit-for-purpose models can help solve real enterprise problems. The emphasis is not solely on making any single model larger or more capable, but on enabling modular, efficient, and governable enterprise AI systems that can move from research to deployment with fewer gaps.

All Granite 4.1 models are released under an Apache 2.0 license, reinforcing IBM Research’s commitment to open, transparent innovation. Whether the task requires tool calling, instruction following, harm detection, state-of-the-art transcription accuracy, or table and chart extraction, Granite 4.1 is designed to serve as a practical foundation for the next generation of enterprise AI applications.

You can try out these models on watsonx, Hugging Face, and more, and start putting them to work for your enterprise tasks today.
