Mistral Small 4 Unifies Reasoning, Vision, and Coding in One 119B MoE Model
Mistral Small 4 combines Magistral reasoning, Pixtral vision, and Devstral coding into a single multimodal model. 128 experts, 256K context, Apache 2.0.
Sarah Mueller
Mistral released Mistral Small 4 on March 16, 2026, at NVIDIA GTC. It's the first Mistral release to unify Magistral (reasoning), Pixtral (multimodal vision), and Devstral (coding) into a single model. With 119B total parameters in a mixture-of-experts (MoE) architecture (128 experts, 4 active per token, roughly 6-8B active), it is, according to Mistral AI, remarkably efficient for its capability.
One Model, Three Capabilities
Previously, developers using Mistral needed separate models for different tasks: Magistral for reasoning, Pixtral for image understanding, Devstral for coding. Small 4 merges all three into one model with a configurable reasoning_effort parameter that controls how much chain-of-thought reasoning the model applies.
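Mistral hasn't published API documentation for Small 4 yet, so the snippet below is only a sketch of how a per-request effort control might look on the existing chat-completions endpoint. The reasoning_effort field, its allowed values, and the model identifier mistral-small-4 are assumptions based on the announcement, not a documented contract.

```python
import os
import requests

# Sketch only: the "reasoning_effort" field and the "mistral-small-4" model
# name are assumptions from the announcement, not documented API parameters.
API_URL = "https://api.mistral.ai/v1/chat/completions"

def ask(prompt: str, effort: str = "low") -> str:
    payload = {
        "model": "mistral-small-4",  # placeholder model identifier
        "messages": [{"role": "user", "content": prompt}],
        "reasoning_effort": effort,  # e.g. "low" / "medium" / "high" (assumed values)
    }
    headers = {"Authorization": f"Bearer {os.environ['MISTRAL_API_KEY']}"}
    resp = requests.post(API_URL, json=payload, headers=headers, timeout=60)
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]

# Routine tasks stay cheap; harder problems get more chain-of-thought budget.
print(ask("Summarize this changelog in one sentence.", effort="low"))
print(ask("Prove that the product of two odd integers is odd.", effort="high"))
```

The appeal of a single dial is operational: one deployment serves both quick extraction calls and slow, deliberate reasoning, instead of routing traffic across three separate models.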
This is the same convergence happening across the industry — Claude merged adaptive thinking into its base models, GPT-5 unified reasoning with tool use — but Mistral achieved it at a significantly smaller active parameter count.
MoE Efficiency
With 128 experts and 4 active per token, only about 6-8B parameters are exercised for each generated token, even though the model holds 119B parameters in total. The 256K context window matches larger models, and the Apache 2.0 license makes it fully open-source.
This efficiency matters for deployment. Running 6-8B active parameters instead of 70B+ means lower GPU requirements, faster responses, and cheaper inference — the kind of practical advantage that determines which model enterprises actually adopt at scale.
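Mistral hasn't released the Small 4 architecture beyond the expert counts, so the following is a toy illustration of the general top-k MoE routing pattern (128 experts, 4 routed per token), not the actual implementation; the layer dimensions are made up and far smaller than the real model's.

```python
import numpy as np

rng = np.random.default_rng(0)

NUM_EXPERTS, TOP_K = 128, 4   # figures from the announcement
D_MODEL, D_FF = 64, 256       # toy dimensions, not Small 4's real ones

# Each expert is a small feed-forward block; only TOP_K of them run per token.
experts_w1 = rng.standard_normal((NUM_EXPERTS, D_MODEL, D_FF)) * 0.02
experts_w2 = rng.standard_normal((NUM_EXPERTS, D_FF, D_MODEL)) * 0.02
router_w = rng.standard_normal((D_MODEL, NUM_EXPERTS)) * 0.02

def moe_layer(x: np.ndarray) -> np.ndarray:
    """Route one token (shape [D_MODEL]) to its top-k experts and mix their outputs."""
    logits = x @ router_w                          # score all 128 experts
    top = np.argsort(logits)[-TOP_K:]              # keep the 4 best
    gates = np.exp(logits[top] - logits[top].max())
    gates /= gates.sum()                           # softmax over the selected experts only
    out = np.zeros_like(x)
    for gate, e in zip(gates, top):                # only these 4 experts' weights are touched
        hidden = np.maximum(x @ experts_w1[e], 0)  # ReLU stand-in for the real activation
        out += gate * (hidden @ experts_w2[e])
    return out

token = rng.standard_normal(D_MODEL)
print(moe_layer(token).shape)  # (64,) -- capacity scales with 128 experts, compute with 4
```

The toy numbers understate the real ratio, but the principle is the same: total parameter count sets the model's capacity, while per-token compute and latency scale with the handful of experts the router activates.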
Forge: Enterprise Custom Models
Alongside Small 4, Mistral announced Forge — an enterprise platform for building custom frontier-grade AI models grounded in proprietary data. Forge offers pre-training, post-training, and reinforcement learning capabilities with forward-deployed engineers who embed with customers.
Early adopters include Ericsson, European Space Agency, ASML, Reply, DSO, and HTX. This positions Mistral as the enterprise AI company for organizations that need custom models with data sovereignty — a niche that American competitors struggle to serve from their US-centric infrastructure.
NVIDIA Partnership
Mistral became a founding member of NVIDIA's Nemotron Coalition at GTC 2026, deepening the hardware-software integration that makes Mistral models run efficiently on NVIDIA infrastructure.
Our Take
Mistral Small 4 is what "small" should mean: maximum capability per active parameter. Unifying reasoning, vision, and coding into one efficient model is the right product decision — developers don't want to manage three separate models. The Apache 2.0 license at this capability level is genuinely generous. Forge for enterprise custom models positions Mistral uniquely in the European market, where data sovereignty isn't optional. The question is whether "efficient and open" can compete with "massive and closed" at the frontier.
FAQ
What is Mistral Small 4? Mistral Small 4 is a unified AI model released March 16, 2026, combining reasoning (Magistral), vision (Pixtral), and coding (Devstral) capabilities. It uses a mixture-of-experts architecture with 119B total parameters but only about 6-8B active per token.
Is Mistral Small 4 open source? Yes, Mistral Small 4 is released under the Apache 2.0 license.
What is Forge? Forge is Mistral's enterprise platform for building custom AI models grounded in proprietary data, with pre-training, post-training, and reinforcement learning capabilities.
How does Mistral Small 4 compare to GPT-5 or Claude? Mistral Small 4 is significantly smaller in active parameters (6-8B vs 100B+) and is designed for efficiency rather than maximum capability. It competes on cost-performance ratio rather than raw benchmark scores.