Mistral Small 4 Unifies Reasoning, Vision, and Coding in One 119B MoE Model
Mistral Small 4 combines Magistral reasoning, Pixtral vision, and Devstral coding into a single multimodal model. 128 experts, 256K context, Apache 2.0.
Sarah Mueller
Mistral released Mistral Small 4 on March 16, 2026, at NVIDIA GTC. It's the first Mistral release to unify Magistral (reasoning), Pixtral (multimodal vision), and Devstral (coding) into a single model. With 119B total parameters in a mixture-of-experts (MoE) architecture (128 experts, 4 active per token, roughly 6-8B active), it is, according to Mistral AI, remarkably efficient for its capability.
One Model, Three Capabilities
Previously, developers using Mistral needed separate models for different tasks: Magistral for reasoning, Pixtral for image understanding, Devstral for coding. Small 4 merges all three into one model with a configurable reasoning_effort parameter that controls how much chain-of-thought reasoning the model applies.
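Mistral hasn't published API documentation for Small 4 yet, so the snippet below is only a sketch of how a per-request effort control might look on the existing chat-completions endpoint. The reasoning_effort field, its allowed values, and the model identifier mistral-small-4 are assumptions based on the announcement, not a documented contract.

```python
import os
import requests

# Sketch only: the "reasoning_effort" field and the "mistral-small-4" model
# name are assumptions from the announcement, not documented API parameters.
API_URL = "https://api.mistral.ai/v1/chat/completions"

def ask(prompt: str, effort: str = "low") -> str:
    payload = {
        "model": "mistral-small-4",  # placeholder model identifier
        "messages": [{"role": "user", "content": prompt}],
        "reasoning_effort": effort,  # e.g. "low" / "medium" / "high" (assumed values)
    }
    headers = {"Authorization": f"Bearer {os.environ['MISTRAL_API_KEY']}"}
    resp = requests.post(API_URL, json=payload, headers=headers, timeout=60)
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]

# Routine tasks stay cheap; harder problems get more chain-of-thought budget.
print(ask("Summarize this changelog in one sentence.", effort="low"))
print(ask("Prove that the product of two odd integers is odd.", effort="high"))
```

The appeal of a single dial is operational: one deployment serves both quick extraction calls and slow, deliberate reasoning, instead of routing traffic across three separate models.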
This is the same convergence happening across the industry — Claude merged adaptive thinking into its base models, GPT-5 unified reasoning with tool use — but Mistral achieved it at a significantly smaller active parameter count.
MoE Efficiency
With 128 experts and 4 active per token, only about 6-8B parameters are exercised for each generated token, even though the model holds 119B parameters in total. The 256K context window matches larger models, and the Apache 2.0 license makes it fully open-source.
This efficiency matters for deployment. Running 6-8B active parameters instead of 70B+ means lower GPU requirements, faster responses, and cheaper inference — the kind of practical advantage that determines which model enterprises actually adopt at scale.
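Mistral hasn't released the Small 4 architecture beyond the expert counts, so the following is a toy illustration of the general top-k MoE routing pattern (128 experts, 4 routed per token), not the actual implementation; the layer dimensions are made up and far smaller than the real model's.

```python
import numpy as np

rng = np.random.default_rng(0)

NUM_EXPERTS, TOP_K = 128, 4   # figures from the announcement
D_MODEL, D_FF = 64, 256       # toy dimensions, not Small 4's real ones

# Each expert is a small feed-forward block; only TOP_K of them run per token.
experts_w1 = rng.standard_normal((NUM_EXPERTS, D_MODEL, D_FF)) * 0.02
experts_w2 = rng.standard_normal((NUM_EXPERTS, D_FF, D_MODEL)) * 0.02
router_w = rng.standard_normal((D_MODEL, NUM_EXPERTS)) * 0.02

def moe_layer(x: np.ndarray) -> np.ndarray:
    """Route one token (shape [D_MODEL]) to its top-k experts and mix their outputs."""
    logits = x @ router_w                          # score all 128 experts
    top = np.argsort(logits)[-TOP_K:]              # keep the 4 best
    gates = np.exp(logits[top] - logits[top].max())
    gates /= gates.sum()                           # softmax over the selected experts only
    out = np.zeros_like(x)
    for gate, e in zip(gates, top):                # only these 4 experts' weights are touched
        hidden = np.maximum(x @ experts_w1[e], 0)  # ReLU stand-in for the real activation
        out += gate * (hidden @ experts_w2[e])
    return out

token = rng.standard_normal(D_MODEL)
print(moe_layer(token).shape)  # (64,) -- capacity scales with 128 experts, compute with 4
```

The toy numbers understate the real ratio, but the principle is the same: total parameter count sets the model's capacity, while per-token compute and latency scale with the handful of experts the router activates.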
Forge: Enterprise Custom Models
Alongside Small 4, Mistral announced Forge — an enterprise platform for building custom frontier-grade AI models grounded in proprietary data. Forge offers pre-training, post-training, and reinforcement learning capabilities with forward-deployed engineers who embed with customers.
Early adopters include Ericsson, European Space Agency, ASML, Reply, DSO, and HTX. This positions Mistral as the enterprise AI company for organizations that need custom models with data sovereignty — a niche that American competitors struggle to serve from their US-centric infrastructure.
NVIDIA Partnership
Mistral became a founding member of NVIDIA's Nemotron Coalition at GTC 2026, deepening the hardware-software integration that makes Mistral models run efficiently on NVIDIA infrastructure.
Our Take
Mistral Small 4 is what "small" should mean: maximum capability per active parameter. Unifying reasoning, vision, and coding into one efficient model is the right product decision — developers don't want to manage three separate models. The Apache 2.0 license at this capability level is genuinely generous. Forge for enterprise custom models positions Mistral uniquely in the European market, where data sovereignty isn't optional. The question is whether "efficient and open" can compete with "massive and closed" at the frontier.
FAQ
What is Mistral Small 4? Mistral Small 4 is a unified AI model released March 16, 2026, combining reasoning (Magistral), vision (Pixtral), and coding (Devstral) capabilities. It uses a mixture-of-experts architecture with 119B total parameters but only about 6-8B active per token.
Is Mistral Small 4 open source? Yes, Mistral Small 4 is released under the Apache 2.0 license.
What is Forge? Forge is Mistral's enterprise platform for building custom AI models grounded in proprietary data, with pre-training, post-training, and reinforcement learning capabilities.
How does Mistral Small 4 compare to GPT-5 or Claude? Mistral Small 4 is significantly smaller in active parameters (6-8B vs 100B+) and is designed for efficiency rather than maximum capability. It competes on cost-performance ratio rather than raw benchmark scores.