Meta Ships Llama 4: Scout Fits on One GPU, Maverick Beats GPT-4o

Llama 4 introduces MoE architecture with three models. Scout has a 10M token context window. Maverick's 128 experts beat GPT-4o on LMArena. Behemoth is still training.

Lisa Thoma

Saturday, April 5, 2025 · 3 min read

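Meta released the Llama 4 family on April 5, 2025, marking a fundamental architecture shift: these are the first Llama models built on a Mixture of Experts (MoE) design, in which each token is routed to a small subset of expert sub-networks, so only a fraction of the total parameters is active per token. Scout and Maverick shipped at launch, and Behemoth was announced while still in training, each targeting a different scale point, according to Meta AI.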
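To make the routing idea concrete, here is a minimal sketch of a generic top-k MoE gate in plain Python. It is not Meta's router: the toy experts, the random router scores, and the function names (`route_top_k`, `moe_layer`) are all hypothetical and chosen only for illustration.

```python
import math
import random

def softmax(scores):
    """Turn raw router scores into probabilities."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def route_top_k(router_scores, k=1):
    """Return (expert_index, weight) pairs for the k highest-scoring experts."""
    probs = softmax(router_scores)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)  # renormalise over the chosen experts
    return [(i, probs[i] / norm) for i in top]

def moe_layer(x, experts, router_scores, k=1):
    """Run only the selected experts and mix their outputs by gate weight."""
    chosen = route_top_k(router_scores, k)
    output = sum(w * experts[i](x) for i, w in chosen)
    return output, [i for i, _ in chosen]

# 16 toy "experts" (hypothetical): each one scales its input by a different factor.
experts = [lambda x, s=s: x * s for s in range(1, 17)]

random.seed(0)
scores = [random.gauss(0, 1) for _ in experts]  # stand-in for a learned router
y, used = moe_layer(2.0, experts, scores, k=1)
print(f"experts run: {len(used)} of {len(experts)} (index {used[0]}), output: {y}")
```

Only the selected expert is evaluated for a given input, which is why a model's active parameter count can be far smaller than its total parameter count.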

Llama 4 Scout: 10M Context on One GPU

Scout is the practical breakthrough. With 17B active parameters across 16 experts (109B total), it fits on a single H100 GPU with Int4 quantization, according to Meta, while offering a 10M token context window, about 50x a 200K-token window. That's enough to process entire codebases, book-length documents, or months of conversation history in a single prompt.
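A quick back-of-the-envelope check shows why precision matters for the single-GPU claim. The sketch below counts weight storage only, ignoring the KV cache and activations, and it rests on two figures that are assumptions not stated in this article: 109B total parameters for Scout (from Meta's model card) and the 80 GB H100 variant.

```python
def weight_memory_gb(total_params_billions, bits_per_param):
    """Approximate weight storage in decimal gigabytes (weights only)."""
    total_bytes = total_params_billions * 1e9 * bits_per_param / 8
    return total_bytes / 1e9

SCOUT_TOTAL_B = 109   # assumption: total parameters per Meta's model card
H100_MEMORY_GB = 80   # assumption: 80 GB H100 variant

for label, bits in [("BF16", 16), ("FP8", 8), ("Int4", 4)]:
    gb = weight_memory_gb(SCOUT_TOTAL_B, bits)
    verdict = "fits" if gb < H100_MEMORY_GB else "does not fit"
    print(f"{label:>4}: {gb:6.1f} GB of weights -> {verdict} in {H100_MEMORY_GB} GB")
```

Under these assumptions only the 4-bit weights come in under 80 GB. Every expert has to be resident in memory even though only some run per token, so the total parameter count, not the active count, drives the footprint.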

The 10M context window is particularly significant for enterprise applications that need to reason over massive document collections without retrieval-augmented generation (RAG) pipelines.
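Whether a document collection can skip retrieval comes down to a token budget. The sketch below is a rough pre-check, assuming about 4 characters per token (a common heuristic for English text, not a property of the Llama 4 tokenizer) and a hypothetical 8,000-token reserve for the model's reply. A real pipeline would count tokens with the model's own tokenizer.

```python
import math

CONTEXT_TOKENS = 10_000_000   # Scout's advertised context window
CHARS_PER_TOKEN = 4           # assumption: rough heuristic, not the real tokenizer

def estimate_tokens(text):
    """Rough token estimate from character count."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)

def fits_in_context(documents, context_tokens=CONTEXT_TOKENS, reserve_for_output=8_000):
    """Return (estimated_tokens, fits) for sending all documents in one prompt."""
    used = sum(estimate_tokens(doc) for doc in documents)
    return used, used + reserve_for_output <= context_tokens

# Hypothetical corpus: 1,000 documents of about 20,000 characters each.
corpus = ["x" * 20_000 for _ in range(1_000)]
used, ok = fits_in_context(corpus)
print(f"estimated tokens: {used:,} -> {'single prompt' if ok else 'needs chunking or RAG'}")
```

By this estimate the hypothetical corpus uses about half the window. Collections that overflow the budget still need chunking or retrieval.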

Llama 4 Maverick: Competing With Closed Models

Maverick keeps the same 17B active parameters but draws them from 128 experts, for 400B total parameters. An experimental chat version scored an Elo of 1,417 on LMArena, ahead of GPT-4o and Gemini 2.0 Flash, making it one of the first openly available models to outscore leading closed-source models on a widely watched leaderboard.
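An Elo-style score is easier to interpret as a head-to-head win rate. The sketch below uses the classic Elo expected-score formula and assumes the leaderboard number can be read on that scale; LMArena fits its ratings with its own statistical model, so treat this as an approximation. The opponent rating of 1,400 is hypothetical and was picked only for illustration.

```python
def expected_score(rating_a, rating_b):
    """Classic Elo expected score for A against B (probability-like, 0 to 1)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))

MAVERICK_ELO = 1417   # from the article
OPPONENT_ELO = 1400   # hypothetical opponent rating, for illustration only

p = expected_score(MAVERICK_ELO, OPPONENT_ELO)
print(f"expected preference rate vs a {OPPONENT_ELO}-rated model: {p:.1%}")
```

A 17-point gap works out to a preference rate of roughly 52%, a reminder that small leaderboard leads translate into narrow margins in individual comparisons.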

Maverick is natively multimodal, handling text, image, and video input, and was pre-trained on a data mixture of more than 30T tokens spanning 200 languages. That is about 2x the 15T+ tokens used to train Llama 3.

Behemoth: Still Training

Llama 4 Behemoth was announced but has not yet been released. It has 288B active parameters and approximately 2T total parameters, and Meta reports that it already outperforms GPT-4.5, Claude 3.7 Sonnet, and Gemini 2.0 Pro on several STEM benchmarks while still in training.
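The active and total parameter counts quoted for the three models imply very different sparsity levels. The short calculation below compares them; Scout's 109B total comes from Meta's model card rather than this article and is an assumption here, and Behemoth's "approximately 2T" is treated as exactly 2,000B.

```python
# (active, total) parameters in billions
MODELS = {
    "Scout": (17, 109),       # assumption: 109B total per Meta's model card
    "Maverick": (17, 400),
    "Behemoth": (288, 2000),  # "approximately 2T" rounded to 2,000B
}

def active_fraction(active_b, total_b):
    """Share of the model's parameters used for any single token."""
    return active_b / total_b

for name, (active_b, total_b) in MODELS.items():
    share = active_fraction(active_b, total_b)
    print(f"{name:<9} {active_b:>4}B of {total_b:>5}B active = {share:.1%}")
```

On these figures Maverick is the sparsest of the three, running about 4% of its weights per token, while Behemoth runs about 14%.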

The Open Source Statement

All released Llama 4 models are open-source, continuing Meta's strategy of undermining the commercial moat of closed-source providers. By March 2025, Llama had passed 1 billion cumulative downloads, making it one of the most widely downloaded AI model families to date.

Llama 4 is used in government (GSA partnership for federal agencies), military (expanded to NATO allies and Five Eyes+ nations), and space (deployed on the International Space Station via a partnership with Booz Allen and HPE).

LlamaCon: The Ecosystem Event

Meta followed the model launch with LlamaCon on April 29, where it announced the Llama API (limited preview), performance partnerships with Cerebras and Groq for faster inference, security tools (Llama Guard 4, LlamaFirewall), and the Meta AI app.

Our Take

Llama 4 is Meta's strongest argument that open-source AI can compete with and beat closed-source models. Maverick beating GPT-4o is a milestone — it means the best freely available model now outperforms what was the best model in the world just a year ago. Scout's 10M context window on a single GPU is the kind of practical innovation that enterprises actually need. The question is whether Behemoth, when it ships, can compete with Claude Opus and GPT-5 at the frontier.

FAQ

What is Llama 4 Scout? Llama 4 Scout is Meta's efficient open-source model with 17B active parameters, 16 experts, and a 10M token context window. It fits on a single H100 GPU with Int4 quantization.

How does Llama 4 Maverick compare to GPT-4o? An experimental chat version of Maverick beat GPT-4o and Gemini 2.0 Flash on LMArena with an Elo score of 1,417. Maverick has 400B total parameters with 128 experts.

Is Llama 4 open source? Yes, all released Llama 4 models are open-source. Llama has surpassed 1 billion cumulative downloads as of March 2025.

What is Llama 4 Behemoth? Behemoth is the largest Llama 4 model, with 288B active parameters and ~2T total parameters. It had been announced but not released as of April 2025, and Meta reports that it already outperforms GPT-4.5 on STEM benchmarks while still in training.
