AI LLMs

Claude Opus 4.5 Scores Highest on Engineering Exam, Leads Agentic Benchmarks

Anthropic's Opus 4.5 exceeded all human candidates on the company's internal engineering exam, leads SWE-bench Verified, and introduces an effort parameter for controlling the speed-capability tradeoff.

Maya Johnson

Monday, November 24, 2025 · 3 min read

Anthropic released Claude Opus 4.5 on November 24, 2025, calling it the "best model in the world for coding, agents, and computer use." The claim is backed by numbers: it leads SWE-bench Verified, shows a 10.6% improvement over Sonnet 4.5 on the Aider Polyglot coding benchmark, and scored 29% higher on Vending-Bench for long-horizon tasks, according to Anthropic.

The Engineering Exam Result

The standout detail: Opus 4.5 exceeded all human candidates on Anthropic's internal engineering exam. This isn't a public benchmark designed for AI — it's the actual test Anthropic gives to engineering job applicants. The model outperformed every human who took it.

That's a different kind of milestone than benchmark leaderboards. It suggests the model has crossed a threshold where it can reliably perform professional-level software engineering work, not just solve isolated coding puzzles.

The Effort Parameter

Opus 4.5 introduced a new "effort parameter" that lets developers control the speed-capability tradeoff. Lower effort settings produce faster, cheaper responses for simple tasks. Higher settings enable deeper reasoning for complex problems. This makes Opus 4.5 more practical for production use where not every query needs maximum compute.
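Anthropic's documentation has the authoritative request shape; as a minimal sketch, assuming the control is exposed as a top-level `effort` field on the Messages API (the field name and its values are assumptions here, forwarded through the Python SDK's `extra_body` passthrough), a low-effort call might look like this:

```python
# Minimal sketch, not official docs: dialing effort per request.
# ASSUMPTION: the effort control is a top-level "effort" field with values
# like "low" / "high"; the real field name and values may differ.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.messages.create(
    model="claude-opus-4-5-20251101",  # model ID from the release notes
    max_tokens=1024,
    messages=[{"role": "user", "content": "Rename this variable across the file."}],
    extra_body={"effort": "low"},  # assumed field: "low" for routine work, "high" for hard problems
)
print(response.content[0].text)
```

The practical upshot is per-request routing: routine queries run cheap and fast, and only genuinely hard problems pay for maximum reasoning.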

Desktop App With Parallel Agent Sessions

The release included desktop app support with parallel agent sessions — multiple Claude agents running simultaneously on different tasks. This is a preview of the multi-agent architecture that would later become agent teams in Opus 4.6.
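The desktop feature is a product surface rather than an API, but the underlying fan-out pattern is easy to sketch against the public Messages API using the SDK's async client; the task list below is made up for illustration:

```python
# Illustrative fan-out: several "agents" (independent conversations) run concurrently.
import asyncio
import anthropic

client = anthropic.AsyncAnthropic()  # reads ANTHROPIC_API_KEY from the environment

async def run_agent(task: str) -> str:
    msg = await client.messages.create(
        model="claude-opus-4-5-20251101",
        max_tokens=512,
        messages=[{"role": "user", "content": task}],
    )
    return msg.content[0].text

async def main() -> None:
    tasks = [  # hypothetical work items
        "Audit auth.py for injection bugs",
        "Draft release notes for v2.3",
        "Write unit tests for the date parser",
    ]
    for task, result in zip(tasks, await asyncio.gather(*map(run_agent, tasks))):
        print(f"--- {task} ---\n{result}\n")

asyncio.run(main())
```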

Pricing and Context

Opus 4.5 costs $5 per million input tokens and $25 per million output tokens — a significant drop from Opus 4's $15/$75. The 200K context window and 64K max output match Sonnet 4.5. Model ID: claude-opus-4-5-20251101.
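To make the numbers concrete, here is the per-request arithmetic at the quoted rates (the 50K-in/8K-out workload is an arbitrary example):

```python
# Cost per request at the per-million-token rates quoted above.
def cost_usd(input_tokens: int, output_tokens: int, in_rate: float, out_rate: float) -> float:
    return input_tokens / 1e6 * in_rate + output_tokens / 1e6 * out_rate

# Example workload: 50K input tokens, 8K output tokens per request.
print(cost_usd(50_000, 8_000, 15.0, 75.0))  # Opus 4   at $15/$75 -> $1.35
print(cost_usd(50_000, 8_000, 5.0, 25.0))   # Opus 4.5 at  $5/$25 -> $0.45
```

Because both rates dropped by the same factor, every workload mix pays one third of the old price, matching the 67% cut discussed below.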

For comparison: GPT-5 launched in August at competitive pricing, and Gemini 3 Pro was about to ship. The LLM market was entering its most competitive period, with three strong contenders releasing frontier models within weeks of each other.

Our Take

The pricing restructure is the real story here. Opus dropped from $15/$75 to $5/$25 — a 67% price cut — while getting significantly better. That's Anthropic acknowledging that the Opus tier needs to be accessible enough for production use, not just occasional hard problems. The effort parameter makes this practical: you can run Opus at low effort for routine work and high effort for the hard stuff, keeping costs manageable.

FAQ

How much does Claude Opus 4.5 cost? Opus 4.5 costs $5 per million input tokens and $25 per million output tokens. This is a 67% reduction from Opus 4's pricing of $15/$75. The model ID is claude-opus-4-5-20251101.

What is the effort parameter? The effort parameter lets developers control how much reasoning Opus 4.5 applies to each request. Lower settings produce faster, cheaper responses for simple tasks, while higher settings enable deeper reasoning for complex problems.

How does Opus 4.5 compare to Sonnet 4.5? Opus 4.5 scores 10.6% higher on the Aider Polyglot benchmark and 29% higher on Vending-Bench for long-horizon tasks. However, Sonnet 4.5 at $3/$15 offers excellent value for tasks that don't require maximum capability.

Did Opus 4.5 really beat all human engineers? Yes, according to Anthropic, Opus 4.5 exceeded all human candidates on the company's internal engineering exam — the same test used for hiring decisions. This is the actual Anthropic engineering interview, not a standardized benchmark.

Tools Mentioned

  • Claude (Anthropic): Safe, helpful AI assistant with extended context and reasoning. $20/mo (Pro)
  • GPT (OpenAI): Industry-leading large language models powering ChatGPT. $20/mo (ChatGPT Plus)
  • Gemini (Google): Google's multimodal AI model family. $19.99/mo (Advanced)
