AI LLMs

Claude Opus 4.5 Scores Highest on Engineering Exam, Leads Agentic Benchmarks

Anthropic's Opus 4.5 exceeded all human candidates on the company's internal engineering exam, leads SWE-bench Verified, and introduces an effort parameter for controlling the speed-capability tradeoff.

Maya Johnson

Monday, November 24, 2025 · 3 min read

Anthropic released Claude Opus 4.5 on November 24, 2025, calling it the "best model in the world for coding, agents, and computer use." The claim is backed by numbers: it leads SWE-bench Verified, shows a 10.6% improvement over Sonnet 4.5 on the Aider Polyglot coding benchmark, and scored 29% higher on Vending-Bench for long-horizon tasks, according to Anthropic.

The Engineering Exam Result

The standout detail: Opus 4.5 exceeded all human candidates on Anthropic's internal engineering exam. This isn't a public benchmark designed for AI — it's the actual test Anthropic gives to engineering job applicants. The model outperformed every human who took it.

That's a different kind of milestone than benchmark leaderboards. It suggests the model has crossed a threshold where it can reliably perform professional-level software engineering work, not just solve isolated coding puzzles.

The Effort Parameter

Opus 4.5 introduced a new "effort parameter" that lets developers control the speed-capability tradeoff. Lower effort settings produce faster, cheaper responses for simple tasks. Higher settings enable deeper reasoning for complex problems. This makes Opus 4.5 more practical for production use where not every query needs maximum compute.
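Anthropic's documentation has the authoritative request shape; as a minimal sketch, assuming the control is exposed as a top-level `effort` field on the Messages API (the field name and its values are assumptions here, forwarded through the Python SDK's `extra_body` passthrough), a low-effort call might look like this:

```python
# Minimal sketch, not official docs: dialing effort per request.
# ASSUMPTION: the effort control is a top-level "effort" field with values
# like "low" / "high"; the real field name and values may differ.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.messages.create(
    model="claude-opus-4-5-20251101",  # model ID from the release notes
    max_tokens=1024,
    messages=[{"role": "user", "content": "Rename this variable across the file."}],
    extra_body={"effort": "low"},  # assumed field: "low" for routine work, "high" for hard problems
)
print(response.content[0].text)
```

The practical upshot is per-request routing: routine queries run cheap and fast, and only genuinely hard problems pay for maximum reasoning.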

Desktop App With Parallel Agent Sessions

The release included desktop app support with parallel agent sessions — multiple Claude agents running simultaneously on different tasks. This is a preview of the multi-agent architecture that would later become agent teams in Opus 4.6.
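The desktop feature is a product surface rather than an API, but the underlying fan-out pattern is easy to sketch against the public Messages API using the SDK's async client; the task list below is made up for illustration:

```python
# Illustrative fan-out: several "agents" (independent conversations) run concurrently.
import asyncio
import anthropic

client = anthropic.AsyncAnthropic()  # reads ANTHROPIC_API_KEY from the environment

async def run_agent(task: str) -> str:
    msg = await client.messages.create(
        model="claude-opus-4-5-20251101",
        max_tokens=512,
        messages=[{"role": "user", "content": task}],
    )
    return msg.content[0].text

async def main() -> None:
    tasks = [  # hypothetical work items
        "Audit auth.py for injection bugs",
        "Draft release notes for v2.3",
        "Write unit tests for the date parser",
    ]
    for task, result in zip(tasks, await asyncio.gather(*map(run_agent, tasks))):
        print(f"--- {task} ---\n{result}\n")

asyncio.run(main())
```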

Pricing and Context

Opus 4.5 costs $5 per million input tokens and $25 per million output tokens — a significant drop from Opus 4's $15/$75. The 200K context window and 64K max output match Sonnet 4.5. Model ID: claude-opus-4-5-20251101.
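To make the numbers concrete, here is the per-request arithmetic at the quoted rates (the 50K-in/8K-out workload is an arbitrary example):

```python
# Cost per request at the per-million-token rates quoted above.
def cost_usd(input_tokens: int, output_tokens: int, in_rate: float, out_rate: float) -> float:
    return input_tokens / 1e6 * in_rate + output_tokens / 1e6 * out_rate

# Example workload: 50K input tokens, 8K output tokens per request.
print(cost_usd(50_000, 8_000, 15.0, 75.0))  # Opus 4   at $15/$75 -> $1.35
print(cost_usd(50_000, 8_000, 5.0, 25.0))   # Opus 4.5 at  $5/$25 -> $0.45
```

Because both rates dropped by the same factor, every workload mix pays one third of the old price, matching the 67% cut discussed below.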

For comparison: GPT-5 launched in August at competitive pricing, and Gemini 3 Pro was about to ship. The LLM market was entering its most competitive period, with three strong contenders releasing frontier models within weeks of each other.

Our Take

The pricing restructure is the real story here. Opus dropped from $15/$75 to $5/$25 — a 67% price cut — while getting significantly better. That's Anthropic acknowledging that the Opus tier needs to be accessible enough for production use, not just occasional hard problems. The effort parameter makes this practical: you can run Opus at low effort for routine work and high effort for the hard stuff, keeping costs manageable.

FAQ

How much does Claude Opus 4.5 cost? Opus 4.5 costs $5 per million input tokens and $25 per million output tokens. This is a 67% reduction from Opus 4's pricing of $15/$75. The model ID is claude-opus-4-5-20251101.

What is the effort parameter? The effort parameter lets developers control how much reasoning Opus 4.5 applies to each request. Lower settings produce faster, cheaper responses for simple tasks, while higher settings enable deeper reasoning for complex problems.

How does Opus 4.5 compare to Sonnet 4.5? Opus 4.5 scores 10.6% higher on the Aider Polyglot benchmark and 29% higher on Vending-Bench for long-horizon tasks. However, Sonnet 4.5 at $3/$15 offers excellent value for tasks that don't require maximum capability.

Did Opus 4.5 really beat all human engineers? Yes, according to Anthropic, Opus 4.5 exceeded all human candidates on the company's internal engineering exam — the same test used for hiring decisions. This is the actual Anthropic engineering interview, not a standardized benchmark.

Tools Mentioned

  • Claude (Anthropic): Safe, helpful AI assistant with extended context and reasoning. $20/mo (Pro)
  • GPT (OpenAI): Industry-leading large language models powering ChatGPT. $20/mo (ChatGPT Plus)
  • Gemini (Google): Google's multimodal AI model family. $19.99/mo (Advanced)
