AI News

Independent coverage of the latest AI tool updates, releases, and comparisons.

AI LLMs

Claude Sonnet 4.5 Takes SWE-bench Crown With 82% Under High Compute

Anthropic's Sonnet 4.5 hits 77.2% on SWE-bench Verified at standard settings and 82% with high compute. The company also ships Claude Agent SDK and introduces ASL-3 classification.

Maya Johnson

Monday, September 29, 2025 · 3 min read

Anthropic released Claude Sonnet 4.5 on September 29, 2025, and the benchmark numbers speak for themselves: 77.2% on SWE-bench Verified at standard settings, climbing to 82.0% with high compute. That makes it the best coding model available by a significant margin, according to Anthropic's blog.

Where Sonnet 4.5 Leads

The SWE-bench score is the headline, but the more telling number is OSWorld: 61.4%, up from 42.2% for Sonnet 4. OSWorld tests practical computer use — navigating real desktops, operating software, completing multi-step tasks. A 19-point jump suggests genuine improvement in agent capabilities, not just benchmark optimization.

Sonnet 4.5 can maintain extended focus for 30+ hours on complex multi-step tasks. That's not a typo — Anthropic reports the model working continuously on long-horizon agent workflows for over a day without degradation.

Pricing stays at $3 per million input tokens and $15 per million output tokens, unchanged from Sonnet 4. The 200K context window and 64K maximum output also remain the same. The model ships under ASL-3, Anthropic's strictest safety classification applied to date.
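At those rates, per-request costs are easy to estimate. A minimal sketch (the token counts in the example are illustrative, not measured):

```python
# Sonnet 4.5 API pricing, per the announcement: $3 per 1M input tokens,
# $15 per 1M output tokens (unchanged from Sonnet 4).
INPUT_PER_MTOK = 3.00
OUTPUT_PER_MTOK = 15.00

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost of a single API call at Sonnet 4.5 rates."""
    return (input_tokens * INPUT_PER_MTOK
            + output_tokens * OUTPUT_PER_MTOK) / 1_000_000

# Example: a large coding prompt filling half the 200K context window,
# producing an 8K-token patch (hypothetical sizes).
print(round(request_cost(100_000, 8_000), 2))  # 0.42
```

Even context-heavy agent runs stay well under a dollar per call, which is where the "absurd value" framing below comes from.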

Claude Agent SDK

Alongside the model, Anthropic released the Claude Agent SDK — a framework for building multi-step, tool-using AI agents. Combined with Claude Code checkpoints (which let you save and resume agent sessions) and the VS Code extension, this creates a complete developer platform around Claude.

The SDK is significant because it standardizes how developers build with Claude agents, rather than everyone implementing their own orchestration logic.
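That orchestration logic is roughly a loop: call the model, dispatch whatever tool it requests, feed the result back, repeat until a final answer. The sketch below illustrates the pattern in isolation — every name in it (the tool registry, `fake_model`, `run_agent`) is hypothetical, not the Claude Agent SDK's actual API:

```python
# Bare-bones version of the agent loop that frameworks like the
# Claude Agent SDK standardize. All names here are illustrative stubs.
from typing import Callable

# Tool registry: the model picks a tool by name, the loop dispatches it.
TOOLS: dict[str, Callable[[str], str]] = {
    "read_file": lambda path: f"<contents of {path}>",  # stub
    "run_tests": lambda _: "2 passed, 0 failed",        # stub
}

def fake_model(transcript: list[str]) -> dict:
    """Stand-in for a model call: request a tool once, then finish."""
    if not any("run_tests" in turn for turn in transcript):
        return {"type": "tool_use", "name": "run_tests", "input": ""}
    return {"type": "final", "text": "All tests pass; task complete."}

def run_agent(task: str, max_steps: int = 10) -> str:
    transcript = [task]
    for _ in range(max_steps):
        step = fake_model(transcript)
        if step["type"] == "final":
            return step["text"]
        # Dispatch the requested tool and append the result for the
        # next model call to see.
        result = TOOLS[step["name"]](step["input"])
        transcript.append(f"{step['name']} -> {result}")
    return "step budget exhausted"

print(run_agent("fix the failing test"))  # All tests pass; task complete.
```

A real SDK replaces `fake_model` with actual Claude calls and adds the parts worth standardizing: error handling, checkpointing, and tool-permission controls.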

Code Execution and File Creation

Claude can now execute code and create files directly within Claude apps — not just suggest code, but run it and show results. This moves Claude closer to being a development environment, not just a chat interface.

The Three-Way Race

At the time of launch, the LLM leaderboard looked like this: Claude led coding (SWE-bench), GPT-5 led general knowledge and reasoning, and Gemini 2.5 Pro led academic benchmarks. Sonnet 4.5 widened Claude's coding lead specifically.

Google had released Gemini 2.5 Pro earlier in the year with strong reasoning scores, and OpenAI had shipped GPT-5 in August with broad improvements. But neither could match Sonnet 4.5 on the benchmarks that matter most for professional developers.

Our Take

Sonnet 4.5 at $3/$15 is absurd value. It outperforms models costing 5x more on the benchmarks developers actually care about. The 30-hour sustained focus claim is bold — if it holds up in production, it fundamentally changes what's possible with AI agents. Anthropic is building a moat around the developer experience, and the Agent SDK is the foundation. The question isn't whether Claude is the best coding model. It's whether anyone else can catch up.

FAQ

How much does Claude Sonnet 4.5 cost? Claude Sonnet 4.5 costs $3 per million input tokens and $15 per million output tokens — identical pricing to Sonnet 4. It's available through the Anthropic API with the model ID claude-sonnet-4-5-20250929.
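Only the model ID above comes from the announcement; the request shape below follows Anthropic's Messages API, and the helper name is a hypothetical sketch. Building the payload needs no network access; actually sending it requires the `anthropic` package and an API key:

```python
# Sketch of a Messages API request for Sonnet 4.5, using the model ID
# given in the announcement. `build_request` is an illustrative helper.
MODEL_ID = "claude-sonnet-4-5-20250929"

def build_request(prompt: str, max_tokens: int = 4096) -> dict:
    assert max_tokens <= 64_000  # Sonnet 4.5's maximum output length
    return {
        "model": MODEL_ID,
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": prompt}],
    }

payload = build_request("Summarize this diff.")
# To send it (not run here):
#   import anthropic
#   reply = anthropic.Anthropic().messages.create(**payload)
print(payload["model"])  # claude-sonnet-4-5-20250929
```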

What is the Claude Agent SDK? The Claude Agent SDK is a framework released alongside Sonnet 4.5 for building multi-step AI agents that can use tools, make decisions, and work on complex tasks autonomously. It standardizes agent development patterns for the Claude ecosystem.

How does Sonnet 4.5 compare to GPT-5? Sonnet 4.5 leads on coding benchmarks like SWE-bench Verified (77.2% standard, 82.0% high compute), while GPT-5 leads on general reasoning and knowledge tasks. The models are competitive, with each excelling in different categories.

What is ASL-3? ASL-3 is Anthropic's AI Safety Level classification system. Sonnet 4.5 was the second model released under ASL-3 (after the Claude 4 family), indicating it meets Anthropic's most rigorous safety and deployment requirements.

Tools Mentioned

  • Claude (Anthropic): Safe, helpful AI assistant with extended context and reasoning. $20/mo (Pro)
  • GPT (OpenAI): Industry-leading large language models powering ChatGPT. $20/mo (ChatGPT Plus)
  • Gemini (Google): Google's multimodal AI model family. $19.99/mo (Advanced)
