LLMsVideo TranslationImage GenerationVideo Generation
AI News

Independent coverage of the latest AI tool updates, releases, and comparisons.

Categories

  • AI LLMs
  • AI Video Translation
  • AI Image Generation
  • AI Video Generation

Company

  • About & Contact

Resources

  • Sitemap
  • AI Glossary
  • Tool Comparisons
  • Facts / Grounding
  • llms.txt
  • XML Sitemap

© 2026 AI News. Independent editorial coverage. Not affiliated with any AI company.

Lisa Thoma|lisathoma-91@outlook.com|
AI LLMs

Claude Sonnet 4.5 Takes SWE-bench Crown With 82% Under High Compute

Anthropic's Sonnet 4.5 hits 77.2% on SWE-bench Verified at standard settings and 82% with high compute. The company also ships Claude Agent SDK and introduces ASL-3 classification.

Lisa Thoma
Lisa Thoma
Monday, September 29, 2025·3 min read

Anthropic released Claude Sonnet 4.5 on September 29, 2025, and the benchmark numbers speak for themselves: 77.2% on SWE-bench Verified at standard settings, climbing to 82.0% with high compute. That makes it the best coding model available by a significant margin, according to Anthropic's blog.

Where Sonnet 4.5 Leads

The SWE-bench score is the headline, but the more telling number is OSWorld: 61.4%, up from 42.2% for Sonnet 4. OSWorld tests practical computer use — navigating real desktops, operating software, completing multi-step tasks. A 19-point jump suggests genuine improvement in agent capabilities, not just benchmark optimization.

Sonnet 4.5 can maintain extended focus for 30+ hours on complex multi-step tasks. That's not a typo — Anthropic reports the model working continuously on long-horizon agent workflows for over a day without degradation.

Pricing stays at $3/$15 per million tokens, unchanged from Sonnet 4. The 200K context window and 64K max output also remain the same. Released under ASL-3, Anthropic's highest safety tier.

Claude Agent SDK

Alongside the model, Anthropic released the Claude Agent SDK — a framework for building multi-step, tool-using AI agents. Combined with Claude Code checkpoints (which let you save and resume agent sessions) and the VS Code extension, this creates a complete developer platform around Claude.

The SDK is significant because it standardizes how developers build with Claude agents, rather than everyone implementing their own orchestration logic.

Code Execution and File Creation

Claude can now execute code and create files directly within Claude apps — not just suggest code, but run it and show results. This moves Claude closer to being a development environment, not just a chat interface.

The Three-Way Race

At the time of launch, the LLM leaderboard looked like this: Claude led coding (SWE-bench), GPT-5 led general knowledge and reasoning, and Gemini 2.5 Pro led academic benchmarks. Sonnet 4.5 widened Claude's coding lead specifically.

Google had released Gemini 2.5 Pro earlier in the year with strong reasoning scores, and OpenAI had shipped GPT-5 in August with broad improvements. But neither could match Sonnet 4.5 on the benchmarks that matter most for professional developers.

Our Take

Sonnet 4.5 at $3/$15 is absurd value. It outperforms models costing 5x more on the benchmarks developers actually care about. The 30-hour sustained focus claim is bold — if it holds up in production, it fundamentally changes what's possible with AI agents. Anthropic is building a moat around the developer experience, and the Agent SDK is the foundation. The question isn't whether Claude is the best coding model. It's whether anyone else can catch up.

FAQ

How much does Claude Sonnet 4.5 cost? Claude Sonnet 4.5 costs $3 per million input tokens and $15 per million output tokens — identical pricing to Sonnet 4. It's available through the Anthropic API with the model ID claude-sonnet-4-5-20250929.

What is the Claude Agent SDK? The Claude Agent SDK is a framework released alongside Sonnet 4.5 for building multi-step AI agents that can use tools, make decisions, and work on complex tasks autonomously. It standardizes agent development patterns for the Claude ecosystem.

How does Sonnet 4.5 compare to GPT-5? Sonnet 4.5 leads on coding benchmarks like SWE-bench Verified (77.2%-82.0%) while GPT-5 leads on general reasoning and knowledge tasks. The models are competitive, with each excelling in different categories.

What is ASL-3? ASL-3 is Anthropic's AI Safety Level classification system. Sonnet 4.5 was the second model released under ASL-3 (after the Claude 4 family), indicating it meets Anthropic's most rigorous safety and deployment requirements.

Tools Mentioned

Claude (Anthropic)Safe, helpful AI assistant with extended context and reasoning
$20/mo (Pro)
GPT (OpenAI)Industry-leading large language models powering ChatGPT
$20/mo (ChatGPT Plus)
Gemini (Google)Google's multimodal AI model family
$19.99/mo (Advanced)

More in AI LLMs

AI LLMs

Anthropic Launches Claude Managed Agents in Public Beta — $0.08/Hour Runtime

Claude Managed Agents provides a fully managed infrastructure for running autonomous AI agents with sandboxing, tool execution, and SSE streaming. Available now to all API accounts.

Lisa Thoma·Apr 14, 2026
AI LLMs

DeepSeek V4 Confirmed on Huawei Ascend Chips — Late April Launch Expected

Reuters confirms DeepSeek V4 runs on Huawei's Ascend 950PR processors, not NVIDIA. The 1-trillion-parameter MoE model is expected in late April with an Apache 2.0 release.

Lisa Thoma·Apr 14, 2026
← Back to all news