Meta Superintelligence Labs unveils Muse Spark with dual modes, 58% on Humanity's Last Exam, and multimodal reasoning. Breaking with tradition, the model is not open-source.
The three biggest Western AI labs are sharing information through the Frontier Model Forum to prevent Chinese competitors from extracting their models' capabilities.
A coalition of 12 tech giants including Google, Microsoft, and NVIDIA launches Project Glasswing to secure AI infrastructure. Anthropic introduces Claude Mythos Preview for defensive cybersecurity.
Google releases Gemma 4 with advanced reasoning and agentic capabilities, supporting 140+ languages. Available under Apache 2.0 for mobile, desktop, and IoT deployment.
The largest AI funding round in history: $50B from Amazon, $30B each from SoftBank and NVIDIA, plus Microsoft, a16z, and others. OpenAI doubles down on compute and Stargate.
Mistral releases its most capable model family — including Mistral Large 3 at 675B parameters — and an open-weights text-to-speech model supporting 9 languages.
OpenAI releases GPT-5.4 mini with GPT-5.4-class capabilities at lower cost, and GPT-5.4 nano for simple tasks. Both support compaction for long-running applications.
DeepSeek's V4 model targets 1T parameters with only 37B active per token, a 1M context window, and native image/video generation. Leaked benchmarks claim Claude Opus-level performance.
Mistral Small 4 combines Magistral reasoning, Pixtral vision, and Devstral coding into a single multimodal model. 128 experts, 256K context, Apache 2.0.
Elon Musk confirms Grok 5 for 2026 with 6T parameters, trained on Colossus 2. Meanwhile, Grok Imagine already generated 1.2 billion videos in January alone.
OpenAI's GPT-5.4, released March 5, achieves record scores on OSWorld and WebArena while the company faces a changed competitive landscape.
Google's Gemini 3.1 Pro scores highest on reasoning benchmarks, edges past GPT-5.4 and Claude Opus 4.6 on academic tasks, and brings a new thinking_level parameter for developers.
Anthropic's Sonnet 4.6 ships with 1M token context, adaptive thinking, and web search tools. Internal testing showed users preferred it over Sonnet 4.5 roughly 70% of the time.
Anthropic closes its Series G at $30 billion with a $380 billion post-money valuation. The company's annualized revenue has reached $14 billion, fueling IPO speculation.
Claude Opus 4.6 introduces multi-agent teams and a 1M token context window. Meanwhile, Anthropic's annualized revenue has surpassed $30 billion, overtaking OpenAI for the first time.
The all-stock merger combines AI and space infrastructure at a $1.25 trillion combined valuation. Grok 4.20 launches with 2M token context and multi-agent orchestration.
Google's speed-optimized Gemini 3 Flash delivers near-Pro performance with multimodal function responses and code execution. Flash-Lite follows in March.
GPT-5.2 improves across intelligence, instruction following, and multimodality. Client-side compaction enables infinite conversations. System Card published alongside.
Anthropic open-sources MCP governance by donating it to the newly established Agentic AI Foundation. Accenture joins as launch partner.
Mistral's 10-model December release includes Mistral Large 3 (675B parameters, MoE) and nine Ministral 3 variants. All under Apache 2.0. The biggest open-source model drop of 2025.
Anthropic's Opus 4.5 exceeded all human candidates on the company's internal engineering exam, leads SWE-bench Verified, and introduces an effort parameter for speed optimization.
Gemini 3 Pro ships with thought signatures, thinking levels, and improved multimodal understanding. It's Google's most competitive model yet against Claude and GPT.
OpenAI's GPT-5.1 improves instruction following and code generation. Dedicated Codex models ship alongside enhanced RBAC and extended prompt caching up to 24 hours.
OpenAI's corporate restructuring creates the OpenAI Foundation (nonprofit with $130B equity) and OpenAI Group PBC (for-profit public benefit corporation). The Foundation commits $25 billion to health.
Anthropic's smallest model punches above its weight: Haiku 4.5 scores 73.3% on SWE-bench Verified and runs 4-5x faster than Sonnet 4.5 at one-third the cost.
Anthropic's Sonnet 4.5 hits 77.2% on SWE-bench Verified at standard settings and 82% with high compute. The company also ships Claude Agent SDK and introduces ASL-3 classification.
Ray-Ban Meta Display features full-color in-lens display, EMG wristband for gesture control, 18-hour battery, and Meta AI integration. Starting at $799.
GPT-5, GPT-5 mini, and GPT-5 nano ship with unified reasoning, web search, and an 80% reduction in factual errors. OpenAI's biggest model launch since GPT-4.
Grok 4 ships as xAI's most intelligent model with native tool use. Grok 4 Heavy spawns multiple agents to solve problems in parallel. SuperGrok Heavy costs $300/month.
Magistral Small (24B, Apache 2.0) and Magistral Medium bring step-by-step reasoning to Mistral's lineup. Le Chat delivers Magistral responses at 10x the speed of competing reasoning models.
Claude 4 introduces the first Opus-class model alongside an upgraded Sonnet, bringing extended thinking with tool use, parallel tool execution, and Claude Code going generally available.
The Meta AI app ships on iOS and Android with full-duplex voice, image generation, and cross-platform integration across WhatsApp, Instagram, and Ray-Ban Meta glasses.
o3 and o4-mini launch with web search, code execution, and file analysis. Codex CLI goes open source. Deep research expands to all paid tiers.
Llama 4 introduces MoE architecture with three models. Scout has a 10M token context window. Maverick's 128 experts beat GPT-4o on LMArena. Behemoth is still training.
OpenAI ships its new core API with built-in web search, file search, and computer use tools. The Agents SDK provides a framework for multi-step AI workflows. Assistants API sunset planned for 2026.
Grok 3 launches with 10x the compute of Grok 2, Think and Big Brain reasoning modes, and top scores on math and science benchmarks. X Premium+ price rises to $40/month.