GPT-5.1 Brings Better Steerability and Codex Models for Agentic Coding
OpenAI's GPT-5.1 improves instruction following and code generation. Dedicated Codex models ship alongside enhanced RBAC and extended prompt caching up to 24 hours.
Maya Johnson
OpenAI released GPT-5.1 on November 13, 2025, focusing on steerability — the model's ability to follow complex, multi-part instructions precisely. Alongside the main model, OpenAI shipped gpt-5.1-codex and gpt-5.1-codex-mini, purpose-built for agentic coding tasks, according to OpenAI's changelog.
Steerability Over Raw Power
Rather than chasing benchmark records, GPT-5.1 prioritizes instruction following. The model defaults to a `none` reasoning setting, meaning no chain-of-thought, which makes responses faster and cheaper for tasks that don't require deep thinking. Developers enable reasoning explicitly when needed.
This is a pragmatic choice. Most production API calls don't need frontier-level reasoning; they need the model to do exactly what it's told, quickly and reliably.
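In practice that opt-in pattern looks something like the sketch below, which just builds request payloads. The model id and the `reasoning_effort` parameter name are assumptions modeled on OpenAI's existing API conventions, not confirmed details of the GPT-5.1 release.

```python
# Sketch: request payloads for GPT-5.1 with reasoning off by default.
# The "reasoning_effort" field and model id are assumed names, not
# verified against OpenAI's published API reference.

def build_request(prompt: str, reasoning_effort: str = "none") -> dict:
    """Build a chat-style payload; reasoning stays off unless requested."""
    payload = {
        "model": "gpt-5.1",
        "messages": [{"role": "user", "content": prompt}],
    }
    if reasoning_effort != "none":
        # Only opt in to chain-of-thought when the task needs it.
        payload["reasoning_effort"] = reasoning_effort
    return payload

# Fast, cheap default: no reasoning requested.
quick = build_request("Extract the invoice number from this email.")

# Explicit opt-in for a harder task.
deep = build_request("Refactor this module to remove the cyclic import.", "high")
```

The design point is that the expensive path is something you reach for, not something you have to remember to turn off.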
Codex Models
gpt-5.1-codex and gpt-5.1-codex-mini are dedicated coding variants. They're optimized for OpenAI's Codex product — the agentic coding tool that competes with Anthropic's Claude Code and Cursor.
gpt-5.1-codex-max followed on December 4, offering maximum compute for the hardest coding problems. That makes three tiers of coding model, matching Anthropic's approach of optimizing for different complexity levels.
Infrastructure Updates
Two enterprise features shipped alongside the model. Enhanced RBAC (Role-Based Access Control) capabilities let organizations manage API access at a granular level. Extended prompt cache retention — up to 24 hours with GPU-local storage offloading — dramatically reduces costs for applications that reuse similar prompts.
The 24-hour cache is significant. It means a customer service application that handles similar queries throughout the day pays the full input cost only once, then benefits from cached pricing for subsequent requests.
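The arithmetic behind that claim can be sketched as follows. The per-token price and the cached-token discount below are hypothetical placeholders for illustration, not published GPT-5.1 pricing.

```python
# Back-of-envelope sketch of 24-hour prompt-cache savings.
# INPUT_PRICE and CACHED_DISCOUNT are hypothetical, not real pricing.

INPUT_PRICE = 1.25 / 1_000_000   # $ per input token (placeholder)
CACHED_DISCOUNT = 0.10           # cached tokens billed at 10% (placeholder)

def daily_prefix_cost(prompt_tokens: int, requests: int, cached: bool) -> float:
    """Cost of a shared prompt prefix across one day's requests."""
    if not cached:
        return prompt_tokens * requests * INPUT_PRICE
    # With 24-hour retention: full price once, discounted on every reuse.
    first = prompt_tokens * INPUT_PRICE
    rest = prompt_tokens * (requests - 1) * INPUT_PRICE * CACHED_DISCOUNT
    return first + rest

# An 8K-token system prompt reused across 1,000 requests in a day:
uncached = daily_prefix_cost(8_000, 1_000, cached=False)  # $10.00
cached = daily_prefix_cost(8_000, 1_000, cached=True)     # about $1.01
```

Under these placeholder numbers, the cached prefix costs roughly a tenth of the uncached one; the longer the retention window, the more requests amortize that first full-price hit.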
The Competitive Context
GPT-5.1 launched shortly before Claude Opus 4.5, which took the coding benchmark crown. OpenAI was losing the coding race specifically: Claude had led SWE-bench since Sonnet 4.5 shipped in September, and the dedicated Codex models were a direct response to that competitive pressure.
Our Take
GPT-5.1 is OpenAI doing the boring, important work. Steerability improvements and RBAC don't generate headlines, but they're exactly what enterprise customers ask for. The dedicated Codex line signals that OpenAI sees agentic coding as a distinct product category, not just a model feature. Whether the Codex models can catch Claude on coding benchmarks is the question that matters.
FAQ
What's new in GPT-5.1? GPT-5.1 focuses on improved steerability (instruction following), agentic coding via dedicated Codex models, enhanced RBAC for enterprises, and extended prompt caching up to 24 hours.
What are the GPT-5.1 Codex models? Three dedicated coding models: gpt-5.1-codex (standard), gpt-5.1-codex-mini (faster/cheaper), and gpt-5.1-codex-max (maximum compute). They're optimized for OpenAI's Codex agentic coding product.
How does GPT-5.1 compare to Claude? GPT-5.1 improves on instruction following and general tasks. Claude Sonnet 4.5 and Opus 4.5 lead on coding benchmarks. The models are competitive on different dimensions.