GPT-5.1 Brings Better Steerability and Codex Models for Agentic Coding
OpenAI's GPT-5.1 improves instruction following and code generation. Dedicated Codex models ship alongside enhanced RBAC and extended prompt caching up to 24 hours.
Maya Johnson
OpenAI released GPT-5.1 on November 13, 2025, focusing on steerability — the model's ability to follow complex, multi-part instructions precisely. Alongside the main model, OpenAI shipped gpt-5.1-codex and gpt-5.1-codex-mini, purpose-built for agentic coding tasks, according to OpenAI's changelog.
Steerability Over Raw Power
Rather than chasing benchmark records, GPT-5.1 prioritizes instruction following. The model defaults to a `none` reasoning setting, meaning no chain-of-thought, which makes responses faster and cheaper for tasks that don't require deep thinking. Developers enable reasoning explicitly when needed.
This is a pragmatic choice. Most production API calls don't need frontier-level reasoning; they need the model to do exactly what it's told, quickly and reliably.
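In practice that opt-in pattern looks something like the sketch below, which just builds request payloads. The model id and the `reasoning_effort` parameter name are assumptions modeled on OpenAI's existing API conventions, not confirmed details of the GPT-5.1 release.

```python
# Sketch: request payloads for GPT-5.1 with reasoning off by default.
# The "reasoning_effort" field and model id are assumed names, not
# verified against OpenAI's published API reference.

def build_request(prompt: str, reasoning_effort: str = "none") -> dict:
    """Build a chat-style payload; reasoning stays off unless requested."""
    payload = {
        "model": "gpt-5.1",
        "messages": [{"role": "user", "content": prompt}],
    }
    if reasoning_effort != "none":
        # Only opt in to chain-of-thought when the task needs it.
        payload["reasoning_effort"] = reasoning_effort
    return payload

# Fast, cheap default: no reasoning requested.
quick = build_request("Extract the invoice number from this email.")

# Explicit opt-in for a harder task.
deep = build_request("Refactor this module to remove the cyclic import.", "high")
```

The design point is that the expensive path is something you reach for, not something you have to remember to turn off.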
Codex Models
gpt-5.1-codex and gpt-5.1-codex-mini are dedicated coding variants. They're optimized for OpenAI's Codex product — the agentic coding tool that competes with Anthropic's Claude Code and Cursor.
gpt-5.1-codex-max followed on December 4, offering maximum compute for the hardest coding problems. That makes three tiers of coding model, matching Anthropic's approach of optimizing for different complexity levels.
Infrastructure Updates
Two enterprise features shipped alongside the model. Enhanced RBAC (Role-Based Access Control) capabilities let organizations manage API access at a granular level. Extended prompt cache retention — up to 24 hours with GPU-local storage offloading — dramatically reduces costs for applications that reuse similar prompts.
The 24-hour cache is significant. It means a customer service application that handles similar queries throughout the day pays the full input cost only once, then benefits from cached pricing for subsequent requests.
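The arithmetic behind that claim can be sketched as follows. The per-token price and the cached-token discount below are hypothetical placeholders for illustration, not published GPT-5.1 pricing.

```python
# Back-of-envelope sketch of 24-hour prompt-cache savings.
# INPUT_PRICE and CACHED_DISCOUNT are hypothetical, not real pricing.

INPUT_PRICE = 1.25 / 1_000_000   # $ per input token (placeholder)
CACHED_DISCOUNT = 0.10           # cached tokens billed at 10% (placeholder)

def daily_prefix_cost(prompt_tokens: int, requests: int, cached: bool) -> float:
    """Cost of a shared prompt prefix across one day's requests."""
    if not cached:
        return prompt_tokens * requests * INPUT_PRICE
    # With 24-hour retention: full price once, discounted on every reuse.
    first = prompt_tokens * INPUT_PRICE
    rest = prompt_tokens * (requests - 1) * INPUT_PRICE * CACHED_DISCOUNT
    return first + rest

# An 8K-token system prompt reused across 1,000 requests in a day:
uncached = daily_prefix_cost(8_000, 1_000, cached=False)  # $10.00
cached = daily_prefix_cost(8_000, 1_000, cached=True)     # about $1.01
```

Under these placeholder numbers, the cached prefix costs roughly a tenth of the uncached one; the longer the retention window, the more requests amortize that first full-price hit.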
The Competitive Context
GPT-5.1 launched shortly before Claude Opus 4.5, which took the coding benchmark crown. OpenAI was losing the coding race specifically: Claude had led SWE-bench since Sonnet 4.5 shipped in September, and the dedicated Codex models were a direct response to that competitive pressure.
Our Take
GPT-5.1 is OpenAI doing the boring, important work. Steerability improvements and RBAC don't generate headlines, but they're exactly what enterprise customers ask for. The dedicated Codex line signals that OpenAI sees agentic coding as a distinct product category, not just a model feature. Whether the Codex models can catch Claude on coding benchmarks is the question that matters.
FAQ
What's new in GPT-5.1? GPT-5.1 focuses on improved steerability (instruction following), agentic coding via dedicated Codex models, enhanced RBAC for enterprises, and extended prompt caching up to 24 hours.
What are the GPT-5.1 Codex models? Three dedicated coding models: gpt-5.1-codex (standard), gpt-5.1-codex-mini (faster/cheaper), and gpt-5.1-codex-max (maximum compute). They're optimized for OpenAI's Codex agentic coding product.
How does GPT-5.1 compare to Claude? GPT-5.1 improves on instruction following and general tasks. Claude Sonnet 4.5 and Opus 4.5 lead on coding benchmarks. The models are competitive on different dimensions.