OpenAI announced on Nov. 19 (local time) the release of GPT-5.1 Codex Max, a coding-optimized model designed to outperform Google’s Gemini 3 in both programming capability and long-context reasoning. The launch signals the company’s intention to reclaim dominance in the AI developer market.
A Shift Toward Persistent Development Agents and a New Standard for Long-Context AI
OpenAI describes GPT-5.1 Codex Max as a persistent development agent capable of handling the full spectrum of software engineering. The system can autonomously manage large projects, perform refactoring, and execute multi-context debugging tasks.
At the core is a long-context technology called “compaction.” The method selectively preserves essential information during ongoing work, allowing the model to support development sessions that effectively span millions of tokens. OpenAI says this approach reduces token usage in intermediate reasoning by roughly 30 percent, cutting both cost and latency.
Nonprofit research group METR reported that the model’s average continuous operation time reaches two hours and forty-two minutes, which is twenty-five minutes longer than GPT-5. Internal tests at OpenAI recorded uninterrupted runs lasting more than twenty-four hours, a meaningful improvement for real-world engineering environments.
A Direct Face-off with Gemini 3
OpenAI released benchmarking data comparing Codex Max with Google’s Gemini 3 Pro, reflecting the company’s confidence in the new model.
| Benchmark | GPT-5.1 Codex Max | Gemini 3 Pro |
| SWE-Bench Verified | 77.9% | 76.2% |
| Terminal-Bench 2.0 | 58.1% | 54.2% |
| LiveCodeBench Pro | 2439 | 2439 |
The results show Codex Max leading in high-complexity code reasoning and terminal-based problem solving, while matching or surpassing Google’s model in real-time coding tasks. OpenAI emphasized that the model performs strongest in engineering environments that mirror practical software workflows.
Significant Gains over Previous GPT-5.1 Codex Models
OpenAI also highlighted substantial improvements over earlier versions of the Codex line. Several real-world benchmarks demonstrated notable performance jumps.
• SWE-Lancer IC SWE: 66.3% to 79.9%
• SWE-Bench Verified: 73.7% to 77.9%
• Terminal-Bench 2.0: 52.8% to 58.1%
Refactoring and test automation in particular saw marked gains. Evaluators noted that the model can identify test failures, revise implementations autonomously, and iterate toward optimal solutions. The change signals a shift from simple code assistance toward AI systems capable of replacing core segments of a developer’s daily workflow.
Support Across CLI, IDEs, and Cloud Tools
GPT-5.1 Codex Max is available immediately across command-line interfaces, IDE extensions, cloud development platforms, and automated code review services. API access will follow soon, setting the stage for rapid adoption within enterprise engineering teams.
OpenAI describes Codex Max as the first general-purpose engineering model that can operate continuously during long development cycles, suggesting a broader transition toward AI-accelerated product development.
A New Phase in the AI Coding Race
The timing of the launch, coming shortly after Google revealed Gemini 3 Pro, reflects the intensifying competition for leadership in the developer AI market. The focus of the 2025 AI race is shifting away from language and multimodal capabilities and toward the degree to which AI systems can fully take over practical engineering work.
Written by Ju-baek Shinㅣjbshin@kmjournal.net
- Google’s Gemini 3 Beats GPT-5.1 Benchmarks, Integrated into Search on Day One
- Google Goes All In: With Gemini 3, the Search Giant Finally Takes the Gloves Off
- “ChatGPT Now Supports Multi-Person Conversations”… OpenAI Pilots ‘Group Chat’ in Korea
- China’s Moonshot AI Sparks Global Buzz With ‘Kimi-K2’: “Smarter Than DeepSeek... at a Lower Cost”
- OpenAI Unveils GPT-5.1 in Seoul: Adaptive Reasoning, Personalized Tone and a Full Suite of Next-Gen Upgrades
- Google Gemini 3 Shows “Signs of Life” as It Pushes Toward AGI
- OpenAI Announces End of GPT-4o API Support in 2026 as the Countdown Begins for Its Flagship Multimodal Model
- Google’s Gemini 3 Challenges GPT 5.1 and Shakes the AI Order After Three Years