On Nov. 19 (local time), OpenAI released GPT-5.1 Codex Max, a coding-optimized model positioned to outperform Google’s Gemini 3 in both programming ability and long-context reasoning. The launch signals the company’s intention to reclaim dominance in the AI developer market.

(Image: OpenAI)

A Shift Toward Persistent Development Agents and a New Standard for Long-Context AI

OpenAI describes GPT-5.1 Codex Max as a persistent development agent that can handle software engineering work end to end. The system can autonomously work through large projects, carry out project-scale refactoring, and run debugging sessions that span multiple context windows.

At the core is a long-context technique OpenAI calls “compaction.” As a session approaches the limit of its context window, the model prunes its history while preserving the most important information, which lets a single development session effectively span millions of tokens. OpenAI also says the model uses roughly 30 percent fewer reasoning tokens than GPT-5.1 Codex at comparable reasoning effort, cutting both cost and latency.
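OpenAI has not published how compaction is implemented, so the following is only a rough sketch of the general idea under stated assumptions: whitespace-separated words stand in for tokens, a hypothetical `summarize` callback stands in for the model condensing its own older history, and the newest messages are always kept verbatim. All class and function names here are illustrative.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class CompactingSession:
    """Toy session history that compacts itself when it exceeds a token budget.

    Assumptions (hypothetical, not OpenAI's implementation):
    - whitespace-separated words approximate tokens
    - `summarize` stands in for the model condensing older history
    """
    budget_tokens: int
    summarize: Callable[[List[str]], str]
    keep_recent: int = 2
    history: List[str] = field(default_factory=list)

    def token_count(self) -> int:
        # Crude proxy for a real tokenizer.
        return sum(len(message.split()) for message in self.history)

    def add(self, message: str) -> None:
        self.history.append(message)
        if self.token_count() > self.budget_tokens:
            self.compact()

    def compact(self) -> None:
        # Keep the newest `keep_recent` messages verbatim and fold everything
        # older into a single summary entry at the front of the history.
        split = max(len(self.history) - self.keep_recent, 0)
        older, recent = self.history[:split], self.history[split:]
        if older:
            self.history = [self.summarize(older)] + recent
        # Simplification: if the recent messages alone exceed the budget,
        # the history stays over budget.


def first_words_summary(messages: List[str]) -> str:
    # Hypothetical summarizer: keep only the first word of each old message.
    return "SUMMARY: " + "; ".join(m.split()[0] for m in messages if m.split())


session = CompactingSession(budget_tokens=10, summarize=first_words_summary)
for msg in ["edit file a.py now", "run tests two failed", "fix bug in parser"]:
    session.add(msg)
print(session.history)
```

Keeping the latest messages word for word preserves the immediate working state, while the summary carries older decisions forward; a production system would need a real tokenizer and a far better summarizer than this placeholder.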

Nonprofit research group METR estimated the model’s 50 percent time horizon, meaning the length of task (measured by how long it takes a human expert) that the model completes successfully half the time, at two hours and forty-two minutes, twenty-five minutes longer than GPT-5’s. In internal tests, OpenAI observed the model working on single tasks for more than twenty-four hours, a meaningful improvement for real-world engineering environments.
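METR typically reports figures like this as a “50 percent time horizon”: it fits a logistic curve of success probability against the logarithm of each task’s human completion time, then reads off the task length at which predicted success crosses 50 percent. The sketch below is a simplified version of that idea on made-up outcomes; it omits METR’s task weighting and confidence intervals, and none of the numbers come from METR.

```python
import math
from typing import List, Tuple


def fit_logistic(xs: List[float], ys: List[int], lr: float = 0.5,
                 steps: int = 5000) -> Tuple[float, float]:
    """Fit p(success) = sigmoid(a + b * x) by batch gradient descent on log-loss."""
    a, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        grad_a = grad_b = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(a + b * x)))
            grad_a += p - y
            grad_b += (p - y) * x
        a -= lr * grad_a / n
        b -= lr * grad_b / n
    return a, b


def time_horizon_minutes(task_minutes: List[float], successes: List[int]) -> float:
    """Task length (human minutes) at which fitted success probability is 50%."""
    logs = [math.log2(m) for m in task_minutes]
    center = sum(logs) / len(logs)  # centering x keeps gradient descent well conditioned
    a, b = fit_logistic([x - center for x in logs], successes)
    # sigmoid(a + b*x) = 0.5  <=>  a + b*x = 0  <=>  x = -a / b
    return 2 ** (center - a / b)


# Made-up outcomes: short tasks mostly succeed, long tasks mostly fail.
minutes = [16, 16, 32, 32, 64, 64, 128, 128, 256, 256, 512, 512, 1024, 1024]
outcomes = [1, 1, 1, 1, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0]
horizon = time_horizon_minutes(minutes, outcomes)
print(f"50% time horizon: {horizon:.0f} minutes")
```

Because these invented outcomes are symmetric around 2^7 minutes in log space, the fitted horizon lands at about 128 minutes. Working in log task length reflects the idea that each doubling of task length makes success comparably harder.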

A Direct Face-off with Gemini 3

OpenAI released benchmarking data comparing Codex Max with Google’s Gemini 3 Pro, reflecting the company’s confidence in the new model.

Benchmark | GPT-5.1 Codex Max | Gemini 3 Pro
SWE-Bench Verified | 77.9% | 76.2%
Terminal-Bench 2.0 | 58.1% | 54.2%
LiveCodeBench Pro (Elo rating) | 2439 | 2439

The figures show Codex Max ahead on SWE-Bench Verified, which measures resolving real GitHub issues, and on Terminal-Bench 2.0, which measures agentic work in a command-line environment, while tying Gemini 3 Pro on LiveCodeBench Pro, a competitive-programming benchmark. OpenAI emphasized that the model performs best in engineering environments that mirror practical software workflows.
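The size of each lead can be read straight off the comparison table. This minimal Python check uses only the reported figures; the first two benchmarks are percentages, so their gaps are percentage points, while LiveCodeBench Pro is an Elo-style rating, so its gap is in rating points.

```python
# Scores as listed in the comparison table: (GPT-5.1 Codex Max, Gemini 3 Pro).
scores = {
    "SWE-Bench Verified (%)": (77.9, 76.2),
    "Terminal-Bench 2.0 (%)": (58.1, 54.2),
    "LiveCodeBench Pro (Elo)": (2439, 2439),
}


def leads(table):
    """Return Codex Max's margin over Gemini 3 Pro per benchmark, rounded to 0.1."""
    return {name: round(codex - gemini, 1) for name, (codex, gemini) in table.items()}


for name, margin in leads(scores).items():
    print(f"{name}: {margin:+}")
```

The calculation ignores run-to-run variance, which the table does not report.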

Significant Gains over Previous GPT-5.1 Codex Models

OpenAI also reported clear improvements over its predecessor, GPT-5.1 Codex, on several real-world benchmarks:

• SWE-Lancer IC SWE: 66.3% to 79.9%

• SWE-Bench Verified: 73.7% to 77.9%

• Terminal-Bench 2.0: 52.8% to 58.1%
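Another way to read these jumps is by how much of the remaining failure rate each one removes. The sketch below computes that relative error reduction from the three pairs of scores listed above; it is simple arithmetic on the reported numbers, not a metric OpenAI published.

```python
def relative_error_reduction(before_pct: float, after_pct: float) -> float:
    """Share of the previous failure rate that was eliminated, as a percentage."""
    failed_before = 100.0 - before_pct
    failed_after = 100.0 - after_pct
    return 100.0 * (failed_before - failed_after) / failed_before


# (previous Codex score, Codex Max score) as listed above.
gains = {
    "SWE-Lancer IC SWE": (66.3, 79.9),
    "SWE-Bench Verified": (73.7, 77.9),
    "Terminal-Bench 2.0": (52.8, 58.1),
}

for name, (before, after) in gains.items():
    print(f"{name}: +{after - before:.1f} points, "
          f"{relative_error_reduction(before, after):.1f}% fewer failures")
```

On this reading, the SWE-Lancer jump removes roughly 40 percent of the tasks the earlier model failed, while the other two gains remove between about 11 and 16 percent.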

Refactoring and test automation saw particularly marked gains. Evaluators noted that the model can identify failing tests, revise its implementation autonomously, and keep iterating until it reaches a working solution. The change signals a shift from simple code assistance toward AI systems capable of taking over core parts of a developer’s daily workflow.
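The loop evaluators describe (run the tests, inspect the failures, revise, repeat) can be sketched in a few lines. This is a hypothetical illustration, not how Codex works internally: `propose_fix` stands in for the model, and in the demo it is a scripted sequence of candidate implementations for a toy squaring function.

```python
from typing import Callable, List, Optional, Tuple

Impl = Callable[[int], int]
Case = Tuple[int, int]            # (input, expected output)
Failure = Tuple[int, int, int]    # (input, expected, actual)


def run_tests(impl: Impl, cases: List[Case]) -> List[Failure]:
    """Run every test case and collect the ones that fail."""
    failures = []
    for x, expected in cases:
        actual = impl(x)
        if actual != expected:
            failures.append((x, expected, actual))
    return failures


def fix_loop(impl: Impl,
             propose_fix: Callable[[Impl, List[Failure]], Impl],
             cases: List[Case],
             max_revisions: int = 5) -> Tuple[Optional[Impl], int]:
    """Test, revise on failure, and repeat until the suite passes or the budget runs out.

    Returns (passing implementation, revisions used), or (None, max_revisions).
    """
    for revision in range(max_revisions + 1):
        failures = run_tests(impl, cases)
        if not failures:
            return impl, revision
        if revision < max_revisions:
            # Hypothetical model step: propose a new implementation from the failures.
            impl = propose_fix(impl, failures)
    return None, max_revisions


# Toy demo: the buggy version doubles instead of squaring; the scripted "model"
# first proposes another wrong version, then the correct one.
cases = [(0, 0), (2, 4), (3, 9)]
scripted_revisions = iter([lambda x: x * x + 1, lambda x: x * x])


def scripted_fix(impl: Impl, failures: List[Failure]) -> Impl:
    return next(scripted_revisions)


fixed, revisions = fix_loop(lambda x: x * 2, scripted_fix, cases)
print(f"tests pass after {revisions} revision(s)")
```

The revision budget matters in practice: without it, an agent that keeps proposing wrong fixes would loop forever.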

Support Across CLI, IDEs, and Cloud Tools

GPT-5.1 Codex Max is available immediately across command-line interfaces, IDE extensions, cloud development platforms, and automated code review services. API access will follow soon, setting the stage for rapid adoption within enterprise engineering teams.

OpenAI describes Codex Max as the first general-purpose engineering model that can operate continuously during long development cycles, suggesting a broader transition toward AI-accelerated product development.

A New Phase in the AI Coding Race

The timing of the launch, shortly after Google unveiled Gemini 3 Pro, reflects intensifying competition for leadership in the developer AI market. In 2025, the focus of the AI race is shifting away from language and multimodal capabilities toward how much practical engineering work AI systems can take over entirely.

Written by Ju-baek Shin | jbshin@kmjournal.net

Copyright © KMJ. Unauthorized reproduction and redistribution prohibited.