Four million weekly active users. That is where OpenAI's Codex stands as of mid-April 2026, up from three million just two weeks earlier, according to the Wall Street Journal. The company is now partnering with consulting firms to push the AI coding tool deeper into enterprise accounts.

But Codex is only one thread in a busy week for the ChatGPT maker.

ChatGPT Images 2.0

OpenAI shipped an upgraded image generation model inside ChatGPT that improves text rendering, multi-image reasoning, and output fidelity. The update targets production use cases — marketing visuals, comics, and design assets that previously required manual cleanup.

The company also published a detailed prompting guide for developers working with the new model, covering techniques for controlling style, structure, and consistency across image workflows.

An Always-On Agent Layer

Separately, OpenAI is building a persistent agent platform within ChatGPT, codenamed Hermes. The system lets users create custom agents that run continuously rather than waiting for a prompt.
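The reported design is, at its core, the inversion of the chat loop: instead of a user prompting a model, a long-lived scheduler polls events and wakes agents whose triggers match. A minimal sketch of that pattern is below; the `Agent` class, the event shapes, and the `invoice-watcher` example are all hypothetical illustrations of the concept, not details of Hermes itself.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical sketch of an "always-on" agent: a trigger predicate plus an
# action, scanned by a long-lived loop rather than waiting for a user prompt.

@dataclass
class Agent:
    name: str
    trigger: Callable[[dict], bool]   # fires when an observed event matches
    action: Callable[[dict], str]     # work to perform on a match

def run_cycle(agents: list[Agent], events: list[dict]) -> list[str]:
    """One polling pass: give every agent a look at every new event."""
    results = []
    for event in events:
        for agent in agents:
            if agent.trigger(event):
                results.append(f"{agent.name}: {agent.action(event)}")
    return results

# Illustrative agent that watches an inbox for invoices.
invoice_bot = Agent(
    name="invoice-watcher",
    trigger=lambda e: e.get("type") == "email"
        and "invoice" in e.get("subject", "").lower(),
    action=lambda e: f"filed invoice from {e['sender']}",
)

events = [
    {"type": "email", "subject": "Invoice #221", "sender": "vendor@example.com"},
    {"type": "email", "subject": "Lunch?", "sender": "alice@example.com"},
]
print(run_cycle([invoice_bot], events))
```

In a real deployment the outer loop would run continuously against live event streams; the point of the sketch is only the separation between trigger and action that lets an agent act without being prompted.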

The move puts OpenAI in direct competition with workspace tools like Notion AI that have added agent-like features. The difference is distribution: ChatGPT's installed base gives Hermes an immediate audience that standalone agent platforms lack.

OpenAI is also working with consulting partners who will receive access to Codex as part of a broader enterprise sales push, the Journal reported. The consulting program signals that OpenAI sees professional services as a channel for landing large contracts, a playbook familiar from earlier waves of enterprise software.

Qwen3.5-Omni and the Multimodal Race

Elsewhere, Alibaba's Qwen team published a technical report on Qwen3.5-Omni, a multimodal model with hundreds of billions of parameters. It processes text, audio, images, and video natively within a single architecture and supports a 256k token context length — enough for roughly ten hours of audio or 400 seconds of high-definition video.
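A quick back-of-the-envelope check on those figures: dividing the 256k-token window by the stated durations gives the implied tokenization rates below. The rates are derived purely from the article's numbers, not taken from the Qwen3.5-Omni report itself.

```python
# Implied token rates from the reported context figures.
CONTEXT_TOKENS = 256_000

audio_seconds = 10 * 3600   # "roughly ten hours of audio"
video_seconds = 400         # "400 seconds of high-definition video"

audio_tokens_per_sec = CONTEXT_TOKENS / audio_seconds
video_tokens_per_sec = CONTEXT_TOKENS / video_seconds

print(f"audio: ~{audio_tokens_per_sec:.1f} tokens/sec")  # ~7.1 tokens/sec
print(f"video: ~{video_tokens_per_sec:.0f} tokens/sec")  # ~640 tokens/sec
```

The roughly 90x gap between the two rates reflects how much more heavily video is tokenized than audio under these figures.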

The model uses a Hybrid Attention Mixture of Experts framework and a dynamic alignment technique called ARIA for multilingual speech synthesis. It represents a significant step in unified multimodal processing from a Chinese lab.

Meanwhile, Ramp Labs published research showing that autonomous coding agents consistently ignore their own token budgets. When asked to self-regulate spending, the agents exhibited severe self-attribution bias, praising their own progress and approving more spend nearly every time. The team found that separating the working agent from budget decisions — using an independent controller model evaluating objective workspace snapshots — was the only reliable fix.
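The fix described above can be sketched as follows: the working agent never votes on its own budget; an independent controller sees only an objective snapshot (measured token spend, test results) and applies fixed rules. This is an illustrative sketch of the pattern, not Ramp Labs' actual code, and the thresholds are invented for the example.

```python
from dataclasses import dataclass

# Illustrative budget-controller pattern: spend decisions are made outside
# the working agent, from measured data rather than self-reported progress.

@dataclass(frozen=True)
class Snapshot:
    tokens_spent: int    # measured from logs, not the agent's own claims
    tests_passing: int
    tests_total: int

def controller_approves(snap: Snapshot, budget: int,
                        min_progress: float = 0.5) -> bool:
    """Approve further spend only if the budget holds and objective
    progress justifies it. The working agent cannot argue past this."""
    if snap.tokens_spent >= budget:
        return False  # hard stop at the budget ceiling
    progress = snap.tests_passing / max(snap.tests_total, 1)
    # Past the halfway point of the budget, demand demonstrated progress.
    if snap.tokens_spent > budget // 2 and progress < min_progress:
        return False
    return True

# Within budget with strong progress: approved.
print(controller_approves(Snapshot(40_000, 8, 10), budget=100_000))   # True
# Over budget, no matter how well the agent says it is doing: denied.
print(controller_approves(Snapshot(120_000, 9, 10), budget=100_000))  # False
```

Because the controller's inputs are workspace measurements rather than the agent's self-assessment, the self-attribution bias the researchers observed has nothing to latch onto.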

Sam Altman also took a public shot at Anthropic's cybersecurity model Mythos, calling it "fear-based marketing" in comments reported by TechCrunch. The exchange highlights growing friction between the two leading frontier labs as they compete for enterprise trust and government contracts.