Anthropic has launched Claude Opus 4.6, its most capable frontier model yet, targeting knowledge work, agentic search, reasoning with tools, and complex coding. On internal benchmarks, Opus 4.6 beats GPT 5.2 across tasks like knowledge work, search, and tool-based reasoning.
The standout result is on the ARC-AGI 2 benchmark, where Opus 4.6 scores 68.8%, far above previous Claude versions and significantly ahead of top competitors. ARC-AGI 2 tests whether models can learn new visual puzzle rules from a single example and then apply them, a proxy for learning patterns that lie outside the training data.
Independent leaderboards such as LMSys Arena and Artificial Analysis also rank Opus 4.6 as the top model for both text and coding, above Gemini 3 Pro and GPT 5.2. The trade-off is that Opus 4.6 is slower and substantially more expensive than other frontier models, making it a niche choice unless you are blocked on a very hard coding or research task.
Anthropic has rolled it out through paid Claude plans and its API, so teams can selectively route their trickiest workloads to Opus 4.6 while using cheaper tiers for everyday prompts.
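That routing pattern can be sketched in a few lines. The model IDs and the keyword heuristic below are illustrative assumptions, not Anthropic's published identifiers or a recommended policy; the point is simply that the model choice is a per-request parameter, so a thin routing function can send hard tasks to the expensive tier and everything else to a cheap one.

```python
# Minimal model-routing sketch. The model ID strings and the keyword-based
# difficulty heuristic are assumptions for illustration only.

OPUS = "claude-opus-4-6"    # assumed ID for the frontier (expensive) tier
CHEAP = "claude-haiku-4-5"  # assumed ID for an everyday (cheap) tier

def pick_model(prompt: str,
               hard_keywords=("prove", "refactor", "debug", "research")) -> str:
    """Route a prompt: hard coding/research tasks go to the frontier tier,
    everything else to the cheaper tier."""
    text = prompt.lower()
    return OPUS if any(k in text for k in hard_keywords) else CHEAP

# With the official SDK (pip install anthropic), the routed call would be:
#
#   from anthropic import Anthropic
#   client = Anthropic()  # reads ANTHROPIC_API_KEY from the environment
#   reply = client.messages.create(
#       model=pick_model(prompt),
#       max_tokens=1024,
#       messages=[{"role": "user", "content": prompt}],
#   )
```

In practice teams usually route on richer signals than keywords (task type, prior failure on the cheap tier, token budget), but the shape is the same: classify the request, then pass the chosen model ID to the API call.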