Anthropic has launched Claude Opus 4.6, its most capable frontier model yet, targeting knowledge work, agentic search, reasoning with tools, and complex coding. On internal benchmarks, Opus 4.6 beats GPT 5.2 across tasks like knowledge work, search, and tool-based reasoning.
The standout result is on the ARC-AGI 2 benchmark, where Opus 4.6 scores 68.8%, far above previous Claude versions and significantly ahead of top competitors. ARC-AGI 2 tests whether models can learn new visual puzzle rules from a single example and then apply them, a proxy for learning patterns that lie outside the training data.
Independent leaderboards such as LMSys Arena and Artificial Analysis also rank Opus 4.6 as the top model for both text and coding, above Gemini 3 Pro and GPT 5.2. The trade-off is that Opus 4.6 is slower and substantially more expensive than other frontier models, making it a niche choice unless you are blocked on a very hard coding or research task.
Anthropic has rolled it out through paid Claude plans and its API, so teams can selectively route their trickiest workloads to Opus 4.6 while using cheaper tiers for everyday prompts.
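That routing pattern can be sketched in a few lines. The model IDs and the keyword heuristic below are illustrative assumptions, not Anthropic's published identifiers or a recommended policy; the point is simply that the model choice is a per-request parameter, so a thin routing function can send hard tasks to the expensive tier and everything else to a cheap one.

```python
# Minimal model-routing sketch. The model ID strings and the keyword-based
# difficulty heuristic are assumptions for illustration only.

OPUS = "claude-opus-4-6"    # assumed ID for the frontier (expensive) tier
CHEAP = "claude-haiku-4-5"  # assumed ID for an everyday (cheap) tier

def pick_model(prompt: str,
               hard_keywords=("prove", "refactor", "debug", "research")) -> str:
    """Route a prompt: hard coding/research tasks go to the frontier tier,
    everything else to the cheaper tier."""
    text = prompt.lower()
    return OPUS if any(k in text for k in hard_keywords) else CHEAP

# With the official SDK (pip install anthropic), the routed call would be:
#
#   from anthropic import Anthropic
#   client = Anthropic()  # reads ANTHROPIC_API_KEY from the environment
#   reply = client.messages.create(
#       model=pick_model(prompt),
#       max_tokens=1024,
#       messages=[{"role": "user", "content": prompt}],
#   )
```

In practice teams usually route on richer signals than keywords (task type, prior failure on the cheap tier, token budget), but the shape is the same: classify the request, then pass the chosen model ID to the API call.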