Step 3.5 Flash is a new open-weight mixture-of-experts model that aims to match top closed LLMs in reasoning and research while staying relatively efficient. The model has around 196 billion total parameters, but only about 11 billion are active per token, enabling high throughput despite the large architecture.
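The "active parameters" figure comes from the mixture-of-experts trick: a gating network routes each token to only a few experts, so most weights sit idle on any given forward pass. Here is a minimal, illustrative sketch of top-k routing; the expert count, top-k value, and dimensions are made-up toy numbers, not Step 3.5 Flash's actual configuration.

```python
import numpy as np

N_EXPERTS = 32   # hypothetical expert count (toy value)
TOP_K = 2        # experts activated per token (toy value)

def route(token_hidden, gate_weights):
    """Pick the top-k experts for one token from gating scores."""
    scores = token_hidden @ gate_weights              # one score per expert
    top = np.argsort(scores)[-TOP_K:]                 # indices of the top-k experts
    exp_scores = np.exp(scores[top])
    probs = exp_scores / exp_scores.sum()             # renormalized mixing weights
    return top, probs

rng = np.random.default_rng(0)
hidden = rng.standard_normal(64)                      # toy hidden state
gates = rng.standard_normal((64, N_EXPERTS))          # toy gating matrix
experts, weights = route(hidden, gates)

# Only TOP_K / N_EXPERTS of the expert parameters run for this token,
# which is why active-parameter count can be far below total size.
print(f"active experts: {sorted(experts.tolist())}, "
      f"active fraction: {TOP_K / N_EXPERTS:.1%}")
```

The same logic at scale is how a ~196B-parameter model can activate only ~11B parameters per token.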
On internal benchmarks, Step 3.5 Flash delivers deep reasoning performance comparable to Gemini 3 Pro, Claude Opus 4.5, and GPT 5.2 Extra High. Its coding results trail the very top proprietary coders but still look strong for its size, and its “agentic” benchmarks show parity with the best models for multi-step tool use.
The wild card is deep research: Step 3.5 Flash reportedly beats Gemini Deep Research and OpenAI Deep Research on certain evaluation suites, suggesting it can already handle long-form investigations across documents and sources. Practical throughput sits around 100–300 tokens per second, which is extremely fast for its class.
Weights are fully available on Hugging Face, but the full model is roughly 399 GB, so you’ll need multiple GPUs to run it locally. A GitHub repo documents distributed setup for those willing to build an on-prem research stack.
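To get a feel for why multiple GPUs are needed, a back-of-the-envelope calculation from the ~399 GB checkpoint size is enough. The overhead factor and per-GPU memory below are assumptions for illustration, not measured requirements from the repo.

```python
import math

WEIGHTS_GB = 399      # checkpoint size cited above
OVERHEAD = 1.2        # assumed headroom for KV cache / activations (guess)
GPU_MEMORY_GB = 80    # assumed per-GPU memory, e.g. an 80 GB data-center card

# Minimum GPU count to hold the weights plus assumed runtime overhead.
gpus_needed = math.ceil(WEIGHTS_GB * OVERHEAD / GPU_MEMORY_GB)
print(f"GPUs needed at {GPU_MEMORY_GB} GB each: {gpus_needed}")  # → 6
```

Quantized variants would shrink the footprint considerably, but even at 4-bit you are still looking at a multi-GPU node rather than a single consumer card.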