Step 3.5 Flash is a new open-weight mixture-of-experts model that aims to match top closed LLMs in reasoning and research while staying relatively efficient. The model has around 196 billion total parameters, but only about 11 billion are active per token, enabling high throughput despite the large architecture.
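The "active parameters" figure comes from the mixture-of-experts trick: a gating network routes each token to only a few experts, so most weights sit idle on any given forward pass. Here is a minimal, illustrative sketch of top-k routing; the expert count, top-k value, and dimensions are made-up toy numbers, not Step 3.5 Flash's actual configuration.

```python
import numpy as np

N_EXPERTS = 32   # hypothetical expert count (toy value)
TOP_K = 2        # experts activated per token (toy value)

def route(token_hidden, gate_weights):
    """Pick the top-k experts for one token from gating scores."""
    scores = token_hidden @ gate_weights              # one score per expert
    top = np.argsort(scores)[-TOP_K:]                 # indices of the top-k experts
    exp_scores = np.exp(scores[top])
    probs = exp_scores / exp_scores.sum()             # renormalized mixing weights
    return top, probs

rng = np.random.default_rng(0)
hidden = rng.standard_normal(64)                      # toy hidden state
gates = rng.standard_normal((64, N_EXPERTS))          # toy gating matrix
experts, weights = route(hidden, gates)

# Only TOP_K / N_EXPERTS of the expert parameters run for this token,
# which is why active-parameter count can be far below total size.
print(f"active experts: {sorted(experts.tolist())}, "
      f"active fraction: {TOP_K / N_EXPERTS:.1%}")
```

The same logic at scale is how a ~196B-parameter model can activate only ~11B parameters per token.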
On internal benchmarks, Step 3.5 Flash delivers deep reasoning performance comparable to Gemini 3 Pro, Claude Opus 4.5, and GPT 5.2 Extra High. Its coding results trail the very top proprietary coders but still look strong for its size, and its “agentic” benchmarks show parity with the best models for multi-step tool use.
The wild card is deep research: Step 3.5 Flash reportedly beats Gemini Deep Research and OpenAI Deep Research on certain evaluation suites, suggesting it can already handle long-form investigations across documents and sources. Practical throughput sits around 100–300 tokens per second, which is extremely fast for its class.
Weights are fully available on Hugging Face, but the full model is roughly 399 GB, so you’ll need multiple GPUs to run it locally. A GitHub repo documents distributed setup for those willing to build an on-prem research stack.
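To get a feel for why multiple GPUs are needed, a back-of-the-envelope calculation from the ~399 GB checkpoint size is enough. The overhead factor and per-GPU memory below are assumptions for illustration, not measured requirements from the repo.

```python
import math

WEIGHTS_GB = 399      # checkpoint size cited above
OVERHEAD = 1.2        # assumed headroom for KV cache / activations (guess)
GPU_MEMORY_GB = 80    # assumed per-GPU memory, e.g. an 80 GB data-center card

# Minimum GPU count to hold the weights plus assumed runtime overhead.
gpus_needed = math.ceil(WEIGHTS_GB * OVERHEAD / GPU_MEMORY_GB)
print(f"GPUs needed at {GPU_MEMORY_GB} GB each: {gpus_needed}")  # → 6
```

Quantized variants would shrink the footprint considerably, but even at 4-bit you are still looking at a multi-GPU node rather than a single consumer card.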