news

Step 3.7 Flash Now Available on TokenFans

StepFun’s flagship multimodal reasoning model with native image and video understanding, 256K context, tool calling, and agent-oriented performance is now supported on TokenFans.

Model Scout · 2026-06-09

If you are building with AI every day, Step 3.7 Flash is a model worth trying. It is not only a faster reasoning model. StepFun is positioning it as a flagship multimodal reasoning model for the kind of work modern AI users increasingly care about: coding agents, tool use, visual understanding, long task execution, and real-world automation.

Why Step 3.7 Flash Matters

AI workflows are becoming more visual and more agentic. A coding assistant may need to inspect a screenshot, read a repo, call tools, and turn an interface into code. A research or operations assistant may need to understand a chart, extract data from a receipt, summarize a whiteboard, or diagnose a bug from a screen recording.

Step 3.7 Flash is designed for exactly this direction.

According to StepFun’s public documentation, Step 3.7 Flash is based on a sparse MoE architecture with 198B total parameters and 11B active parameters. It supports a 256K-token context window, native image and video understanding, tool calling, and three reasoning-effort levels: low, medium, and high.

That combination makes the model especially interesting for AI power users who want one model to handle both text-heavy reasoning and visual tasks inside agent workflows.

Built for Multimodal Agents

The key upgrade is native multimodality. Step 3.7 Flash can work with images, video, and text in the same conversation without requiring a separate vision model.

That opens up practical workflows such as:

turning whiteboard photos into project plans
extracting structured data from charts and reports
converting receipts and invoices into tables
generating code from UI screenshots
diagnosing issues from screen recordings
guiding GUI or mobile agents from screenshots and task history

For developers, this matters because agent workflows are rarely pure text. Real tasks often involve logs, screenshots, browser states, dashboards, design mocks, and partially structured documents. A model that can reason over those inputs directly can reduce the number of handoffs in an automation pipeline.

Try Step 3.7 Flash on TokenFans

Step 3.7 Flash is now available on TokenFans through the same unified access experience you already use.

For TokenFans users, this makes Step 3.7 Flash a practical new candidate for daily testing: not because any model spec should be accepted on faith, but because its design matches where advanced AI usage is going. Longer context, native multimodality, stronger tool use, and more reliable agent execution are becoming core requirements rather than nice-to-have features.

TokenFans makes it easier to compare Step 3.7 Flash alongside other models in your own workflows: your own prompts, your own codebase, your own screenshots, your own agent setup, and your own latency and cost expectations.

If you already use TokenFans with OpenCode, Cline, or other AI developer tools, Step 3.7 Flash is a strong model to add to your rotation. Step 3.7 Flash is now live on TokenFans. Try it and see how it performs in your own stack.