DeepSeek V4 Flash: a free, long-context AI model now on OpenRouter
TL;DR
OpenRouter adds DeepSeek V4 Flash for free with a 256k-token context window. The efficiency-optimized MoE model has 284B total parameters with 13B active, tuned for fast inference.
What changed
DeepSeek V4 Flash launched free on OpenRouter. This MoE model activates 13B of its 284B total parameters for fast inference. Users get immediate API access with a 256k-token context window.
Specs
- Parameters: 284B total, 13B active
- Context window: 256k tokens
- Pricing (input): $0.00 per M tokens
- Pricing (output): $0.00 per M tokens
- Model ID: deepseek/deepseek-v4-flash:free
- Vendor docs: https://openrouter.ai/deepseek/deepseek-v4-flash:free
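The model is reachable through OpenRouter's OpenAI-compatible chat completions endpoint using the model ID above. A minimal sketch, assuming your key is in the `OPENROUTER_API_KEY` environment variable (the helper names here are illustrative, not part of any SDK):

```python
# Minimal sketch of calling DeepSeek V4 Flash via OpenRouter's
# OpenAI-compatible chat completions endpoint, using only the
# standard library.
import json
import os
import urllib.request

API_URL = "https://openrouter.ai/api/v1/chat/completions"
MODEL_ID = "deepseek/deepseek-v4-flash:free"

def build_request(prompt: str) -> urllib.request.Request:
    """Build the POST request for a single-turn chat completion."""
    payload = {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {os.environ.get('OPENROUTER_API_KEY', '')}",
            "Content-Type": "application/json",
        },
    )

def ask(prompt: str) -> str:
    """Send the prompt and return the assistant's reply text."""
    with urllib.request.urlopen(build_request(prompt)) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

Because the endpoint follows the OpenAI schema, swapping in the official `openai` client with `base_url="https://openrouter.ai/api/v1"` also works.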
Why it matters
Free API access puts 284B-parameter MoE capability within reach at zero cost, undercutting GPT-4o mini's $0.15 per M input tokens. Builders gain a no-cost option for RAG over customer-support transcripts.
What to watch for
Compare inference speed to Gemini 1.5 Flash on long prompts. Test rate limits during peak hours on OpenRouter. Monitor DeepSeek updates for potential 1M context expansion.
Who this matters for
- Vibe Builders: Experiment with free, high-capacity models to prototype complex AI agents without cost.
- Developers: Integrate the free DeepSeek V4 Flash API to scale RAG pipelines and long-context analysis.
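For long-context RAG like the transcript use case above, the main constraint is fitting source material into the 256k-token window while leaving room for the reply. A rough sketch of a greedy packing step, assuming a crude 4-characters-per-token estimate (a real pipeline would use the model's tokenizer):

```python
# Rough sketch of packing support transcripts into a 256k-token
# context window. The 4-chars-per-token estimate is an assumption;
# swap in a real tokenizer for production use.
CONTEXT_TOKENS = 256_000
RESERVED_FOR_ANSWER = 4_000  # leave headroom for the model's reply

def estimate_tokens(text: str) -> int:
    """Crude token estimate: roughly 4 characters per token."""
    return max(1, len(text) // 4)

def pack_transcripts(transcripts: list[str]) -> list[str]:
    """Greedily include whole transcripts until the budget is spent."""
    budget = CONTEXT_TOKENS - RESERVED_FOR_ANSWER
    packed = []
    for t in transcripts:
        cost = estimate_tokens(t)
        if cost > budget:
            break
        packed.append(t)
        budget -= cost
    return packed
```

Packing whole transcripts (rather than arbitrary chunks) keeps each conversation intact, which long-context models handle well.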
Harsh’s take
The arrival of free, high-parameter MoE models on OpenRouter signals a shift in the economics of inference. By offering 284B parameters at zero cost, DeepSeek forces a reevaluation of utility-based pricing for mid-tier tasks. This is a clear win for builders who need to process massive datasets or long-form transcripts without burning through API credits.
Smart operators will treat this as a sandbox for high-volume experimentation. The 256k context window makes it a viable candidate for complex RAG workflows that previously required expensive proprietary models. Test the latency against established flash models to determine if this fits your production stack.
Relying on free tiers requires a solid fallback strategy, so build your infrastructure to swap providers when rate limits hit.
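One way to sketch that fallback: catch the provider's rate-limit response and retry the same prompt against a paid model. The fallback model ID and the injected `call_model` helper are illustrative assumptions, not part of OpenRouter's API:

```python
# Hedged sketch of a provider-fallback wrapper: if the free model
# is rate-limited (HTTP 429), retry against a paid fallback model.
FREE_MODEL = "deepseek/deepseek-v4-flash:free"
FALLBACK_MODEL = "openai/gpt-4o-mini"  # example fallback choice

class RateLimitError(Exception):
    """Raised by the caller's client when the provider returns HTTP 429."""

def with_fallback(prompt, call_model):
    """Try the free model first; on a rate limit, use the paid one.

    call_model(model_id, prompt) is injected so the fallback logic
    stays testable without network access.
    """
    try:
        return call_model(FREE_MODEL, prompt)
    except RateLimitError:
        return call_model(FALLBACK_MODEL, prompt)
```

Injecting the client function keeps the routing policy separate from transport details, so the same wrapper works with any HTTP client.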
by Harsh Desai
More from OpenRouter
- Launch: OpenRouter launches Perceptron Mk1 with 33k context at $0.15/M input, $1.50/M output
  OpenRouter launched Perceptron Mk1, a vision-language model for video and embodied reasoning. It processes images and videos with 33k context at $0.15/M input and $1.50/M output.
- Launch: OpenRouter adds Tencent Hunyuan model with 262K context window
  OpenRouter adds Tencent Hunyuan Hy3 preview, a mixture-of-experts model with 262K context window. Pricing is $0.07/M input tokens and $0.26/M output tokens with configurable reasoning levels.
- Launch: InclusionAI launches free Ring-2.6-1T (262k context) on OpenRouter
  InclusionAI launches free Ring-2.6-1T on OpenRouter. The 1T-parameter-scale model uses 63B active parameters and supports coding agents and tool use.