
DeepSeek V4 Flash: a free, long-context AI model now on OpenRouter

By Harsh Desai
Share

TL;DR

OpenRouter adds DeepSeek V4 Flash for free with a 256k-token context window. The efficiency-optimized MoE model has 284B total parameters with 13B active per token, enabling fast inference.

What changed

DeepSeek V4 Flash launched free on OpenRouter. The MoE model activates 13B of its 284B total parameters per token for fast inference. Users get immediate API access with a 256k-token context window.

Specs

  • Parameters: 284B total, 13B active
  • Context window: 256k tokens
  • Pricing (input): $0.00 per M tokens
  • Pricing (output): $0.00 per M tokens
  • Model ID: deepseek/deepseek-v4-flash:free
  • Vendor docs: https://openrouter.ai/deepseek/deepseek-v4-flash:free
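Getting started is a matter of pointing a standard chat-completions request at OpenRouter with the model ID above. The sketch below is a minimal example assuming OpenRouter's OpenAI-compatible `/api/v1/chat/completions` endpoint and an API key in the `OPENROUTER_API_KEY` environment variable; `build_request` is a helper name invented here for illustration.

```python
import json
import os
import urllib.request

# OpenRouter's OpenAI-compatible chat completions endpoint (assumption: see
# the vendor docs above for the authoritative URL and parameters).
API_URL = "https://openrouter.ai/api/v1/chat/completions"
MODEL_ID = "deepseek/deepseek-v4-flash:free"


def build_request(prompt: str, api_key: str) -> urllib.request.Request:
    """Build a chat-completions request for the free DeepSeek V4 Flash model."""
    payload = {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


if __name__ == "__main__":
    # Hypothetical usage: send one prompt and print the model's reply.
    req = build_request("Summarize: ...", os.environ["OPENROUTER_API_KEY"])
    with urllib.request.urlopen(req) as resp:
        body = json.loads(resp.read())
        print(body["choices"][0]["message"]["content"])
```

Because the endpoint is OpenAI-compatible, the official `openai` SDK also works by overriding its base URL, if you prefer that over raw HTTP.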

Why it matters

Free API access delivers 284B-parameter MoE capabilities at zero cost, undercutting GPT-4o mini's pricing of $0.15 per M input tokens. Builders gain a cheap option for RAG over customer-support transcripts.

What to watch for

Compare inference speed to Gemini 1.5 Flash on long prompts. Test rate limits during peak hours on OpenRouter. Monitor DeepSeek updates for potential 1M context expansion.
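For the speed comparison, wall-clock latency on identical long prompts is the simplest measurement. The harness below is a minimal sketch, not tied to any particular client: `time_call` is a hypothetical helper that takes any zero-argument callable (e.g. a closure that sends your test prompt to one provider) and returns the median latency over a few runs.

```python
import statistics
import time


def time_call(fn, runs: int = 3) -> float:
    """Median wall-clock latency of fn() over a few runs.

    Median rather than mean, so one slow cold-start run does not
    dominate the comparison.
    """
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()  # e.g. a closure sending the same long prompt to one model
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)
```

Run it once per model with the same prompt (e.g. `time_call(lambda: ask_deepseek(prompt))` versus `time_call(lambda: ask_gemini_flash(prompt))`, where both callables are yours) and compare the medians.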

Who this matters for

  • Vibe Builders: Experiment with free, high-capacity models to prototype complex AI agents without cost.
  • Developers: Integrate the free DeepSeek V4 Flash API to scale RAG pipelines and long-context analysis.

Harsh's take

The arrival of free, high-parameter MoE models on OpenRouter signals a shift in the economics of inference. By offering 284B parameters at zero cost, DeepSeek forces a reevaluation of utility-based pricing for mid-tier tasks. This is a clear win for builders who need to process massive datasets or long-form transcripts without burning through API credits.

Smart operators will treat this as a sandbox for high-volume experimentation. The 256k context window makes it a viable candidate for complex RAG workflows that previously required expensive proprietary models. Test the latency against established flash models to determine if this fits your production stack.

Relying on free tiers requires a solid fallback strategy, so build your infrastructure to swap providers when rate limits hit.
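One way to structure that fallback is a thin wrapper that walks an ordered provider list and moves on when a call is rate limited. This is a minimal sketch under stated assumptions: `RateLimited` and `complete_with_fallback` are names invented here, and each provider entry supplies its own callable that raises `RateLimited` on an HTTP 429.

```python
from typing import Callable, Sequence, Tuple


class RateLimited(Exception):
    """Raised by a provider callable when it hits a rate limit (HTTP 429)."""


def complete_with_fallback(
    prompt: str,
    providers: Sequence[Tuple[str, Callable[[str], str]]],
) -> Tuple[str, str]:
    """Try each (name, call) pair in order; fall back on rate limits.

    Returns (provider_name, completion_text) from the first provider
    that succeeds, or raises if every provider is rate limited.
    """
    last_error = None
    for name, call in providers:
        try:
            return name, call(prompt)
        except RateLimited as err:
            last_error = err  # this provider is throttled; try the next one
    raise RuntimeError("all providers rate limited") from last_error
```

In practice the free DeepSeek V4 Flash endpoint would sit first in the list, with a paid model behind it, so traffic costs nothing until the free tier throttles.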


Source: openrouter.ai
