DeepSeek launches V4-Pro and V4-Flash previews at low prices

By Harsh Desai25 April 2026

TL;DR

DeepSeek released V4-Pro and V4-Flash previews, two Mixture-of-Experts models with a 1M-token context window, priced well below comparable frontier APIs.

What changed

DeepSeek released previews of V4-Pro and V4-Flash, two Mixture-of-Experts models supporting a 1 million token context window. Both target frontier-class reasoning at a fraction of the API price of comparable models. V4-Flash optimizes for low latency, V4-Pro for deeper reasoning.

Why it matters

For developers, this is another aggressive cut on $/M tokens for long-context workloads. You can route document analysis, code review, and large RAG payloads through a cheaper endpoint without sacrificing reasoning depth. V4-Flash in particular is positioned for high-throughput pipelines where latency drives unit economics.

For non technical users, the practical upshot is that the AI assistants and document tools you rely on can now ingest much more material in one go: a full contract, a quarter of emails, a complete onboarding binder, all in one prompt rather than chopped into pieces.

What to watch for

Developers should benchmark V4-Pro and V4-Flash on their own evals before migrating, especially on tool-use and structured output tasks where MoE models sometimes regress. Watch the licensing terms and rate limits carefully on the preview endpoints. Non technical users should check whether the apps they already pay for offer DeepSeek as a model option, since switching often requires nothing more than a dropdown change.

Who this matters for

Developers: Swap your current API provider for DeepSeek V4-Flash and benchmark $/M tokens against your current spend before next billing cycle.
Non Technical: Use the 1M token window to upload your full company knowledge base into one prompt for instant answers.

Harsh’s take

Developers: the price floor on frontier-class inference just dropped again. If you are still paying premium $/M token rates for routine extraction or classification, you are wasting margin. Run V4-Pro and V4-Flash through your evals this week and stop romanticizing your current provider.

Non Technical users: this matters because the AI tools you already use will get cheaper or smarter without you doing anything. The 1M token window means you can paste your entire employee handbook or a year of meeting notes into one chat and get a real answer, not a fragment. Stop treating ChatGPT-style tools like a search box.

by Harsh Desai

Source:simonwillison.net

More AI news

Daily Roundup29 July 2026
Pixelship agent on Replicate, Gemini 3.6 Flash batch on OpenRouter, and agent tools for builders
New agentic image tools and trending multimodal models arrived alongside expanded Gemini agent features and deployment options for production use.
Daily Roundup28 July 2026
Kimi K3 on AI Gateway, mage-flow on Replicate, and agent tools for builders
Vendors added model access, regional routing, and Slack hooks while new image and agent products appeared on Replicate and Product Hunt.
Weekly Digest27 July 2026
Hermes Agent 80% latency cuts and 51 updates, OpenClaw Mac app, and durable export tools
Hermes Agent rolled out dozens of stability, speed, and integration fixes across three days while OpenClaw added a Mac app and remote server catalog.