DeepSeek V4 Pro and Flash are now available on Vercel AI Gateway
TL;DR
Vercel AI Gateway now supports DeepSeek V4 Pro and V4 Flash, exposing both 1M-token context models behind the standard gateway interface with built-in observability and rate limiting.
What changed
Vercel added DeepSeek V4 Pro and V4 Flash to the AI Gateway. Both models support a 1 million token context window and are accessible through the gateway's unified API alongside existing providers, with the standard observability, rate limiting, and key management Vercel already exposes.
Why it matters
For developers, this cuts integration cost. You can route a percentage of traffic to DeepSeek V4 by changing gateway config rather than application code. That makes real $/M token comparisons cheap to run: shadow the new models, compare outputs and latency on production-like prompts, then shift weight when the numbers justify it. The 1M token window is the headline capability for codebase-scale analysis, long document ingestion, and large RAG bundles where chunking strategies have been hurting answer quality.
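To make the percentage split concrete, here is a minimal sketch of deterministic weighted routing, written in Python for illustration. The gateway is described as handling this through config, so treat this as a picture of the logic rather than Vercel's implementation. The model ID strings and the `pick_model` helper are assumptions, not names from the gateway.

```python
import hashlib

# Placeholder model IDs (assumptions); check the gateway's model catalog for real ones.
CURRENT_MODEL = "current-provider/current-model"
CANDIDATE_MODEL = "deepseek/deepseek-v4-pro"


def pick_model(request_id: str, candidate_percent: int) -> str:
    """Send roughly `candidate_percent`% of requests to the candidate model.

    Hashing the request ID keeps the choice stable: the same request ID
    always lands on the same model, which makes comparisons reproducible.
    """
    if not 0 <= candidate_percent <= 100:
        raise ValueError("candidate_percent must be between 0 and 100")
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100  # stable bucket in [0, 100)
    return CANDIDATE_MODEL if bucket < candidate_percent else CURRENT_MODEL
```

Starting at a small percentage and raising it as the comparison data comes in keeps the blast radius small if the candidate misbehaves.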
What to watch for
Validate tool-calling and structured output reliability before you treat V4 Pro as a drop-in for your current reasoning model; MoE models occasionally regress on strict JSON schemas. Use the gateway's built-in metrics to catch tail latency, especially on prompts above 200K tokens where MoE routing overhead shows up. Set explicit per-model budgets in the gateway and watch your token spend daily for the first week, since aggressive context use can offset the lower per-token price.
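One way to check structured output reliability is to replay a fixed prompt set and measure how often the raw response parses into the shape you expect. The sketch below uses only the Python standard library. The `REQUIRED_FIELDS` schema and both function names are hypothetical and stand in for whatever contract your application enforces.

```python
import json

# Hypothetical schema for illustration: required field name -> expected Python type.
REQUIRED_FIELDS = {"title": str, "priority": int, "tags": list}


def is_valid_output(raw: str) -> bool:
    """Return True only if `raw` is a JSON object with exactly the required fields."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return False  # covers prose wrappers, markdown fences, truncated JSON
    if not isinstance(data, dict):
        return False
    if set(data) != set(REQUIRED_FIELDS):
        return False  # strict mode: no missing and no extra keys
    for name, expected in REQUIRED_FIELDS.items():
        value = data[name]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, expected):
            return False
    return True


def pass_rate(outputs: list[str]) -> float:
    """Fraction of model outputs that satisfy the schema."""
    if not outputs:
        raise ValueError("need at least one output to score")
    return sum(is_valid_output(o) for o in outputs) / len(outputs)
```

Comparing `pass_rate` for your current model and for V4 Pro on the same prompts gives a single number to gate the switch on.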
Who this matters for
- Developers: Add the DeepSeek V4 model IDs to your gateway routing config and shadow-test them against your current provider for $/M tokens and p95 latency.
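The two numbers named above can be computed from shadow-test logs with a few lines of Python. This is a sketch under stated assumptions: the `Sample` record and both helpers are hypothetical, and the prices are parameters you supply from the provider's published rates, not values taken from this article.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """One logged request from a shadow test (hypothetical record shape)."""
    input_tokens: int
    output_tokens: int
    latency_ms: float


def cost_usd(samples: list[Sample], input_price_per_m: float,
             output_price_per_m: float) -> float:
    """Total spend, given prices quoted in dollars per million tokens."""
    total_in = sum(s.input_tokens for s in samples)
    total_out = sum(s.output_tokens for s in samples)
    return (total_in / 1_000_000) * input_price_per_m + \
           (total_out / 1_000_000) * output_price_per_m


def p95_latency_ms(samples: list[Sample]) -> float:
    """95th percentile latency using the nearest-rank method."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(s.latency_ms for s in samples)
    rank = (95 * len(ordered) + 99) // 100  # integer ceil(0.95 * n)
    return ordered[rank - 1]
```

Run both functions over the same prompt set for each model. Because cost scales with total tokens, a model with a lower per-token price can still cost more if you feed it much larger contexts.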
Harsh’s take
Developers, the gateway pattern is winning and you should commit to it. Hard-coding a single provider in your inference path was always a liability; with V4 Pro and Flash now in Vercel AI Gateway, swapping models is a config change, not a refactor. If your code still imports a vendor SDK directly in your hot path, fix that this sprint.
The interesting workload is long-context. 1M tokens through a gateway with observability means you can finally do whole-repo or whole-corpus prompts in production without writing your own metrics layer. Wire fallback routes from V4 Pro to V4 Flash on latency budget and stop overpaying for reasoning you are not using.
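A fallback on latency budget can be sketched in a few lines of Python with `asyncio`. This is an illustration of the pattern, not gateway configuration: `call_with_fallback` and the `ModelCall` type are hypothetical, and the two callables stand in for whatever client you use to reach V4 Pro and V4 Flash.

```python
import asyncio
from typing import Awaitable, Callable

# A model call takes a prompt and eventually returns text (hypothetical signature).
ModelCall = Callable[[str], Awaitable[str]]


async def call_with_fallback(prompt: str, primary: ModelCall, fallback: ModelCall,
                             latency_budget_s: float) -> tuple[str, str]:
    """Try `primary` within the latency budget, otherwise use `fallback`.

    Returns (route, text) so the caller can log which model answered.
    """
    try:
        # wait_for cancels the primary call if it exceeds the budget.
        text = await asyncio.wait_for(primary(prompt), timeout=latency_budget_s)
        return "primary", text
    except asyncio.TimeoutError:
        text = await fallback(prompt)
        return "fallback", text
```

Logging the returned route alongside the prompt size shows how often the budget is blown and at what context length, which is the data you need to tune the budget.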
by Harsh Desai