
DeepSeek V4 Pro and Flash are now available on Vercel AI Gateway

By Harsh Desai

TL;DR

Vercel AI Gateway now supports DeepSeek V4 Pro and V4 Flash, exposing both 1M-token context models behind the standard gateway interface with built-in observability and rate limiting.

What changed

Vercel added DeepSeek V4 Pro and V4 Flash to the AI Gateway. Both models support a 1 million token context window and are accessible through the gateway's unified API alongside existing providers, with the standard observability, rate limiting, and key management Vercel already exposes.

Why it matters

For developers, this collapses integration cost. You can route a percentage of traffic to DeepSeek V4 by changing gateway config rather than application code. That makes real cost-per-million-token comparisons cheap to run: shadow the new models, compare outputs and latency on production-like prompts, then shift weight when the numbers justify it. The 1M token window is the headline capability for codebase-scale analysis, long document ingestion, and large RAG bundles where chunking strategies have been hurting answer quality.
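To make the percentage split concrete, here is a minimal sketch of deterministic weighted routing. It is not Vercel's gateway configuration format; the model ID strings and the `pick_model` helper are hypothetical placeholders, and in practice the gateway would do this step for you.

```python
import hashlib

# Hypothetical model identifiers, used only for illustration.
# Look up the real IDs in the gateway's model list.
CURRENT_MODEL = "example/current-model"
CANDIDATE_MODEL = "example/deepseek-v4-pro"


def pick_model(request_id: str, candidate_share: float) -> str:
    """Send roughly `candidate_share` of requests to the candidate model.

    Hashing the request ID keeps the choice stable, so retries of the same
    request always land on the same model.
    """
    if not 0.0 <= candidate_share <= 1.0:
        raise ValueError("candidate_share must be between 0 and 1")
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    # Map the first 4 bytes of the hash to a number in [0, 1).
    bucket = int.from_bytes(digest[:4], "big") / 2**32
    return CANDIDATE_MODEL if bucket < candidate_share else CURRENT_MODEL
```

Starting with a small share such as 0.05 and raising it as the comparison data comes in keeps the blast radius low while you evaluate.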

What to watch for

Validate tool-calling and structured output reliability before you treat V4 Pro as a drop-in for your current reasoning model; MoE models occasionally regress on strict JSON schemas. Use the gateway's built-in metrics to catch tail latency, especially on prompts above 200K tokens where MoE routing overhead shows up. Set explicit per-model budgets in the gateway and watch your token spend daily for the first week, since aggressive context use can offset the lower per-token price.
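One way to check structured output reliability is to replay a fixed prompt set and measure how often the raw responses parse and match the shape you expect. The sketch below assumes a flat JSON object with known keys; `is_valid_output` and `pass_rate` are illustrative helpers, not part of any gateway SDK, and a real setup would likely use a full JSON Schema validator.

```python
import json


def is_valid_output(raw: str, required: dict[str, type]) -> bool:
    """Return True if `raw` is a JSON object with exactly the required keys and types."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return False
    if not isinstance(data, dict):
        return False
    # Strict check: no missing keys and no extra keys.
    if set(data) != set(required):
        return False
    # Note: in Python, bool is a subclass of int, so True passes an int check.
    return all(isinstance(data[key], kind) for key, kind in required.items())


def pass_rate(outputs: list[str], required: dict[str, type]) -> float:
    """Fraction of model outputs that satisfy the expected shape."""
    if not outputs:
        raise ValueError("need at least one output")
    return sum(is_valid_output(o, required) for o in outputs) / len(outputs)
```

Running the same prompt set against your current model and against V4 Pro gives two pass rates you can compare directly before shifting traffic.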

Who this matters for

  • Developers: Add the DeepSeek V4 model IDs to your gateway routing config and shadow-test them against your current provider for $/M tokens and p95 latency.
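The two numbers named above can be computed from whatever request logs your shadow test produces. This is a minimal sketch: the function names are hypothetical, the prices are parameters you fill in from the provider's current price list, and the percentile uses the nearest-rank method (other tools may interpolate and give slightly different values).

```python
def p95(latencies_ms: list[float]) -> float:
    """Nearest-rank 95th percentile of the observed latencies."""
    if not latencies_ms:
        raise ValueError("need at least one latency sample")
    ordered = sorted(latencies_ms)
    # Integer form of ceil(0.95 * n), which avoids float rounding surprises.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def cost_usd(
    input_tokens: int,
    output_tokens: int,
    input_price_per_m: float,
    output_price_per_m: float,
) -> float:
    """Total spend in USD given per-million-token prices for input and output."""
    total = input_tokens * input_price_per_m + output_tokens * output_price_per_m
    return total / 1_000_000
```

Compute both figures per model over the same prompt set so the comparison reflects model differences rather than workload differences.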

Harsh's take

Developers, the gateway pattern is winning and you should commit to it. Hard-coding a single provider in your inference path was always a liability; with V4 Pro and Flash now in Vercel AI Gateway, swapping models is a config change, not a refactor. If your code still imports a vendor SDK directly in your hot path, fix that this sprint.

The interesting workload is long-context. 1M tokens through a gateway with observability means you can finally do whole-repo or whole-corpus prompts in production without writing your own metrics layer. Wire fallback routes from V4 Pro to V4 Flash on latency budget and stop overpaying for reasoning you are not using.
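A fallback on latency budget can be sketched in application code as a timeout around the primary call. The gateway may offer this as configuration; the version below is only an illustration of the pattern, and `fake_pro` and `fake_flash` are stand-ins for real model calls, not actual client functions.

```python
import asyncio
from typing import Awaitable, Callable

ModelCall = Callable[[str], Awaitable[str]]


async def with_fallback(
    prompt: str, primary: ModelCall, fallback: ModelCall, budget_s: float
) -> str:
    """Try the primary model; if it exceeds the latency budget, use the fallback."""
    try:
        return await asyncio.wait_for(primary(prompt), timeout=budget_s)
    except asyncio.TimeoutError:
        # wait_for cancels the primary call before raising, so it does not
        # keep running in the background.
        return await fallback(prompt)


# Stand-in model calls for demonstration only.
async def fake_pro(prompt: str) -> str:
    await asyncio.sleep(0.2)  # simulate a slower, heavier model
    return f"pro:{prompt}"


async def fake_flash(prompt: str) -> str:
    return f"flash:{prompt}"
```

With these stand-ins, `asyncio.run(with_fallback("hi", fake_pro, fake_flash, 0.05))` falls back to the fast model, while a budget of 1.0 seconds lets the primary answer. Note that a timed-out request may still be billed by the provider, so track fallback frequency alongside spend.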


Source: vercel.com
