Google's Gemini 3.1 Flash Lite Model Now Available on OpenRouter
TL;DR
Google's lightweight Gemini 3.1 Flash Lite lands on OpenRouter at $0.25 per million input tokens with a roughly 1M-token context window.
What changed
Google added Gemini 3.1 Flash Lite to OpenRouter. The model offers a roughly 1M-token context window at $0.25 per million input tokens and $1.50 per million output tokens. Existing OpenRouter integrations only need to switch the model ID to google/gemini-3.1-flash-lite to start using it.
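As a minimal sketch of that switch, assuming OpenRouter's OpenAI-compatible chat completions endpoint (the prompt and request shape here are illustrative, and the API key is a placeholder read from the environment):

```python
import json
import os
import urllib.request

# OpenRouter's OpenAI-compatible chat completions endpoint.
URL = "https://openrouter.ai/api/v1/chat/completions"

# For an existing integration, the model ID is typically the only field that changes.
payload = {
    "model": "google/gemini-3.1-flash-lite",
    "messages": [
        {"role": "user", "content": "Classify this ticket: 'My invoice total is wrong.'"}
    ],
}

req = urllib.request.Request(
    URL,
    data=json.dumps(payload).encode("utf-8"),
    headers={
        "Authorization": f"Bearer {os.environ.get('OPENROUTER_API_KEY', '')}",
        "Content-Type": "application/json",
    },
)

# Uncomment to actually send the request (requires a valid API key):
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```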
Specs
- Context window: 1,048,576 tokens (~1M)
- Pricing (input): $0.25 per million tokens
- Pricing (output): $1.50 per million tokens
- License: proprietary
- Model ID: google/gemini-3.1-flash-lite
- Vendor docs: https://openrouter.ai/google/gemini-3.1-flash-lite
Why it matters
At $0.25/M input, Flash Lite undercuts Claude Haiku ($0.80/M) outright, and while GPT-4o-mini is nominally cheaper (~$0.15/M), its far smaller context window means Flash Lite still wins on cost per context token. The 1M window lets you stuff an entire codebase, a multi-hundred-page PDF, or a year of customer-support transcripts into a single prompt without chunking. For high-volume tasks like classification, RAG, content scoring, and routing, the cost math now favours cheap-and-massive over premium-and-small.
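The cost math above is easy to sanity-check. A back-of-the-envelope sketch at the listed Flash Lite rates (the token counts are illustrative):

```python
# Listed Flash Lite rates, converted to dollars per token.
INPUT_RATE = 0.25 / 1_000_000   # $0.25 per million input tokens
OUTPUT_RATE = 1.50 / 1_000_000  # $1.50 per million output tokens

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of a single request at these rates."""
    return input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE

# A full-context prompt (~1M input tokens) with a short 2k-token answer:
cost = request_cost(1_000_000, 2_000)
print(f"${cost:.3f}")  # → $0.253
```

So even a prompt that fills the entire window costs roughly a quarter, which is what makes whole-codebase or year-of-transcripts prompts economical for batch work.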
What to watch for
Benchmark Flash Lite against Pro on your actual workload before swapping; the Lite tier trades raw reasoning ability for speed and cost. Track OpenRouter rate limits as adoption climbs. Compare price versus context against DeepSeek V3.2, Qwen 3.5 Plus, and Kimi K2 before locking in a single provider for production.
Who this matters for
- Vibe Builders: Feed an entire project history (codebase + chat transcripts + design notes) into one prompt for genuinely coherent output across a build session.
- Basic Users: Run high-volume tasks (summarising long docs, classifying support tickets, batch translation) at a fraction of the cost of premium models.
Harsh’s take
Google is aggressively dumping cheap, high-context models into the ecosystem to choke out smaller competitors. At 25 cents per million tokens, Flash Lite forces a race to the bottom that turns proprietary model hosting into a commodity. This release is a clear signal Google wants to own the infrastructure layer for every developer building on OpenRouter.
If you ignore the cost efficiency here you are burning capital on inferior alternatives. The 1M context window is the only real differentiator worth noting; most developers hit token-limit pain when building complex automation, and this model solves that for pennies.
Stop overpaying for bloated models on simple tasks. Slot Flash Lite into classification, RAG, and routing today, or accept that your margins will keep shrinking against competitors who do.
by Harsh Desai
More AI news
- GitHub Blog publishes guide to reviewing agent pull requests
GitHub Blog released a guide on reviewing pull requests generated by AI agents. It covers checks for issues and how to prevent technical debt.
- Anthropic adds dreaming, outcomes, multiagent orchestration to Claude Managed Agents
Anthropic adds dreaming for cross-session learning, outcomes for rubric grading, and multiagent orchestration for parallel tasks to Claude Managed Agents, in research preview.
- Open-OSS privacy-filter trends on Hugging Face
Open-OSS released privacy-filter on Hugging Face Hub, a token-classification model that detects personally identifiable information in text. It is built with the transformers library and supports ONNX and safetensors for download, fine-tuning, and inference.