7 OpenRouter Leaderboards Reshuffle with Hy3, Kimi K2.6, Claude Sonnet 4.6 on Top

By Harsh Desai
TL;DR

Seven OpenRouter leaderboards reshuffled in the week of 2026-05-11. Hy3 preview (free), Kimi K2.6, and Claude Sonnet 4.6 hold the top three spots on the overall LLM leaderboard.

A volatile week on OpenRouter. 6 rankings saw their #1 model change, with gpt-oss-safeguard-20b and Tencent's Hy3 preview (listed both with and without the free tag) among the models now on top. The field is churning fast: see the per-area breakdowns below.

LLM Leaderboard

The big picture: which models AI builders are paying to run this week.

  1. Hy3 preview (free) by Tencent: 3.24T tokens ↑25%
  2. Kimi K2.6 by Moonshot: 1.68T tokens ↑7%
  3. Claude Sonnet 4.6 by Anthropic: 1.45T tokens ↑7%
  4. Claude Opus 4.7 by Anthropic: 1.2T tokens ↑31%
  5. Gemini 3 Flash Preview by Google: 1.07T tokens ↑11%
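
If the arrows on the overall leaderboard are read as week-over-week growth in tokens (an assumption; the source does not define them), last week's volume can be backed out as current / (1 + growth). A minimal Python sketch using only the figures listed above; the function name is illustrative:

```python
# Back out an approximate prior-week volume from this week's volume and its
# growth rate. Assumes each arrow means week-over-week token growth.

def previous_volume(current: float, growth_pct: float) -> float:
    """Return the volume before a growth_pct percent increase."""
    return current / (1 + growth_pct / 100)

# (model, tokens this week in trillions, growth in percent), from the list above.
LEADERBOARD = [
    ("Hy3 preview (free)", 3.24, 25),
    ("Kimi K2.6", 1.68, 7),
    ("Claude Sonnet 4.6", 1.45, 7),
    ("Claude Opus 4.7", 1.2, 31),
    ("Gemini 3 Flash Preview", 1.07, 11),
]

# Estimated prior-week volume per model, in trillions of tokens.
previous = {name: previous_volume(tokens, pct) for name, tokens, pct in LEADERBOARD}

if __name__ == "__main__":
    for name, tokens, pct in LEADERBOARD:
        print(f"{name}: ~{previous[name]:.2f}T last week, {tokens}T now")
```

The published figures are rounded, so treat the results as estimates, especially where two models land close together.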

Benchmark Leaders (AA Index)

Independent benchmark scores via Artificial Analysis. A higher AA Index indicates stronger reasoning, but cost matters too.

  1. GPT-5.5 (xhigh) by OpenAI: 60.2 AA Index
  2. Claude Opus 4.7 (Adaptive Reasoning, Max Effort) by Anthropic: 57.3 AA Index
  3. Gemini 3.1 Pro Preview by Google: 57.2 AA Index
  4. GPT-5.4 (xhigh) by OpenAI: 56.8 AA Index
  5. Kimi K2.6 by Moonshot: 53.9 AA Index

Fastest Models

Throughput champs: pick these for latency-sensitive apps where speed beats raw quality.

  1. gpt-oss-safeguard-20b
  2. gpt-oss-120b
  3. Qwen3 32B
  4. gpt-oss-20b
  5. Qwen3 235B A22B Instruct 2507

Top Coding Models

If you're shipping coding workflows on OpenRouter, this is what other builders chose this week.

  1. Hy3 Preview (free) by Tencent: 2.31T tokens (23.6% share)
  2. Kimi K2.6 by Moonshot: 1.81T tokens (18.5% share)
  3. Claude Opus 4.7 by Anthropic: 500B tokens (5.1% share)
  4. Step 3.5 Flash by StepFun: 448B tokens (4.6% share)
  5. DeepSeek V4 Pro by DeepSeek: 417B tokens (4.3% share)
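
The percentage beside each entry in this list moves in step with its token count, which is what you would expect if it is a share of the category total rather than a growth rate. A rough check in Python, using only the numbers above; the helper names and the 2% tolerance are arbitrary choices for illustration:

```python
# If each percentage is a share of one common total, then tokens / share
# should give roughly the same implied total for every row.

# (model, tokens, percent) copied from the coding list above.
CODING = [
    ("Hy3 Preview (free)", 2.31e12, 23.6),
    ("Kimi K2.6", 1.81e12, 18.5),
    ("Claude Opus 4.7", 500e9, 5.1),
    ("Step 3.5 Flash", 448e9, 4.6),
    ("DeepSeek V4 Pro", 417e9, 4.3),
]

def implied_totals(rows):
    """Return the category total each row would imply if its percent is a share."""
    return [tokens / (pct / 100) for _, tokens, pct in rows]

def looks_like_shares(rows, tolerance=0.02):
    """True when every implied total sits within `tolerance` of their mean."""
    totals = implied_totals(rows)
    mean = sum(totals) / len(totals)
    return all(abs(total - mean) / mean <= tolerance for total in totals)

if __name__ == "__main__":
    for (name, _, _), total in zip(CODING, implied_totals(CODING)):
        print(f"{name}: implied category total ~{total / 1e12:.2f}T tokens")
    print("consistent with shares:", looks_like_shares(CODING))
```

The same check can be pointed at the other category lists by swapping in their rows.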

Top Models for English

Most-used models for English-language content this week.

  1. Hy3 preview by Tencent: 336B tokens (8.7% share)
  2. Kimi K2.6 by Moonshot: 258B tokens (6.7% share)
  3. DeepSeek V4 Flash by DeepSeek: 216B tokens (5.6% share)
  4. Claude Sonnet 4.6 by Anthropic: 197B tokens (5.1% share)
  5. DeepSeek V3.2 by DeepSeek: 184B tokens (4.8% share)

Top Models for Python

Which models are getting picked for Python work right now.

  1. Hy3 preview by Tencent: 145B tokens (17.6% share)
  2. DeepSeek V4 Flash by DeepSeek: 49.9B tokens (6.1% share)
  3. Kimi K2.6 by Moonshot: 43.8B tokens (5.3% share)
  4. Claude Opus 4.7 by Anthropic: 40.6B tokens (4.9% share)
  5. DeepSeek V3.2 by DeepSeek: 40.3B tokens (4.9% share)

Top Models for short prompts (1K-10K tokens)

For short prompts (1K-10K tokens, the bulk of typical traffic), here's what builders chose.

  1. Gemini 2.5 Flash Lite by Google: 111M requests (9.4% share)
  2. Gemini 2.5 Flash by Google: 87.4M requests (7.4% share)
  3. Grok 4.1 Fast by xAI: 83.6M requests (7.1% share)
  4. Gemini 3 Flash Preview by Google: 66.4M requests (5.6% share)
  5. gpt-oss-120b by OpenAI: 50.2M requests (4.3% share)

Top Models for Tool Calls

If your stack uses tool calls / function calling, these models are getting the most invocations.

  1. Hy3 Preview (free) by Tencent: 35.5M tokens (11.3% share)
  2. Gemini 3 Flash Preview by Google: 17.9M tokens (5.7% share)
  3. Kimi K2.6 by Moonshot: 16.9M tokens (5.4% share)
  4. Claude Sonnet 4.6 by Anthropic: 14.1M tokens (4.5% share)
  5. Gemini 2.5 Flash by Google: 13.7M tokens (4.4% share)

Top Image Models

Image workloads through OpenRouter: the models that handled the most images this week.

  1. Gemini 2.5 Flash Lite by Google: 187M images (32.6% share)
  2. Gemini 3 Flash Preview by Google: 55.1M images (9.6% share)
  3. Qwen3.5 397B A17B by Qwen: 42.5M images (7.4% share)
  4. Gemini 2.5 Flash by Google: 41.9M images (7.3% share)
  5. GPT-4.1 Mini by OpenAI: 26.4M images (4.6% share)

Top Audio-Input Models

Audio-input (transcription, voice-in) leaders.

  1. Gemini 3.1 Flash Lite Preview by Google: 2.58M prompts (45.4% share)
  2. Gemini 2.5 Flash by Google: 1.04M prompts (18.2% share)
  3. Gemini 3 Flash Preview by Google: 770K prompts (13.5% share)
  4. Gemini 3.1 Pro Preview by Google: 309K prompts (5.4% share)
  5. Gemini 2.0 Flash Lite by Google: 157K prompts (2.8% share)

Top Apps on OpenRouter

Useful as social proof when picking your stack: the largest public apps and agents that opt into OpenRouter usage tracking.

Most Popular

  1. OpenClaw (8.96T tokens): OpenClaw is an open-source AI agent that connects to your messaging apps and takes real actions on your behalf, from running commands and browsing the web to managing files and sending emails.
  2. Hermes Agent (6.47T tokens): Hermes Agent is an open-source, self-improving AI agent by Nous Research that runs persistently with memory across sessions, and builds reusable skills from experience. It comes with 40+ built-in tools, including web search, browser automation, and vision, plus scheduled automations and subagents.
  3. Kilo Code (5.41T tokens): Kilo Code is an open-source AI coding agent that works across VS Code, JetBrains, and CLI to help developers ship code faster with agentic workflows.
  4. Claude Code (3.05T tokens): Claude Code is Anthropic's agentic coding tool that reads your entire codebase, plans and executes changes across files, runs tests, and iterates on failures, all from natural language prompts.

Trending

Fastest-growing apps on OpenRouter this week: early signals of which builder workflows are breaking out.

  1. pi (312B tokens) ↑242%
  2. Hermes Agent (1.69T tokens) ↑6%
  3. Lemonade (172B tokens) ↑32%
  4. MinutaIA (39.7B tokens) ↑1,103%
  5. Roo Code (151B tokens) ↑25%
  6. One API (32.2B tokens) ↑357%
  7. PaperGen Terminus-2 Agent (19.7B tokens) ↑415,330%
  8. Portkey AI (31.5B tokens) ↑109%

What this means for builders

The OpenRouter rankings are the cleanest signal of what AI builders pay to run, not what vendor marketing claims. Watch the top three week-on-week: when they reshuffle, the field is volatile and your model choice has a short half-life.
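
The week-on-week check described above can be sketched in a few lines of Python. The function and the placeholder model names are hypothetical and not part of any OpenRouter API; feed it ranked lists from wherever you record each week's leaderboard.

```python
# Compare the top-n entries of two ranked lists (best first) and report
# which positions changed hands.

def top_n_changes(previous, current, n=3):
    """Return (position, old_model, new_model) for each changed slot in the top n."""
    changes = []
    for position, (old, new) in enumerate(zip(previous[:n], current[:n]), start=1):
        if old != new:
            changes.append((position, old, new))
    return changes

# Placeholder rankings for illustration only; these are not real leaderboard data.
last_week = ["model-a", "model-b", "model-c", "model-d"]
this_week = ["model-a", "model-c", "model-b", "model-d"]

changes = top_n_changes(last_week, this_week)
reshuffled = bool(changes)  # True when any top-three slot changed

if __name__ == "__main__":
    for position, old, new in changes:
        print(f"#{position}: {old} -> {new}")
```

Comparing positions rather than volumes keeps the check independent of how the token counts are rounded.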

Who this matters for

  • Vibe Builders: Pick winners by real-world usage on OpenRouter, not vendor marketing.
  • Developers: Token-share ratios are the cleanest production-fit signal for model selection.
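
As one way to read token share, the Python sketch below computes each model's share of the combined volume of the five models on the overall LLM leaderboard. This is a share of the listed top five only, not of all OpenRouter traffic, since the source gives no platform total; the names in the code are illustrative.

```python
# Share of the combined top-five volume on the overall LLM leaderboard.
# Token counts are in trillions, copied from the leaderboard list.

TOP_FIVE = {
    "Hy3 preview (free)": 3.24,
    "Kimi K2.6": 1.68,
    "Claude Sonnet 4.6": 1.45,
    "Claude Opus 4.7": 1.2,
    "Gemini 3 Flash Preview": 1.07,
}

def shares(volumes):
    """Return each entry's fraction of the summed volume."""
    total = sum(volumes.values())
    return {name: tokens / total for name, tokens in volumes.items()}

top_five_shares = shares(TOP_FIVE)

if __name__ == "__main__":
    for name, share in top_five_shares.items():
        print(f"{name}: {share:.1%} of the top-five volume")
```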

Harsh's take

OpenRouter rankings are the realest signal we have for what AI builders actually pay to run. Marketing pages claim everything; this table reflects production fit. Watch leadership changes for early indicators of where the field is consolidating.

Source: openrouter.ai
