AI Leaderboards

Curated benchmarks and leaderboards to compare LLMs across coding, reasoning, speed, cost, and more.

Chatbot Arena

Elo-based ranking from real user votes, collected through blind A/B testing. Widely considered the gold standard for comparing LLM quality (a minimal rating-update sketch follows below).

Measures: General LLM quality (Elo rating)

Tags: general
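
To make the Elo idea concrete, here is a minimal Python sketch of how a single blind A/B vote could shift two models' ratings. The ratings, K-factor, and numbers are hypothetical placeholders, and this is the textbook Elo update rather than Chatbot Arena's actual implementation.

    def expected_score(rating_a, rating_b):
        """Probability that model A beats model B under the Elo model."""
        return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))

    def elo_update(rating_a, rating_b, score_a, k=32):
        """Return updated (rating_a, rating_b) after one vote.

        score_a is 1.0 if A wins the blind comparison, 0.0 if B wins, 0.5 for a tie.
        """
        expected_a = expected_score(rating_a, rating_b)
        new_a = rating_a + k * (score_a - expected_a)
        new_b = rating_b + k * ((1.0 - score_a) - (1.0 - expected_a))
        return new_a, new_b

    # A voter blindly prefers model A's answer (hypothetical starting ratings):
    print(elo_update(1200.0, 1250.0, score_a=1.0))  # ~(1218.3, 1231.7)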

OpenRouter Rankings

Live model popularity and performance rankings across 100+ models from all major providers.

Measures: Model popularity + performance

Tags: general, cost

OpenRouter Apps

Directory of applications built on OpenRouter, showing which models power real products.

Measures: App ecosystem

Tags: general

ClawBench

Benchmark specifically for AI coding agents. Tests real-world software engineering tasks.

Measures: AI coding agents

Tags: coding

PinchBench

Comprehensive LLM benchmark suite covering reasoning, knowledge, and instruction following.

Measures: LLM benchmarks

Tags: general, reasoning

LLM Stats

Comparison of LLM speed, cost, and context window size across all major models and providers (a toy per-request cost calculation follows below).

Measures: Speed, cost, context

Tags: speed, cost
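
As an illustration of the kind of cost comparison such a table captures, here is a short Python sketch that computes per-request cost from per-million-token prices. The model names and prices are hypothetical placeholders, not real published pricing.

    def request_cost(input_tokens, output_tokens, price_in_per_m, price_out_per_m):
        """Dollar cost of one request, given prices per million tokens."""
        return (input_tokens / 1_000_000) * price_in_per_m \
             + (output_tokens / 1_000_000) * price_out_per_m

    # Hypothetical pricing: (input $/M tokens, output $/M tokens)
    models = {
        "model-a": (3.00, 15.00),
        "model-b": (0.50, 1.50),
    }

    for name, (p_in, p_out) in models.items():
        cost = request_cost(2_000, 500, p_in, p_out)
        print(f"{name}: ${cost:.4f} per request")  # model-a: $0.0135, model-b: $0.0018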

Artificial Analysis

Independent quality, speed, and cost analysis of AI models with a standardized testing methodology.

Measures: Speed, quality, cost

Tags: general, speed, cost

Design Arena

AI design quality ranking. Compares models on UI generation, visual design, and creative tasks.

Measures: AI design quality

Tags: design

BridgeBench

Reasoning benchmark testing logical deduction, mathematical reasoning, and analytical thinking.

Measures: Reasoning benchmarks

Tags: reasoning

VibeBench

Benchmark for vibe coding — tests how well AI models handle end-to-end app building from natural language.

Measures: Vibe coding benchmarks

Tags: coding