AI Leaderboards
Curated benchmarks and leaderboards to compare LLMs across coding, reasoning, speed, cost, and more.
Chatbot Arena
Elo-based ranking from real user votes, collected through blind A/B testing. Widely regarded as the reference for comparing overall LLM quality.
Measures: General LLM quality (Elo)
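The Elo scheme behind Chatbot Arena can be sketched in a few lines: each blind A/B vote nudges the winner's rating up and the loser's down, in proportion to how surprising the result was. This is a minimal illustration of the rating math only; the K-factor and starting rating below are illustrative choices, not Arena's actual parameters.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that model A beats model B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))

def update(rating_a: float, rating_b: float, score_a: float, k: float = 32.0):
    """Return updated (a, b) ratings after one comparison.

    score_a is 1.0 for an A win, 0.0 for a loss, 0.5 for a tie.
    k (the K-factor) controls how far one vote moves the ratings.
    """
    e_a = expected_score(rating_a, rating_b)
    new_a = rating_a + k * (score_a - e_a)
    new_b = rating_b + k * ((1.0 - score_a) - (1.0 - e_a))
    return new_a, new_b

# Two models start equal at 1000; A wins one blind comparison.
a, b = update(1000.0, 1000.0, 1.0)
# The expected score was 0.5, so A gains k/2 = 16 points and B loses 16.
```

An upset (a low-rated model beating a high-rated one) moves the ratings more than an expected win, which is why ratings converge as votes accumulate.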
OpenRouter Rankings
Live model popularity and performance rankings across 100+ models from all major providers.
Measures: Model popularity + performance
OpenRouter Apps
Directory of applications built on OpenRouter, showing which models power real products.
Measures: App ecosystem
ClawBench
Benchmark specifically for AI coding agents. Tests real-world software engineering tasks.
Measures: AI coding agents
PinchBench
Comprehensive LLM benchmark suite covering reasoning, knowledge, and instruction following.
Measures: LLM benchmarks
LLM Stats
Comparison of LLM speed, cost, and context window size across all major models and providers.
Measures: Speed, cost, context
Artificial Analysis
Independent quality, speed, and cost analysis of AI models with standardized testing methodology.
Measures: Speed, quality, cost
Design Arena
AI design quality ranking. Compares models on UI generation, visual design, and creative tasks.
Measures: AI design quality
BridgeBench
Reasoning benchmark testing logical deduction, mathematical reasoning, and analytical thinking.
Measures: Reasoning benchmarks
VibeBench
Benchmark for vibe coding: tests how well AI models handle end-to-end app building from natural language.
Measures: Vibe coding benchmarks