AI Leaderboards
Curated benchmarks and leaderboards to compare LLMs across coding, reasoning, speed, cost, and more.
Chatbot Arena
Elo-based ranking from real user votes. The gold standard for LLM quality comparison with blind A/B testing.
Measures: General LLM quality (Elo)
Updated daily
Visit Leaderboard

OpenRouter Rankings
Live model popularity and performance rankings across 100+ models from all major providers.
Measures: Model popularity + performance
Updated daily
Visit Leaderboard

OpenRouter Apps
Directory of applications built on OpenRouter, showing which models power real products.
Measures: App ecosystem
Updated daily
Visit Leaderboard

ClawBench
Benchmark specifically for AI coding agents. Tests real-world software engineering tasks.
Measures: AI coding agents
Updated daily
Visit Leaderboard

PinchBench
Comprehensive LLM benchmark suite covering reasoning, knowledge, and instruction following.
Measures: LLM benchmarks
Updated daily
Visit Leaderboard

LLM Stats
Comparison of LLM speed, cost, and context window across all major models and providers.
Measures: Speed, cost, context
Updated daily
Visit Leaderboard

Artificial Analysis
Independent quality, speed, and cost analysis of AI models with standardized testing methodology.
Measures: Speed, quality, cost
Updated daily
Visit Leaderboard

Design Arena
AI design quality ranking. Compares models on UI generation, visual design, and creative tasks.
Measures: AI design quality
Updated daily
Visit Leaderboard

BridgeBench
Reasoning benchmark testing logical deduction, mathematical reasoning, and analytical thinking.
Measures: Reasoning benchmarks
Updated daily
Visit Leaderboard

VibeBench
Benchmark for vibe coding: tests how well AI models handle end-to-end app building from natural language.
Measures: Vibe coding benchmarks
Updated daily
Visit Leaderboard
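
The Elo-style ranking behind arena-type leaderboards can be sketched minimally. This is a simplified illustration, not Chatbot Arena's actual pipeline (which uses Bradley-Terry-style fitting with confidence intervals); the model names, K-factor, and vote stream below are made-up assumptions:

```python
# Minimal Elo update over blind pairwise votes (simplified illustration).

def expected_score(r_a: float, r_b: float) -> float:
    """Probability that A beats B under the standard Elo logistic model."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))

def update(ratings: dict, winner: str, loser: str, k: float = 32.0) -> None:
    """Apply one A/B vote: winner gains, loser loses the same amount (zero-sum)."""
    ra, rb = ratings[winner], ratings[loser]
    ea = expected_score(ra, rb)
    ratings[winner] = ra + k * (1.0 - ea)
    ratings[loser] = rb - k * (1.0 - ea)

# Hypothetical vote stream: (winner, loser) pairs from blind A/B tests.
ratings = {"model-a": 1000.0, "model-b": 1000.0, "model-c": 1000.0}
votes = [("model-a", "model-b"), ("model-a", "model-c"), ("model-b", "model-c")]
for w, l in votes:
    update(ratings, w, l)

# Rank models by rating, highest first.
leaderboard = sorted(ratings, key=ratings.get, reverse=True)
```

Because each vote is a zero-sum transfer, the total rating pool is conserved and ratings converge toward relative win probabilities as votes accumulate, which is why daily-updated vote-based rankings remain comparable over time.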