
openai/mle-bench

Official

MLE-bench is a benchmark for measuring how well AI agents perform at machine learning engineering.

1,473 stars · 235 forks · Python · Updated March 2026

Best for

Developer
✅ Reviewed by My AI Guide — vetted for vibe builders

Our Review

  • Comprehensive benchmark with leaderboard for agent comparison
  • Includes code for dataset, evals, and agents
  • Supports detailed performance metrics and reports

Cons

  • Focused only on ML engineering tasks
  • Requires significant compute for running evals
  • Leaderboard dates suggest active maintenance but a narrow model focus

Our Verdict

Solid research tool for benchmarking AI agents in ML engineering; great for devs pushing agent frontiers.

Frequently Asked Questions

What is mle-bench?

MLE-bench is a benchmark for measuring how well AI agents perform at machine learning engineering.

How do I install mle-bench?

Visit the GitHub repository at https://github.com/openai/mle-bench for installation instructions.
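The repository README is the authoritative source, but as a rough sketch, setup typically follows the standard clone-and-pip pattern. The Git LFS step is an assumption based on the repo's large dataset files; verify against the README before relying on it:

```shell
# Hypothetical install sketch -- follow the repo README for authoritative steps.
git clone https://github.com/openai/mle-bench.git
cd mle-bench

# Some dataset files may be stored with Git LFS (assumption; install
# git-lfs on your system first if it is missing).
git lfs install
git lfs pull

# Install the package and its dependencies into the current environment.
pip install -e .
```

A virtual environment (e.g. `python -m venv .venv`) is a sensible place to run the editable install, since benchmark dependencies can be heavy.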

What license does mle-bench use?

mle-bench is listed under GitHub's "Other" license category; check the repository's LICENSE file for the exact terms.

What are alternatives to mle-bench?

Search My AI Guide for similar tools in this category.

Great for: Pro Vibe Builders

Skip if: You need something more beginner-friendly or guided

🔒 Open source & community-verified

Listed under GitHub's "Other" license category; review the repository's LICENSE file before using it in your own projects. 1,473 developers have starred this repository, a sign of broad community interest.

Reviewed by My AI Guide for relevance, quality, and active maintenance before listing.