openai/mle-bench
Official
MLE-bench is a benchmark for measuring how well AI agents perform at machine learning engineering.
Best for
Our Review
Pros
- Comprehensive benchmark with a leaderboard for comparing agents
- Includes code for the dataset, evals, and agents
- Supports detailed performance metrics and reports
Cons
- Focused only on ML engineering tasks
- Requires significant compute for running evals
- Leaderboard dates suggest it is still updated, but entries concentrate on a specific set of models
Our Verdict
A solid research tool for benchmarking AI agents on ML engineering tasks; well suited to developers working on frontier agents.
Frequently Asked Questions
What is mle-bench?
MLE-bench is a benchmark for measuring how well AI agents perform at machine learning engineering.
How do I install mle-bench?
Visit the GitHub repository at https://github.com/openai/mle-bench for installation instructions.
What license does mle-bench use?
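GitHub lists the mle-bench license as "Other", which means it was not matched to a standard license template. Check the LICENSE file in the repository for the actual terms.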
What are alternatives to mle-bench?
Search My AI Guide for similar tools in this category.
Great for: Pro Vibe Builders
Skip if: You need something more beginner-friendly or guided
Open source & community-verified
The license is listed as "Other" on GitHub, so read the repository's LICENSE file before using it in a project. 1,473 developers have starred the repository, which indicates community interest but is not a formal review.
Reviewed by My AI Guide for relevance, quality, and active maintenance before listing.