
openai/mle-bench

Official

MLE-bench is a benchmark for measuring how well AI agents perform at machine learning engineering.

1,473 stars · 235 forks · Python · Updated March 2026

Best for

Developer
✅ Reviewed by My AI Guide — vetted for vibe builders

Our Review

  • Comprehensive benchmark with leaderboard for agent comparison
  • Includes code for dataset, evals, and agents
  • Supports detailed performance metrics and reports

Cons

  • Focused only on ML engineering tasks
  • Requires significant compute for running evals
  • Leaderboard dates suggest active maintenance but a narrow model focus

Our Verdict

Solid research tool for benchmarking AI agents in ML engineering; great for devs pushing agent frontiers.

Frequently Asked Questions

What is mle-bench?

MLE-bench is a benchmark for measuring how well AI agents perform at machine learning engineering.

How do I install mle-bench?

Visit the GitHub repository at https://github.com/openai/mle-bench for installation instructions.
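The repository README is the authoritative source, but as a rough sketch, setup typically follows the standard clone-and-pip pattern. The Git LFS step is an assumption based on the repo's large dataset files; verify against the README before relying on it:

```shell
# Hypothetical install sketch -- follow the repo README for authoritative steps.
git clone https://github.com/openai/mle-bench.git
cd mle-bench

# Some dataset files may be stored with Git LFS (assumption; install
# git-lfs on your system first if it is missing).
git lfs install
git lfs pull

# Install the package and its dependencies into the current environment.
pip install -e .
```

A virtual environment (e.g. `python -m venv .venv`) is a sensible place to run the editable install, since benchmark dependencies can be heavy.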

What license does mle-bench use?

mle-bench is listed under GitHub's "Other" license category; check the repository's LICENSE file for the exact terms.

What are alternatives to mle-bench?

Search My AI Guide for similar tools in this category.

Great for: Pro Vibe Builders

Skip if: You need something more beginner-friendly or guided

🔒 Open source & community-verified

Listed under GitHub's "Other" license category; review the repository's LICENSE file before using it in your own projects. 1,473 developers have starred this repository, a sign of broad community interest.

Reviewed by My AI Guide for relevance, quality, and active maintenance before listing.