openai/human-eval
Official code for the paper "Evaluating Large Language Models Trained on Code"
Best for
Our Review
- Standard benchmark for code models
- Sandboxed execution for safety
- Easy integration for model eval
Cons
- Executes model-generated code, so execution must be enabled deliberately
- Limited to 164 hand-written problems
- Python 3.7+ only
Our Verdict
Essential tool for evaluating code LLMs; straightforward to use, but enable code execution only in a properly sandboxed environment.
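The benchmark reports results as pass@k. A minimal sketch of the unbiased pass@k estimator described in the paper (using only the standard library; the function name here is illustrative, not the package's API):

```python
import math

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: 1 - C(n-c, k) / C(n, k).

    n: total samples generated for a problem
    c: number of samples that passed the unit tests
    k: the k in pass@k
    """
    if n - c < k:
        # Fewer failing samples than k, so any k-subset contains a pass.
        return 1.0
    return 1.0 - math.comb(n - c, k) / math.comb(n, k)

# Example: 10 samples per problem, 3 passed -> estimated pass@1
print(pass_at_k(10, 3, 1))  # 0.3
```

Averaging this quantity over all 164 problems gives the benchmark score.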
Frequently Asked Questions
What is human-eval?
Code for the paper "Evaluating Large Language Models Trained on Code"
How do I install human-eval?
Visit the GitHub repository at https://github.com/openai/human-eval for installation instructions.
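A typical from-source install, assuming Python 3.7+ and pip are available (check the repository's README for the authoritative steps):

```shell
# Clone the repository and install the package in editable mode
git clone https://github.com/openai/human-eval.git
cd human-eval
pip install -e .
```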
What license does human-eval use?
human-eval uses the MIT license.
What are alternatives to human-eval?
Explore related tools and alternatives on My AI Guide.
Great for: Pro Vibe Builders
Skip if: You need something more beginner-friendly or guided
Open source & community-verified
MIT licensed — free to use in any project, no strings attached. 3,205 developers have starred this repository, a strong signal of community trust.
Reviewed by My AI Guide for relevance, quality, and active maintenance before listing.