openai/human-eval

Official

Code for the paper "Evaluating Large Language Models Trained on Code"

3,247 stars · 442 forks · Python · Updated January 2025
✅ Reviewed by My AI Guide

Our Review

  • Standard benchmark for code models
  • Executes generated code against unit tests; the README advises running it only inside a security sandbox
  • Easy integration for model eval

Our Verdict

Code and benchmark dataset for evaluating large language models trained on code, from the OpenAI paper. Best for AI researchers and developers assessing code generation performance. As the original HumanEval release, it is more established than newer benchmarks.
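For readers wondering how "code generation performance" is scored: the paper reports pass@k, the probability that at least one of k sampled completions passes a problem's tests. A minimal sketch of the estimator formula, 1 - C(n-c, k) / C(n, k), follows. The function name and arguments here are illustrative and are not taken from the repository.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k for one problem.

    n: total completions sampled for the problem
    c: number of those completions that passed the tests
    k: budget of attempts being scored (k <= n)
    """
    if n - c < k:
        # Fewer than k failures exist, so any k samples include a pass.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


# Illustrative numbers only: 10 samples, 3 of them correct.
print(round(pass_at_k(10, 3, 1), 3))
print(round(pass_at_k(10, 3, 5), 3))
```

The benchmark score is the mean of this value across all problems.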

Frequently Asked Questions

What is openai/human-eval used for?

The openai/human-eval repository provides code for the paper "Evaluating Large Language Models Trained on Code". It offers a Python benchmark that tests whether LLMs can generate correct code from docstrings, judged by running unit tests, and is widely used in AI research.
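To make "correct code from docstrings" concrete, here is a minimal sketch of the idea: a prompt (function signature plus docstring) is joined with a model completion, and the result is run against a test function. The toy problem below is hypothetical and not from the dataset; it only borrows the field names used by HumanEval records (`task_id`, `prompt`, `entry_point`, `test`). The `passes` helper is an illustration, not the repository's API.

```python
# Hypothetical toy problem shaped like a HumanEval record.
problem = {
    "task_id": "Toy/0",
    "prompt": 'def add(a, b):\n    """Return the sum of a and b."""\n',
    "entry_point": "add",
    "test": (
        "def check(candidate):\n"
        "    assert candidate(1, 2) == 3\n"
        "    assert candidate(-1, 1) == 0\n"
    ),
}


def passes(problem: dict, completion: str) -> bool:
    """Return True if prompt + completion passes the problem's tests."""
    program = (
        problem["prompt"] + completion + "\n\n"
        + problem["test"] + "\n"
        + f"check({problem['entry_point']})\n"
    )
    namespace = {}
    try:
        # WARNING: exec runs arbitrary code. Only use it on trusted input
        # or inside a sandbox; model output is untrusted by default.
        exec(program, namespace)
    except Exception:
        return False
    return True


good = "    return a + b\n"  # stand-in for a correct model completion
bad = "    return a - b\n"   # stand-in for an incorrect one
print(passes(problem, good))  # True
print(passes(problem, bad))   # False
```

A real harness would also need timeouts and process isolation, which this sketch leaves out.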

What is human-eval?

human-eval is the evaluation harness and problem set released with the paper "Evaluating Large Language Models Trained on Code".

How do I install human-eval?

Visit the GitHub repository at https://github.com/openai/human-eval for installation instructions.

What license does human-eval use?

human-eval uses the MIT license.

What are alternatives to human-eval?

Explore related tools and alternatives on My AI Guide.

Open source & community-verified

MIT licensed: free to use in any project, provided the copyright and license notice is kept. 3,247 developers have starred this, a sign of broad community interest rather than a formal review.

Reviewed by My AI Guide for relevance, quality, and active maintenance before listing.