openai/gpt-oss
Official repository for gpt-oss-120b and gpt-oss-20b, two open-weight language models by OpenAI.
OpenAI's first Apache-2.0 open-weight models, with 19,993 GitHub stars -- MXFP4 quantization fits the 117B-parameter gpt-oss-120b on a single H100 GPU. Run gpt-oss-20b locally via Ollama on a 16GB GPU, or deploy gpt-oss-120b at scale with vLLM for production agentic workloads.
Our Review
GPT-OSS delivers OpenAI's reasoning models as open weights -- gpt-oss-120b and gpt-oss-20b, with 19,993 GitHub stars as of April 2026. Both are Apache-2.0 licensed, and MXFP4 quantization fits the 117B model on one 80GB GPU.
Key capabilities:
- Model sizes: gpt-oss-120b uses 117B params with 5.1B active for a single 80GB GPU; gpt-oss-20b uses 21B params with 3.6B active for 16GB setups.
- MXFP4 quantization: post-training method that cuts memory needs while keeping performance for easy deployment.
- Configurable reasoning: pick low, medium, or high effort per request to trade speed for depth.
- Chain-of-thought traces: view full reasoning steps to debug and improve outputs.
- Agentic features: handle function calling, web browsing, Python execution, and structured outputs natively.
- Fine-tuning support: tune all parameters on your data for custom needs.
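The per-request reasoning effort can be illustrated with a short sketch. It assumes an OpenAI-compatible chat endpoint (such as one served by vLLM) and that effort is conveyed via a "Reasoning: ..." line in the system prompt, a convention described in the gpt-oss model card -- the function name and request shape here are illustrative, not the repo's API:

```python
# Sketch: build a chat request that selects reasoning effort per call.
# Assumes an OpenAI-compatible server (e.g., vLLM); the "Reasoning: <level>"
# system-prompt convention should be checked against the Harmony format
# spec for your runtime. build_request is a hypothetical helper.

def build_request(prompt: str, effort: str = "medium") -> dict:
    """Return a chat-completions payload with the chosen reasoning effort."""
    assert effort in {"low", "medium", "high"}
    return {
        "model": "openai/gpt-oss-120b",
        "messages": [
            {"role": "system", "content": f"Reasoning: {effort}"},
            {"role": "user", "content": prompt},
        ],
    }

req = build_request("Plan a 3-step web search.", effort="high")
print(req["messages"][0]["content"])  # -> Reasoning: high
```

Trading speed for depth then becomes a one-argument change per request rather than a model swap.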
Runtime support:
- Transformers: load weights directly in Hugging Face for standard inference.
- vLLM: serve at high throughput for production apps.
- Ollama: run locally with one command, no API key.
- LM Studio: use in a desktop app for quick tests.
- PyTorch / Triton / Metal: optimize for NVIDIA, Apple, or custom hardware.
Getting started:
Pull weights from the Hugging Face Hub. Run gpt-oss-20b with `ollama run gpt-oss:20b`. Always use the Harmony response format in prompts. Check the repo README for benchmarks and examples.
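To see why the Harmony requirement matters, the sketch below approximates how Harmony frames each conversation turn with special tokens. The token names follow the published Harmony spec, but treat this as a readable approximation -- use the official openai-harmony library to render real prompts:

```python
# Approximate sketch of Harmony-style message framing.
# render_turn is an illustrative helper, not part of the gpt-oss repo;
# real prompts should be rendered with the openai-harmony library.

def render_turn(role: str, content: str) -> str:
    """Wrap one message in Harmony-style start/message/end tokens."""
    return f"<|start|>{role}<|message|>{content}<|end|>"

prompt = (
    render_turn("system", "You are a helpful assistant.\nReasoning: medium")
    + render_turn("user", "What is MXFP4?")
    + "<|start|>assistant"  # the model completes from here
)
print(prompt)
```

Sending plain text without this framing is what produces the garbled outputs the limitations section warns about.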
Limitations:
Harmony response format is mandatory -- models produce incorrect output without it. The gpt-oss-120b model requires an 80GB GPU (H100 or MI300X) for full performance. Reference implementations for PyTorch and Triton are educational, not production-ready. Latest release is v0.0.9 from January 2026 -- expect rapid changes.
Cons
- Harmony format required -- models produce garbage without the exact prompt structure; read the docs first.
- Early release stage -- latest v0.0.9 from January 2026; expect bugs or missing features.
- GPU hardware needed -- gpt-oss-120b demands 80GB VRAM; no CPU fallback at full speed.
- No built-in UI -- pair with Ollama or LM Studio for an interface; raw code only.
Our Verdict
Developers building agentic applications will find GPT-OSS compelling -- Apache-2.0 licensed, OpenAI-quality reasoning, and native function calling that most open models lack. Deploy with vLLM for production throughput or fine-tune on your own data without licensing restrictions.
For Vibe Builders, gpt-oss-20b via Ollama runs on consumer GPUs with 16GB VRAM. Prototype AI agents with code execution and web browsing locally -- no API keys, no per-token costs. The `ollama run gpt-oss:20b` command gets you from zero to inference in minutes.
The Harmony response format requirement is the biggest friction point. Models will not work correctly without it, and the documentation is still catching up to the implementation. Plan to read the format spec before writing your first prompt.
Skip if you need a polished developer experience with extensive documentation, or if your hardware caps out below 16GB VRAM. For hosted inference without hardware management, stick with the ChatGPT API or Claude API instead.
Frequently Asked Questions
What is GPT-OSS?
GPT-OSS is OpenAI's open-weight model release: gpt-oss-120b (117B params) and gpt-oss-20b (21B params) for advanced reasoning and agents. The repo provides inference code, benchmarks, and full chain-of-thought traces; weights are hosted on the Hugging Face Hub. The GitHub repo has 19,993 stars. Developers deploy locally via Ollama or vLLM without API dependencies.
Is GPT-OSS free and what is the license?
GPT-OSS is completely free under the Apache-2.0 license. Download weights and code from Hugging Face and GitHub at no cost. The license permits unrestricted commercial use, forking, and modification without API fees. The repository has 19,993 GitHub stars; the latest release, v0.0.9, launched in January 2026.
GPT-OSS vs DeepSeek -- which to choose?
GPT-OSS includes native agent tools like web browsing and Python execution, while DeepSeek dominates raw reasoning benchmarks like GSM8K. GPT-OSS runs its 120B model on a single 80GB GPU using MXFP4 quantization; DeepSeek demands more hardware at similar scale. Choose GPT-OSS when building agents, DeepSeek when chasing math scores.
How do you run GPT-OSS locally?
Run GPT-OSS locally by installing Ollama and typing `ollama run gpt-oss:20b` for an instant launch on a 16GB GPU. Download weights from the Hugging Face Hub beforehand if needed. Apply the Harmony prompt format and set a reasoning effort (low, medium, or high) in the system prompt. Test Python agent calls via scripts. vLLM, Transformers, and LM Studio are also supported. The full guide is in the repo README.
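Once Ollama is running, scripts talk to its local REST API. A minimal sketch of building the request body for Ollama's `/api/chat` endpoint, assuming `ollama run gpt-oss:20b` has already pulled the model (the body is only sent when a live server is available):

```python
import json

# Sketch: prepare a request for a locally running Ollama server.
# Assumes the gpt-oss:20b model has been pulled via Ollama; sending
# the request (commented out below) requires the server to be up.

body = {
    "model": "gpt-oss:20b",
    "messages": [{"role": "user", "content": "Summarize MXFP4 in one line."}],
    "stream": False,
}
payload = json.dumps(body)

# To actually send it (requires a live Ollama server on the default port):
# import urllib.request
# req = urllib.request.Request(
#     "http://localhost:11434/api/chat",
#     data=payload.encode(),
#     headers={"Content-Type": "application/json"},
# )
# print(urllib.request.urlopen(req).read().decode())
print(payload)
```

With `stream` set to False the server returns one complete JSON response instead of incremental chunks, which keeps quick scripts simple.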
What hardware do you need for GPT-OSS?
The gpt-oss-20b model needs about 16GB of VRAM; gpt-oss-120b fits a single 80GB H100 (or MI300X) GPU. The mixture-of-experts design activates only 5.1B parameters per token, and MXFP4 quantization keeps the memory footprint small. CPU inference runs too slowly for production. Apple Metal works via the reference code. Verify against the repo benchmarks for your hardware config.
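A back-of-envelope check shows why these numbers work out, assuming MXFP4 stores roughly 4.25 bits per weight (the exact figure depends on block size and which tensors stay in higher precision, so treat this as an estimate of weight memory only):

```python
# Rough weight-memory estimate for an MXFP4-quantized model.
# Assumes ~4.25 bits per parameter; excludes KV cache and activations,
# which add real overhead on top of this figure.

def vram_estimate_gb(params_billion: float, bits_per_param: float = 4.25) -> float:
    """Approximate GB of GPU memory needed just for the weights."""
    total_bytes = params_billion * 1e9 * bits_per_param / 8
    return total_bytes / 1e9

print(round(vram_estimate_gb(117), 1))  # ~62 GB of weights -> fits an 80GB H100
print(round(vram_estimate_gb(21), 1))   # ~11 GB of weights -> fits a 16GB GPU
```

The gap between the weight estimate and total VRAM is what leaves room for the KV cache at useful context lengths.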
How do I install gpt-oss?
Visit the GitHub repository at https://github.com/openai/gpt-oss for installation instructions.
What license does gpt-oss use?
gpt-oss uses the Apache-2.0 license.
What are alternatives to gpt-oss?
Explore related tools and alternatives on My AI Guide.
Great for: Pro Vibe Builders
Skip if: You need something more beginner-friendly or guided
Open source & community-verified
Apache-2.0 licensed -- free to use in any project, no strings attached. 20,017 developers have starred this, meaning the community has reviewed and trusted it.
Reviewed by My AI Guide for relevance, quality, and active maintenance before listing.