Z.AI releases GLM-5.1 as open-weight 754B agentic model with 8-hour autonomous execution
TL;DR
Z.AI (Zhipu) released GLM-5.1 on April 8, 2026 as an open-weight 754B parameter model under the MIT license. The release targets agentic engineering specifically: state-of-the-art results on SWE-Bench Pro, and long-horizon autonomous execution demonstrated by building a Linux desktop application across 8 hours of continuous work.
What shipped
Z.AI (the international brand of Zhipu AI) released GLM-5.1 on April 8, 2026. Key specs:
- 754 billion parameters in the base model.
- Open-weight under MIT license: maximal permissiveness for commercial + research use.
- Agentic-first design. Built for long-horizon tool-use sequences, not chat.
- SWE-Bench Pro SOTA at 58.4, leading the released frontier on that benchmark.
Z.AI also deprecated GLM-5 on April 20, with GLM-5.1 as the replacement. Enterprise integrations through NIM have had some friction during the transition.
Agentic execution over long horizons
The headline demonstration: Z.AI wrapped GLM-5.1 in a simple self-review harness and let it build a Linux desktop application over 8 hours of continuous execution. After each execution round, the model reviewed its own output, identified gaps (missing features, rough styling, broken interactions), and continued. The 8-hour run produced substantially better results than a single-shot run of the same task on the same model.
This pattern, the model acting as its own reviewer, is how Z.AI frames GLM-5.1 as distinct from GLM-5. The previous iteration was a general-purpose coding model; GLM-5.1 is built specifically to remain productive across hundreds of rounds and thousands of tool calls.
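The self-review loop described above can be sketched in a few lines. This is an illustrative skeleton, not Z.AI's harness: `generate` and `review` are hypothetical stand-ins for model calls, and the stop criterion (reviewer returns no remaining gaps) is an assumption.

```python
# Minimal self-review harness sketch. `generate` and `review` stand in for
# model calls; the names, signatures, and stop criterion are illustrative only.

def generate(task, feedback):
    """Placeholder for a model call that produces or revises an artefact,
    taking the previous round's review feedback into account."""
    return f"artefact for {task!r} addressing {len(feedback)} issues"

def review(artefact):
    """Placeholder for a model call that critiques its own output and
    returns a list of gaps (missing features, rough styling, broken
    interactions). An empty list means nothing is left to fix."""
    return []

def self_review_loop(task, max_rounds=100):
    """Alternate generate and review until the reviewer is satisfied
    or the round budget runs out."""
    feedback = []
    artefact = None
    for _ in range(max_rounds):
        artefact = generate(task, feedback)
        feedback = review(artefact)
        if not feedback:  # reviewer found no remaining gaps: stop early
            break
    return artefact
```

The interesting engineering is inside `review`: the 8-hour result suggests the reviewer prompt is strong enough to keep finding real gaps round after round without looping on trivia.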
Coding Plan rollout
Z.AI is rolling out GLM-5.1 to all GLM Coding Plan subscribers. Update the model name to "GLM-5.1" in Claude Code settings (or the equivalent in your tool of choice).
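For Claude Code specifically, the switch is typically done through environment variables. The endpoint URL and variable names below reflect common Anthropic-compatible setups and are assumptions; verify them against Z.AI's current documentation.

```shell
# Point Claude Code at Z.AI's Anthropic-compatible endpoint and select GLM-5.1.
# Base URL is an assumption; check Z.AI's docs for the current value.
export ANTHROPIC_BASE_URL="https://api.z.ai/api/anthropic"
export ANTHROPIC_AUTH_TOKEN="<your Z.AI API key>"
export ANTHROPIC_MODEL="GLM-5.1"
```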
Quota economics: GLM-5.1 consumes quota at 3x during peak hours and 2x off-peak. Through end of April, off-peak is a promotional 1x. Peak hours are 14:00-18:00 UTC+8 (Beijing daily).
Benchmarks at a glance
- SWE-Bench Pro: 58.4 (SOTA)
- Terminal-Bench 2.0: strong showing; details in the release
- NL2Repo: leads GLM-5 by a wide margin
- Cybersecurity suite: improved agentic behaviour in red-team evaluation
Z.AI ran SWE-Bench Pro under OpenHands with a tailored instruction prompt; settings included 200K context, temperature=1, top_p=0.95. Reproducibility details are in the release notes for anyone replicating.
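For anyone replicating, the reported sampling settings map onto a request payload along these lines. The field names follow the common OpenAI-style chat-completion shape and are assumptions; only the temperature, top_p, and 200K context figures come from the release notes.

```python
# Sampling settings Z.AI reports for its SWE-Bench Pro runs, expressed as an
# OpenAI-style request payload. Field names are assumptions; the numeric
# values are the ones stated in the release notes.
swe_bench_config = {
    "model": "GLM-5.1",
    "temperature": 1.0,
    "top_p": 0.95,
    "max_context_tokens": 200_000,  # 200K context window used in the runs
}
```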
Why this matters for open-source
GLM-5.1 under MIT is the most permissive license offered by a Chinese frontier lab for a 754B-class model to date. For builders in the West, this removes licensing uncertainty entirely: commercial use, modification, and redistribution are all permitted.
Who this matters for
- Vibe Builder: A self-hostable 754B model that can run long-horizon tasks. If you hit the Claude Opus daily quota, this is a credible alternative for agent-heavy workflows.
- Basic User: Access via Z.AI Coding Plan or Claude Code config. Quota multiplier (3x peak, 2x off-peak) means pay attention to when you run heavy sessions.
- Developer: MIT license plus SWE-Bench Pro SOTA plus 200K context in benchmark config. The self-review harness pattern is worth replicating in your own agent frameworks.
What to watch next
The 8-hour autonomous-execution demo is the part most coverage will underplay. Getting a model to stay productive over hundreds of tool-call rounds without degrading into loops, hallucinations, or spinning its wheels is an open problem in agent design. If Z.AI's 8-hour Linux-desktop build actually produces a working result (and the team's track record suggests it does), the self-review loop is a real capability, not a demo trick.
For vibe builders, this matters because it changes what a "task" means. Currently, most agent workflows cap out at 10-30 minutes of continuous work before human re-steering is needed. An 8-hour capability means you can give the agent a genuine project scope, walk away, and come back to a working artefact. That is the level of autonomy most operators actually want.
MIT license plus 754B parameters plus SWE-Bench Pro SOTA is the cleanest "open frontier" release the Chinese ecosystem has produced. Unlike Alibaba (closed Qwen 3.6-Max) and increasingly MiniMax (license tightening on M2.7), Z.AI is doubling down on true open distribution. Watch whether this sticks through the next two releases or whether Z.AI follows the same closing path Alibaba just took.
The 3x quota multiplier on peak hours is the gotcha. If you rely on the hosted Z.AI API rather than self-hosting, budget accordingly: peak-hour queries on your production workload consume three times the quota of the previous GLM-5. Off-peak 1x promotion through end of April is a way to test without burning budget.
Terminal-Bench 2.0 performance against Claude Code 2.1.69 is the competitive benchmark to watch. If GLM-5.1 beats Claude Code on real terminal work at comparable latency, the case for routing agent workloads to GLM-5.1 over Claude Opus 4.7 becomes economic rather than just technical.
by Harsh Desai