Z.AI releases open-weight GLM-5.1 754B model for 8-hour agent tasks
TL;DR
Z.AI released GLM-5.1 on April 8, 2026: a 754B-parameter open-weight model under MIT license, scoring 58.4 on SWE-Bench Pro (SOTA) with an 8-hour autonomous Linux-build demo.
What changed
Z.AI released GLM-5.1 on April 8, 2026, with 754 billion parameters, under the MIT license. Benchmarks: 58.4 on SWE-Bench Pro (SOTA), strong Terminal-Bench 2.0 results, a lead over GLM-5 on NL2Repo, and improved agentic behaviour on a cybersecurity suite. The SWE-Bench Pro score was measured in the OpenHands harness with 200K context, temperature=1 and top_p=0.95. GLM-5 was deprecated on April 20. The headline demo wrapped GLM-5.1 in a self-review harness for an 8-hour continuous Linux-desktop build.
Why it matters
MIT licensing on a 754B-class model removes most of the usual commercial-use uncertainty: commercial use, modification and redistribution are all permitted. The agentic-first design targets long-horizon tool-use sequences rather than chat, and Z.AI frames the self-review loop as the main architectural difference from GLM-5. The 58.4 SWE-Bench Pro SOTA sets the reference point for open frontier models, and because the evaluation settings are published, you can check the number against your own harness.
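If you want to check the 58.4 against your own harness, the published settings are easy to pin down in code. The sketch below is a minimal Python example under stated assumptions: `SamplingConfig`, `resolved_rate` and `gap_vs_reported` are hypothetical names, the `temperature`/`top_p` keys follow the common OpenAI-style request convention rather than any confirmed Z.AI API, and 58.4 is treated as a percentage of resolved task instances. Because temperature=1 sampling is stochastic, it averages several runs before comparing.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Reported SWE-Bench Pro score for GLM-5.1, assumed to be a percentage of resolved tasks.
REPORTED_SWE_BENCH_PRO = 58.4


@dataclass(frozen=True)
class SamplingConfig:
    """Evaluation settings reported for the SWE-Bench Pro run (OpenHands harness)."""
    temperature: float = 1.0
    top_p: float = 0.95
    context_tokens: int = 200_000

    def request_params(self) -> Dict[str, float]:
        # Hypothetical mapping onto OpenAI-style request keys; adapt to your client.
        return {"temperature": self.temperature, "top_p": self.top_p}


def resolved_rate(resolved: int, total: int) -> float:
    """Percentage of task instances resolved in one run."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * resolved / total


def gap_vs_reported(runs: List[Tuple[int, int]]) -> float:
    """Mean resolved rate over (resolved, total) runs minus the reported score.

    A positive value means your harness scored above 58.4 on average.
    """
    if not runs:
        raise ValueError("need at least one run")
    mean = sum(resolved_rate(r, t) for r, t in runs) / len(runs)
    return mean - REPORTED_SWE_BENCH_PRO
```

Averaging first matters here: a single temperature=1 run can land a point or two either side of its long-run mean, so one run above or below 58.4 says little on its own.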
What to watch for
Quota economics on the hosted API: GLM-5.1 consumes quota at 3x during peak hours (14:00-18:00 UTC+8) and 2x off-peak, with a promotional 1x off-peak rate through the end of April. Update the model name to GLM-5.1 in Claude Code or whichever harness you use. Before routing production traffic, compare Terminal-Bench 2.0 results head-to-head against Claude Code 2.1.69 on your real workloads. Some NIM enterprise integrations have hit friction during the transition, so verify yours.
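The quota multipliers are simple enough to encode directly, which helps when estimating how much a batch job's quota use depends on when it runs. This minimal Python sketch assumes the peak window is 14:00 inclusive to 18:00 exclusive in UTC+8 and that "end of April" means until 00:00 UTC+8 on May 1, 2026; both boundaries and the function names are assumptions, not published Z.AI rules.

```python
from datetime import datetime, timedelta, timezone

# UTC+8, the timezone the peak window is quoted in.
UTC_PLUS_8 = timezone(timedelta(hours=8))
# Assumption: peak runs from 14:00 (inclusive) to 18:00 (exclusive) UTC+8.
PEAK_START_HOUR, PEAK_END_HOUR = 14, 18
# Assumption: the 1x off-peak promotion ends at midnight UTC+8 on May 1, 2026.
PROMO_END = datetime(2026, 5, 1, tzinfo=UTC_PLUS_8)


def quota_multiplier(ts: datetime) -> int:
    """Quota multiplier for a GLM-5.1 call made at timezone-aware time `ts`."""
    if ts.tzinfo is None:
        raise ValueError("ts must be timezone-aware")
    local = ts.astimezone(UTC_PLUS_8)
    if PEAK_START_HOUR <= local.hour < PEAK_END_HOUR:
        return 3  # peak: the promotion does not apply
    return 1 if local < PROMO_END else 2  # off-peak: promo 1x, then 2x


def quota_cost(base_units: float, ts: datetime) -> float:
    """Quota consumed by a call that would cost `base_units` at 1x."""
    return base_units * quota_multiplier(ts)
```

For example, a job started at 07:00 UTC lands at 15:00 UTC+8 and is charged at 3x, while the same job at 01:00 UTC on April 15 lands at 09:00 UTC+8 and gets the promotional 1x.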
Who this matters for
- Developers: an MIT license, 754B parameters, SOTA on SWE-Bench Pro at 58.4, and 200K context in the benchmark configuration. The self-review harness pattern, sustained across hundreds of tool-call rounds, is worth replicating in your own agent frameworks.
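Z.AI has not detailed the internals of its self-review harness here, so what follows is only a hypothetical Python sketch of the general pattern: an act/execute loop that pauses every `review_every` rounds for a review pass, feeds any critique back into the history, and stops early on a crude repeated-call loop guard. `act`, `execute`, `review` and `is_done` are placeholder callables you would back with model and tool calls; all names and default values are assumptions.

```python
from typing import Callable, List, Optional, Tuple

# One history entry: (tool call or "self-review", observation or critique).
Step = Tuple[str, str]


def run_with_self_review(
    act: Callable[[List[Step]], str],
    execute: Callable[[str], str],
    review: Callable[[List[Step]], Optional[str]],
    is_done: Callable[[List[Step]], bool],
    max_rounds: int = 300,
    review_every: int = 10,
    repeat_limit: int = 3,
) -> Tuple[List[Step], str]:
    """Run a tool-call loop with periodic self-review; return (history, stop reason)."""
    if repeat_limit < 2 or review_every < 1:
        raise ValueError("repeat_limit must be >= 2 and review_every >= 1")
    history: List[Step] = []
    for round_no in range(1, max_rounds + 1):
        action = act(history)
        # Loop guard: refuse to issue the same call `repeat_limit` times in a row.
        recent = [call for call, _ in history[-(repeat_limit - 1):]]
        if len(recent) == repeat_limit - 1 and all(call == action for call in recent):
            return history, "loop"
        history.append((action, execute(action)))
        if is_done(history):
            return history, "done"
        # Periodic self-review: a non-None critique is fed back so act() can re-plan.
        if round_no % review_every == 0:
            critique = review(history)
            if critique is not None:
                history.append(("self-review", critique))
    return history, "max_rounds"
```

Feeding the critique back into the history, rather than aborting, gives the next `act` call a chance to change course. The exact-repeat guard is deliberately simple; a longer-running harness would probably need to compare normalized tool calls or observations to catch subtler loops.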
Harsh’s take
The 8-hour autonomous-execution demo is the part most coverage will underplay. Keeping a model productive over hundreds of tool-call rounds without it degrading into loops, hallucinations, or spinning its wheels is an open problem in agent design. If Z.AI's 8-hour Linux-desktop build actually produces a working result, the self-review loop is a real capability and not a demo trick. The implication for long-horizon agent harnesses is significant: the practical ceiling for unattended runs moves from 30 minutes to multiple hours.
An MIT license, 754B parameters and SWE-Bench Pro SOTA together make this the cleanest open-frontier release the Chinese ecosystem has produced. Watch whether this holds through the next two releases or whether Z.AI follows the closing path Alibaba just took with Qwen 3.6-Max. Terminal-Bench 2.0 performance against Claude Code 2.1.69 is the head-to-head that determines whether routing agent workloads to GLM-5.1 instead of Claude Opus 4.7 becomes economically sensible rather than just technically interesting.
by Harsh Desai