Skip to content
Giant Antique Postage Stamp style editorial illustration for the news article: Z.AI releases open-weight GLM-5.1 754B model for 8-hour agent tasks
Model ReleaseZ.aiDeveloper

Z.AI releases open-weight GLM-5.1 754B model for 8-hour agent tasks

By Harsh Desai
Share

TL;DR

Z.AI released GLM-5.1 on April 8, 2026: a 754B-parameter open-weight model under MIT license, scoring 58.4 on SWE-Bench Pro (SOTA) with an 8-hour autonomous Linux-build demo.

What changed

Z.AI released GLM-5.1 on April 8, 2026 with 754 billion parameters under MIT license. Benchmarks: SWE-Bench Pro 58.4 (SOTA), strong Terminal-Bench 2.0 results, NL2Repo leadership over GLM-5, and improved cybersecurity-suite agentic behaviour. SWE-Bench Pro was run under OpenHands with 200K context, temperature=1, top_p=0.95. GLM-5 was deprecated April 20. The headline demo wrapped GLM-5.1 in a self-review harness for an 8-hour continuous Linux-desktop build.

Why it matters

MIT licensing on a 754B-class model removes commercial-use uncertainty entirely: modification and redistribution are permitted. The agentic-first design targets long-horizon tool-use sequences rather than chat, with the self-review loop framed as the architectural distinction from GLM-5. SWE-Bench Pro SOTA at 58.4 sets the open-frontier reference point, and the reproducibility settings allow you to validate the numbers against your own harness.

What to watch for

Quota economics on the hosted API: GLM-5.1 consumes 3x during peak (14:00-18:00 UTC+8) and 2x off-peak, with a promotional 1x off-peak through end of April. Update your model name to GLM-5.1 in Claude Code or your harness of choice. Compare Terminal-Bench 2.0 head-to-head against Claude Code 2.1.69 on your real workloads before routing production traffic. Some NIM enterprise integrations have had transition friction worth verifying.

Who this matters for

  • Developers: MIT license plus 754B parameters plus SWE-Bench Pro SOTA at 58.4 plus 200K context in benchmark config. The self-review harness pattern across hundreds of tool-call rounds is worth replicating in your own agent frameworks.

Harshs take

The 8-hour autonomous-execution demo is the part most coverage will underplay. Getting a model to stay productive over hundreds of tool-call rounds without degrading into loops, hallucinations, or spinning its wheels is an open problem in agent design. If Z.AI's 8-hour Linux-desktop build actually produces a working result, the self-review loop is a real capability and not a demo trick. The implication for long-horizon agent harnesses is significant: cap-out points move from 30 minutes to multi-hour runs.

MIT license plus 754B parameters plus SWE-Bench Pro SOTA is the cleanest open-frontier release the Chinese ecosystem has produced. Watch whether this sticks through the next two releases or whether Z.AI follows the closing path Alibaba just took with Qwen 3.6-Max. Terminal-Bench 2.0 performance against Claude Code 2.1.69 is the head-to-head that determines whether routing agent workloads to GLM-5.1 over Claude Opus 4.7 becomes economic rather than just technically interesting.

by Harsh Desai

Source:z.ai

About Z.ai

View the full Z.ai page →All Z.ai updates

More AI news

Everything AI. One email.
Every Monday.

New tools. Model launches. Plugins. Repos. Tactics. The moves the sharpest builders are making right now, before everyone else.

No spam. Unsubscribe anytime.