Z.AI releases open-weight GLM-5.1 754B model for 8-hour agent tasks

By Harsh Desai8 April 2026

TL;DR

Z.AI released GLM-5.1 on April 8, 2026: a 754B-parameter open-weight model under MIT license, scoring 58.4 on SWE-Bench Pro (SOTA) with an 8-hour autonomous Linux-build demo.

What changed

Z.AI released GLM-5.1 on April 8, 2026 with 754 billion parameters under MIT license. Benchmarks: SWE-Bench Pro 58.4 (SOTA), strong Terminal-Bench 2.0 results, NL2Repo leadership over GLM-5, and improved cybersecurity-suite agentic behaviour. SWE-Bench Pro was run under OpenHands with 200K context, temperature=1, top_p=0.95. GLM-5 was deprecated April 20. The headline demo wrapped GLM-5.1 in a self-review harness for an 8-hour continuous Linux-desktop build.

Why it matters

MIT licensing on a 754B-class model removes commercial-use uncertainty entirely: modification and redistribution are permitted. The agentic-first design targets long-horizon tool-use sequences rather than chat, with the self-review loop framed as the architectural distinction from GLM-5. SWE-Bench Pro SOTA at 58.4 sets the open-frontier reference point, and the reproducibility settings allow you to validate the numbers against your own harness.

What to watch for

Quota economics on the hosted API: GLM-5.1 consumes 3x during peak (14:00-18:00 UTC+8) and 2x off-peak, with a promotional 1x off-peak through end of April. Update your model name to GLM-5.1 in Claude Code or your harness of choice. Compare Terminal-Bench 2.0 head-to-head against Claude Code 2.1.69 on your real workloads before routing production traffic. Some NIM enterprise integrations have had transition friction worth verifying.

Who this matters for

Developers: MIT license plus 754B parameters plus SWE-Bench Pro SOTA at 58.4 plus 200K context in benchmark config. The self-review harness pattern across hundreds of tool-call rounds is worth replicating in your own agent frameworks.

Harsh’s take

The 8-hour autonomous-execution demo is the part most coverage will underplay. Getting a model to stay productive over hundreds of tool-call rounds without degrading into loops, hallucinations, or spinning its wheels is an open problem in agent design. If Z.AI's 8-hour Linux-desktop build actually produces a working result, the self-review loop is a real capability and not a demo trick. The implication for long-horizon agent harnesses is significant: cap-out points move from 30 minutes to multi-hour runs.

MIT license plus 754B parameters plus SWE-Bench Pro SOTA is the cleanest open-frontier release the Chinese ecosystem has produced. Watch whether this sticks through the next two releases or whether Z.AI follows the closing path Alibaba just took with Qwen 3.6-Max. Terminal-Bench 2.0 performance against Claude Code 2.1.69 is the head-to-head that determines whether routing agent workloads to GLM-5.1 over Claude Opus 4.7 becomes economic rather than just technically interesting.

by Harsh Desai

Source:z.ai

About Z.ai

View the full Z.ai page →All Z.ai updates

Go deeper

Read our Z.ai review →

More AI news

Feature22 July 2026
Cursor unifies plugin management with plugin canvases in Customize page
Cursor introduces a unified Customize page for managing plugins, skills, MCPs, and hooks. It features marketplace leaderboards, plugin canvases, and third-party repo imports.
Deprecation22 July 2026
ChatGPT deprecates Atlas browser agent for built-in features
OpenAI sunsets Atlas on August 9, 2026 as agentic browser capabilities integrate into ChatGPT and Codex, with export support for bookmarks, cookies, and passwords.
Daily Roundup22 July 2026
Gemini 3.6 Flash on AI Gateway, Laguna S 2.1, and agent tools shipping today
Vercel added Gemini 3.6 Flash and Laguna S 2.1 to AI Gateway while new agent platforms and hardware updates rolled out, giving builders faster model testing and SMBs simpler ways to run AI tasks without extra setup.