Model Release | Industry

Alibaba ships Qwen 3.6-Max-Preview as closed weights, tops six coding benchmarks

By Harsh Desai

TL;DR

Alibaba released Qwen 3.6-Max-Preview on April 20, 2026. The model ranks first on six coding benchmarks (SWE-Bench Pro, Terminal-Bench 2.0, SkillsBench, QwenClawBench, QwenWebBench, SciCode) and ships with dual OpenAI/Anthropic API compatibility. Notably, this is the first Qwen flagship to ship as closed weights rather than open-source, and Alibaba simultaneously shuttered the free tier of Qwen Code.

What shipped

Alibaba launched Qwen 3.6-Max-Preview on April 20, 2026. It is available through Qwen Studio (the consumer surface) and Alibaba Cloud Model Studio (API endpoint "qwen3.6-max-preview"). Three headline points:

  1. Benchmark sweep. Top-ranked on all six coding benchmarks Alibaba published in the release evaluation.
  2. Dual API compatibility. The endpoint accepts both OpenAI and Anthropic API specifications. Drop-in replacement for anyone wired to either SDK.
  3. 256K context window. Text-only at launch, no vision yet.

Third-party validation: Artificial Analysis scored the model 52 on its Intelligence Index, placing it third out of 203 evaluated models at launch.

Closed-weights pivot

The bigger story is the distribution model. Previous Qwen flagships (Qwen 3.5, Qwen 3.0) shipped with open weights under Apache 2.0. Qwen 3.6-Max is proprietary, hosted-only. On the same day, Alibaba shut down the free tier of Qwen Code.

This is the cleanest open-to-closed pivot among China's frontier labs to date. Moonshot AI shipped Kimi K2.6 with open weights the same week, making the split explicit: Alibaba is monetising its best model, Moonshot is using openness as a distribution moat.

Agency routing implications

For builders already wired to the OpenAI API, routing to Qwen 3.6-Max is a base-URL swap. Same for Anthropic-wired code. If you use Claude Code, the Anthropic-spec endpoint accepts your existing agent prompts without reformatting.
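To make the "base-URL swap" concrete, here is a minimal stdlib-only sketch of what the two request shapes look like. The endpoint paths are placeholders (Alibaba has not published exact URLs in this piece); only the model ID "qwen3.6-max-preview" comes from the release.

```python
import json

# Placeholder endpoint paths -- hypothetical, not Alibaba's published URLs.
OPENAI_STYLE_URL = "https://example-model-studio/compatible-mode/v1/chat/completions"
ANTHROPIC_STYLE_URL = "https://example-model-studio/anthropic/v1/messages"

def openai_style_payload(prompt: str) -> dict:
    # OpenAI chat-completions shape: a messages list plus a model field.
    return {
        "model": "qwen3.6-max-preview",
        "messages": [{"role": "user", "content": prompt}],
    }

def anthropic_style_payload(prompt: str) -> dict:
    # Anthropic messages shape: max_tokens is required at the top level.
    return {
        "model": "qwen3.6-max-preview",
        "max_tokens": 1024,
        "messages": [{"role": "user", "content": prompt}],
    }

# The "swap" is the URL plus payload shape; the prompt itself is unchanged.
body = json.dumps(openai_style_payload("Refactor this function."))
```

In other words, code already emitting either shape can be pointed at the matching Qwen route without touching prompts or message construction.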

On ToolcallFormatIFBench, Alibaba reports a +2.8-point gain over Qwen 3.6-Plus and claims the lead over Anthropic. As always, replicate first-party benchmark claims on your own workload before committing production traffic.

Pricing

Per-token pricing is not disclosed at preview launch. Expect Alibaba to publish pricing when the model exits preview; until then, treat Qwen 3.6-Max usage as budget discovery at your own risk.
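One way to cope with undisclosed pricing is a spend guard built on an assumed worst-case rate. The figure below is a made-up placeholder, not Alibaba's price; the point is the cap, not the number.

```python
# ASSUMED worst-case rate -- a placeholder, since real pricing is undisclosed.
ASSUMED_USD_PER_1K_TOKENS = 0.05

class BudgetGuard:
    """Track token usage and stop before an assumed spend ceiling."""

    def __init__(self, budget_usd: float):
        self.budget_usd = budget_usd
        self.tokens_used = 0

    def record(self, tokens: int) -> None:
        # Feed this from the usage field of each API response.
        self.tokens_used += tokens

    def estimated_spend(self) -> float:
        return self.tokens_used / 1000 * ASSUMED_USD_PER_1K_TOKENS

    def over_budget(self) -> bool:
        return self.estimated_spend() > self.budget_usd

guard = BudgetGuard(budget_usd=5.00)
guard.record(40_000)
print(guard.estimated_spend())
```

When real pricing lands, swap the assumed rate for the published one and the rest of the guard stays as-is.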

Who this matters for

  • Vibe Builder: Routable via your existing OpenAI or Anthropic SDK. Point Claude Code at the Qwen endpoint and run one real task; decide on your own benchmark rather than Alibaba's.
  • Basic User: Access via Qwen Studio (consumer web app). If you are in China, the free tier of Qwen Code is gone; budget for the paid API or switch to Kimi K2.6 for open-weight cost control.
  • Developer: Dual OpenAI/Anthropic API compatibility means drop-in replacement for either SDK. 256K context window, no vision. Price per token not yet public; budget conservatively.

What to watch next

The closed-weights pivot is the part that matters long-term. For the past 18 months, the open-source-from-China thesis has been Alibaba-led: Qwen set the pace for open-weight frontier models, DeepSeek and Kimi followed. Alibaba moving its best model to closed weights while Moonshot doubles down on open is a strategic split that reshapes how builders think about Chinese AI as a category.

For vibe builders in the West, the practical question is simpler: is Qwen 3.6-Max actually better than Claude Opus 4.7 or GPT-5.4 on your real code? Alibaba says yes on six benchmarks, but benchmarks favour the releaser and "top-ranked on six" often means "top-ranked on the six we chose." The dual-API compatibility makes replication cheap: point your existing agent at the Qwen endpoint and run your actual repo work. If it wins, swap; if not, move on.
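The "run your actual repo work" comparison can be sketched as a tiny harness: the same task list through two model-backed callables, scored by your own check. The callables below are toy stand-ins; in practice they would wrap calls to the Qwen endpoint and your incumbent model.

```python
def compare(tasks, run_a, run_b, check):
    """Run a shared task list through two models; return (a_passes, b_passes)."""
    a_passes = b_passes = 0
    for task in tasks:
        if check(task, run_a(task)):
            a_passes += 1
        if check(task, run_b(task)):
            b_passes += 1
    return a_passes, b_passes

# Toy stand-ins -- replace with real API-backed runners and a real check
# (e.g. "does the patch make the test suite pass").
tasks = ["fix bug", "write test"]
run_qwen = lambda t: t.upper()
run_incumbent = lambda t: t
check = lambda task, out: len(out) > 0

print(compare(tasks, run_qwen, run_incumbent, check))  # (2, 2)
```

The harness is deliberately model-agnostic: if Qwen wins on your checks, swap; if not, move on.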

The OpenAI-plus-Anthropic API spec support is the sleeper feature. Most closed-model vendors force a single API spec, making multi-vendor routing expensive to maintain. Accepting both means Alibaba is explicitly courting agencies and developers who already use Claude Code or Codex, not asking them to rebuild. That's a pragmatic distribution move.

Context window at 256K is generous but not leading; GPT-5.4 has more, Claude Opus 4.7 has similar. Not the differentiator.

Broader signal: every Chinese frontier lab is now either (a) closed and monetising (Alibaba), (b) open and distributing (Moonshot, MiniMax until recently, Z.AI), or (c) specialised (Tencent on gaming, ByteDance on consumer). The open-vs-closed split will determine which of these scale internationally over the next 12 months.


Source: qwen.ai
