Most AI models today are not built for sustained, multi-step autonomous execution. Tasks like running hundreds of iterative code modifications or chaining tool calls across hours without human intervention require a different kind of model architecture and training focus.
Two Preview Models Released Simultaneously
Alibaba’s Qwen team previewed two models simultaneously: Qwen3.7-Max-Preview and Qwen3.7-Plus-Preview. They ranked 13th globally in text capabilities and 16th in vision capabilities, respectively, according to LM Arena.
What is Qwen3.7-Max Designed For
Alibaba’s Qwen team described Qwen3.7-Max as its most advanced and comprehensive agent model to date. The model is proprietary and closed-weight, capable of handling coding and debugging, office workflow automation, and long-horizon tasks spanning hundreds or even thousands of steps.
Extended-Thinking Mode
Qwen3.7-Max is a reasoning model. The model generates a chain of thought first — an internal sequence of steps where it plans, checks its work, and corrects course before committing to a final answer. On interfaces like Qwen Chat, this shows up as a “Thinking” mode you can switch on to see the model’s reasoning trace.
Reasoning models produce significantly more output tokens than standard completions. When Artificial Analysis ran its Intelligence Index evaluation, Qwen3.7-Max generated about 97 million tokens, compared to an average of 24 million for models on that benchmark. For short or simple tasks, this overhead adds latency without improving output quality. For multi-step planning, code refactoring, or long agent chains, extended-thinking mode is where the model’s strength applies.
Context Window
The model features a 1M token context window, up from 256K on Qwen3.6 Max Preview. It supports text input and output only. Pricing has not yet been announced. A million-token context window can hold a full mid-sized code repository or a large stack of documents in a single request. Models often reason less reliably as the context window fills. Independent long-context testing for Qwen3.7-Max is not yet available.
Benchmark Results
Qwen3.7-Max scored 56.6 on the Artificial Analysis Intelligence Index, placing it fifth overall. That represents a 4.8-point gain over its predecessor Qwen3.6 Max Preview (51.8), and puts it ahead of Google’s Gemini 3.5 Flash (55.3). GPT-5.5 (60.2), Claude Opus 4.7 (57.3), and Gemini 3.1 Pro Preview (57.2) still lead the overall rankings.
The Intelligence Index v4.0 aggregates ten evaluations, including GDPval-AA, Terminal-Bench Hard, SciCode, AA-Omniscience, Humanity’s Last Exam, and GPQA Diamond.
Agentic Performance — Internal Test
In an internal Alibaba test on a new chip platform, the model autonomously performed more than 1,000 tool calls and iterative code modifications to optimize a key kernel. Alibaba claimed the process improved inference speed by roughly 10x compared with the previous version.
Marktechpost’s Visual Explainer
May 2026
Stay ahead of AI. Get the most important stories delivered to your inbox — no spam, no noise.




