Qwen3.7-Plus is Alibaba's bid to turn multimodal AI into a full-blown autonomous agent

Alibaba’s Qwen team has launched Qwen3.7-Plus, a multimodal agent model designed to combine visual perception, graphical user interface (GUI) operation, and coding in a single autonomous loop. In a demonstration, the model built a vocabulary learning application on its own, generating over 10,000 lines of code across 1,000 agent calls in eleven hours. The model leads on screen understanding in Alibaba’s internal benchmarks, but its overall performance remains mixed. It is proprietary, with no open weights available, and is priced significantly lower than Western frontier models.
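As a rough back-of-the-envelope reading of the demo figures, the short Python sketch below works out what they imply on average. The constant and function names are illustrative, the "over 10,000 lines" figure is treated as a lower bound, and the averages are derived here rather than reported by Alibaba.

```python
# Figures reported for the demo; "over 10,000" lines is treated as a lower bound.
LINES_OF_CODE = 10_000
AGENT_CALLS = 1_000
HOURS = 11


def demo_averages(lines: int, calls: int, hours: float) -> dict[str, float]:
    """Return average throughput figures for a single agent run."""
    return {
        "lines_per_call": lines / calls,
        "seconds_per_call": hours * 3600 / calls,
        "lines_per_hour": lines / hours,
    }


averages = demo_averages(LINES_OF_CODE, AGENT_CALLS, HOURS)
for name, value in averages.items():
    print(f"{name}: {value:.1f}")
```

On these assumptions, each agent call contributed about 10 lines of code and took roughly 40 seconds, which works out to around 900 lines per hour. These are averages only and say nothing about how the work was actually distributed across calls.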

This development matters because it signals a strategic shift from simple chatbots to digital workers capable of executing multi-step tasks without human intervention. By combining vision and code generation, the model addresses the gap between perceiving a screen and manipulating it to achieve a goal. The aggressive pricing and closed-source nature suggest Alibaba is targeting enterprise customers who value reliability and integration over community-driven transparency. That positions the model as a practical tool for businesses seeking to automate software development and interface management.

Qwen3.7-Plus merges visual input with code generation to perform autonomous tasks like building apps.
The model is proprietary and priced lower than comparable Western frontier models.
Performance is strong in screen understanding but remains mixed in broader capability assessments.
