Frontier Models May 22 ago

Alibaba’s Qwen3.7-Max: A Leap in AI Agent Performance and Longevity

Alibaba introduces Qwen3.7-Max, a powerful AI agent model demonstrating 35 hours of autonomous operation and superior coding performance across multiple benchmarks.

GPUBeat Desk

Desk · GPUBeat Media

Published

May 22 · 06:03 ET

Reading

3 min · 574 words

Alibaba's latest AI agent, Qwen3.7-Max, has emerged as a noteworthy contender in the AI space, demonstrating the capability to autonomously operate for 35 hours and manage over 1,000 tool calls. This model is designed as a foundational framework for coding, debugging, and automating office tasks, making it an essential tool for developers and businesses.

Enhanced Capability for Complex Tasks

Unlike its predecessors, Qwen3.7-Max excels in handling intricate, multi-stage processes that require ongoing adjustments and interactions with various external tools. The Qwen team highlights its ability to execute tasks involving extensive editing, command execution, and verification, making it particularly suited for broad workflows. This model operates effectively as a coding AI agent, assisting in everything from front-end prototyping to extensive software development across multiple files.

The integration of the Multi-Channel Protocol (MCP) allows Qwen3.7-Max to interact with external tools and services. This feature is essential for executing tasks that require browser operations, file handling, and interactions with business software, thereby simplifying complex workflows. The Qwen team claims that this model's design allows for consistent performance across different execution environments, which is critical for maintaining productivity in diverse settings.

Benchmark Performance

In terms of performance metrics, Qwen3.7-Max has outperformed several notable AI models. For instance, in coding evaluations, it scored 69.7 on the Terminal-Bench 2.0 Terminus-2, surpassing DeepSeek-V4-Pro Max's 67.9. In general AI agent performance tests, it achieved a notable score of 76.4 in MCP-Atlas, exceeding that of Claude Opus-4.6 Max. The model's ability to optimize computational tasks for GPUs is particularly impressive, achieving a median speedup of 1.98 times compared to PyTorch’s reference implementation.

The Qwen team emphasizes that Qwen3.7-Max's capabilities extend beyond specific environments, allowing it to maintain high performance across various platforms, including Claude Code and OpenClaw. This adaptability is a significant advantage as AI continues to integrate into diverse operational contexts.

Demonstrating Long-Term Autonomy

A remarkable demonstration of Qwen3.7-Max's capabilities was observed through a kernel optimization task, where it completed 432 evaluations and made 1,158 tool calls over 35 hours without human intervention. The model increased its speedup rate in relation to the number of tool calls, achieving a geometric mean speedup of ten times over a reference implementation. In contrast, competitors like GLM 5.1 and Kimi K2.6 reported much lower speedup rates of 7.3 and 5.0 times, respectively.

Qwen3.7-Max was tested in a simulated startup management scenario via YC-Bench, where it achieved revenues of $2.08 million—more than double that of its predecessor, Qwen3.6-Plus. This showcases its capacity for effective decision-making across various business functions, including managing customer relationships and optimizing operations.

Future Availability and Infrastructure Enhancements

Developers can expect to access Qwen3.7-Max soon through Alibaba Cloud Model Studio, which will support both Anthropic-compatible and OpenAI-compatible API interfaces. The model also features a 'preserve_thinking' capability, enabling it to retain inferences from prior interactions—ideal for prolonged AI tasks.

Accompanying the Qwen3.7-Max announcement are enhancements to Alibaba's AI infrastructure, including the launch of the Zhenwu M890 AI processor, which promises triple the performance of its predecessor. This processor will support extensive GPU memory and high inter-chip bandwidth, essential for large-scale AI applications.

The Qwen team asserts that Qwen3.7-Max is set to become a foundational model for developing next-generation AI agents, integrating reasoning, performance across different environments, and the ability to execute long-term autonomous tasks effectively. As AI continues to evolve, these advancements will significantly impact industries looking to harness AI for improved productivity and efficiency.

GPUBeat Desk

Desk · joined 2026

GPUBeat Desk covers AI infrastructure — chips, foundation models, inference economics, datacenter buildouts, and the geopolitics of compute.

2033 stories

Enhanced Capability for Complex Tasks

Benchmark Performance

Demonstrating Long-Term Autonomy

Future Availability and Infrastructure Enhancements

GPUBeat Desk

More on frontier models

Infratil CEO Highlights Untapped Data Center Potential in ANZ

Anthropic’s Olah Calls for Broader Oversight in AI Development

SK Telecom Partners with Defense Ministry to Advance AI in Military