On Lunar New Year’s Eve, February 16th, Alibaba open-sourced Qwen3.5-Plus, an all-new generation of its large model that rivals Gemini 3 Pro in performance and tops the list of the world’s most powerful open-source models.
It is reported that Qwen3.5 features a comprehensive overhaul of the underlying model architecture. The released Qwen3.5-Plus has 397 billion total parameters but only 17 billion active parameters, yet it outperforms models with over a trillion parameters, such as Qwen3-Max. Deployment memory usage drops by 60%, inference efficiency improves significantly, and maximum inference throughput rises by up to 19×. The Qwen3.5-Plus API is priced as low as 0.8 yuan per million tokens, only 1/18 the price of Gemini 3 Pro.
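The headline ratio (17 billion active out of 397 billion total parameters) is characteristic of sparse Mixture-of-Experts designs, where a router activates only a few experts per token. The article does not disclose Qwen3.5’s actual expert layout, so the configuration below is purely hypothetical; the sketch only illustrates the routing idea:

```python
def route_top_k(router_scores, top_k):
    """Return the indices of the top_k highest-scoring experts for one token.

    Only these experts' parameters run for this token, which is why the
    active parameter count stays a small fraction of the total.
    """
    ranked = sorted(range(len(router_scores)),
                    key=lambda i: router_scores[i], reverse=True)
    return ranked[:top_k]

# Hypothetical configuration -- not Qwen3.5's real expert layout.
TOTAL_EXPERTS = 64
TOP_K = 4

fake_scores = [0.1, 0.9, 0.3, 0.5] * 16   # 64 made-up router scores
chosen = route_top_k(fake_scores, TOP_K)
active_fraction = TOP_K / TOTAL_EXPERTS   # fraction of experts that run
print(chosen, active_fraction)
```

With a stable sort, tied scores keep their original order, so the four 0.9 scores at indices 1, 5, 9, 13 are selected; only 1/16 of the experts run per token in this toy setup.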
Unlike previous generations of Qwen large language models, Qwen3.5 makes the leap from a pure-text model to a natively multimodal one. Qwen3 was pre-trained on text tokens only, while Qwen3.5 is pre-trained on a mix of visual and text tokens, with substantial additional Chinese, English, multilingual, STEM, and reasoning data. Giving the model “eyes” lets it absorb denser world knowledge and reasoning logic, reaching performance comparable to the trillion-parameter Qwen3-Max with less than 40% of the parameters. It performs strongly across benchmarks for reasoning, programming, and agentic ability: it scores 87.8 on the MMLU-Pro knowledge-reasoning test, surpassing GPT-5.2; 88.4 on the challenging doctoral-level GPQA test, above Claude 4.5; and a record 76.5 on the instruction-following benchmark IFBench. On general agent evaluations such as BFCL-V4 and search-agent benchmarks such as BrowseComp, Qwen3.5 outperforms Gemini 3 Pro and GPT-5.2.
Native multimodal training also brings a leap in Qwen3.5’s visual capabilities: across numerous authoritative assessments, including multimodal reasoning (MathVision), general visual question answering (RealWorldQA), text recognition and document understanding (CC-OCR), spatial intelligence (RefCOCO-avg), and video understanding (MLVU), Qwen3.5 consistently achieves top performance. On subject problem-solving, task planning, and physical-space reasoning tasks, Qwen3.5 outperforms the specialized Qwen3-VL model, with markedly stronger spatial localization and image reasoning, yielding more detailed and accurate analysis. For video understanding, Qwen3.5 accepts direct input of videos up to 2 hours long (1 million tokens of context), making it suitable for long-video analysis and summarization. Additionally, Qwen3.5 natively integrates visual understanding with coding: combined with image search and generation tools, it can turn a hand-drawn interface sketch directly into usable front-end code, or locate and fix UI issues from a single screenshot, making visual programming a genuine productivity tool.
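The screenshot-to-code workflow implies sending an image and a text instruction in one request. The article does not document Qwen3.5’s API schema; the sketch below assumes an OpenAI-compatible multimodal message format and a hypothetical model name, and only builds the request payload (no network call):

```python
import base64
import json

def screenshot_to_code_request(png_bytes: bytes,
                               model: str = "qwen3.5-plus") -> dict:
    """Build a chat payload asking the model to turn a UI screenshot
    into front-end code. The model name and message schema are
    assumptions (OpenAI-style), not a documented Qwen3.5 API."""
    b64 = base64.b64encode(png_bytes).decode("ascii")
    return {
        "model": model,
        "messages": [{
            "role": "user",
            "content": [
                # Image part: the screenshot, inlined as a data URL.
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{b64}"}},
                # Text part: the instruction accompanying the image.
                {"type": "text",
                 "text": "Convert this interface sketch into HTML/CSS."},
            ],
        }],
    }

payload = screenshot_to_code_request(b"\x89PNG placeholder")  # fake bytes
print(json.dumps(payload)[:80])
```

In practice the payload would be POSTed to whatever chat-completions endpoint the provider exposes; the structure above is the portable part.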
Qwen3.5’s native multimodal training ran efficiently on Alibaba Cloud’s AI infrastructure. Through a series of engineering innovations, training throughput on mixed text, image, and video data nearly matches that of pure-text base models, greatly lowering the barrier to native multimodal training. Meanwhile, a carefully designed mix of FP8 and FP32 precision cuts activation memory by about 50% when scaling training to hundreds of trillions of tokens, with a 10% speedup, further reducing training cost and improving efficiency.
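The memory win comes from storing values in 1 byte instead of 4. The article gives no details of Qwen3.5’s precision recipe, so the sketch below uses symmetric 8-bit integer quantization as a stand-in (real FP8 formats such as e4m3 keep a floating-point layout rather than a linear grid, but the byte arithmetic is the same):

```python
def quantize_8bit(values):
    """Symmetric 8-bit quantization: map floats onto [-127, 127]
    with one shared scale. Illustrative stand-in for FP8 storage."""
    scale = (max(abs(v) for v in values) / 127) or 1.0
    quant = [round(v / scale) for v in values]
    return quant, scale

def dequantize(quant, scale):
    """Recover approximate floats from the 8-bit codes."""
    return [q * scale for q in quant]

acts = [0.8, -1.5, 0.03, 2.54, -0.6]      # toy activation values
quant, scale = quantize_8bit(acts)
restored = dequantize(quant, scale)

fp32_bytes = 4 * len(acts)                # 4 bytes per FP32 value
int8_bytes = 1 * len(acts)                # 1 byte per quantized value
max_err = max(abs(a - r) for a, r in zip(acts, restored))
print(int8_bytes / fp32_bytes, round(max_err, 4))
```

Quantized storage alone is 4× smaller; the article’s ~50% overall figure is consistent with keeping only part of the activations in low precision while sensitive tensors stay in FP32.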
Qwen3.5 also marks a breakthrough from agent framework to agent application. It can autonomously operate smartphones and computers to complete everyday tasks efficiently, supports more mainstream apps and commands on mobile devices, and handles more complex multi-step operations on PCs, such as cross-application data organization and automated workflows, significantly improving operational efficiency. The Qwen team has also built a scalable asynchronous reinforcement-learning framework for agents that accelerates end-to-end training by 3 to 5× and supports plugin-based agents at million scale.
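The article does not describe the framework’s internals. The core idea behind asynchronous agent RL, though, is decoupling rollout generation from learning so neither side blocks the other. A toy asyncio sketch of that actor/learner pattern (all names and the running-mean “update” are illustrative, not the Qwen team’s design):

```python
import asyncio
import random

async def actor(name, queue, episodes=3):
    """Generate rollouts and push rewards without waiting on the learner."""
    for step in range(episodes):
        await asyncio.sleep(0)            # stand-in for environment interaction
        reward = random.random()          # stand-in for an episode's reward
        await queue.put((name, step, reward))

async def learner(queue, total):
    """Consume rollouts as they arrive; a running mean stands in for
    a gradient update."""
    value, seen = 0.0, 0
    while seen < total:
        _, _, reward = await queue.get()
        seen += 1
        value += (reward - value) / seen
    return value

async def main(n_actors=4, episodes=3):
    queue = asyncio.Queue()
    learn = asyncio.create_task(learner(queue, n_actors * episodes))
    await asyncio.gather(*(actor(f"actor-{i}", queue, episodes)
                           for i in range(n_actors)))
    return await learn

estimate = asyncio.run(main())
print(f"learned value estimate: {estimate:.3f}")
```

Because actors never wait for the learner’s update step, adding actors scales rollout throughput, which is the property an asynchronous framework exploits to speed up end-to-end training.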
It is reported that the Qwen3.5 series models are now live in the Qwen app and its PC version. Developers can download the new models from ModelScope and Hugging Face, or access the API directly via Alibaba Cloud’s Bailian platform. Alibaba will continue to open-source Qwen3.5 models of different sizes and capabilities, and the more powerful flagship model, Qwen3.5-Max, will be released soon.