DeepSeek quietly pushed a three-mode test on April 8: Fast, Expert, and Vision, three parallel streams that the community reads as the final warm-up before V4's official launch.
(Background: DeepSeek V4 refuses NVIDIA and goes for Huawei! Alibaba and ByteDance are scrambling, and Tencent is rushing to buy Ascend 950PR chips)
(Additional context: DeepSeek V4 announces it will abandon NVIDIA! Where does China’s AI “compute independence” breakthrough campaign stand?)
Early on the morning of April 8, DeepSeek's website and app simultaneously pushed an update: the interface now offers three mode options. This is not a full-featured official rollout but a preliminary test for a subset of users. Even so, as soon as the news broke, the community immediately linked it to V4's release timeline.
The division of labor among the three modes is quite clear:
Fast Mode is the default option, aimed at everyday conversation and real-time responses. It uses a lightweight, low-latency model with no usage caps. Attachment support is limited to text extraction; it does not handle images or voice.
Expert Mode is positioned for complex reasoning tasks and supports deep thinking. Community testing shows that a single query can trigger more than 500 seconds of thinking time. This mode requires queueing during peak hours and does not support attachment or voice uploads. It is still in the testing stage and is not yet available to all users.
Vision Mode is the most symbolically significant of the three: it is DeepSeek's first official support for visual input on the consumer side. Multimodal capability is no longer just a technical option at the API layer; it is aimed directly at general users.
The overall logic is: route compute consumption by task type. High-frequency, low-demand work goes through the fast lane; high-compute reasoning goes through the expert lane; and image-and-text input goes through the vision lane. This design isn’t new in itself, but DeepSeek is the first among China’s leading models to do it at the consumer product level.
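DeepSeek has not disclosed how this routing is implemented, so what follows is only a minimal sketch of the task-type routing idea in Python. All names here (Mode, Request, route) are invented for illustration and are not DeepSeek's API.

```python
from dataclasses import dataclass
from enum import Enum

class Mode(Enum):
    FAST = "fast"      # lightweight, low-latency model; text-only attachments
    EXPERT = "expert"  # deep reasoning with a long thinking budget; may queue at peak
    VISION = "vision"  # multimodal; accepts image input

@dataclass
class Request:
    text: str
    has_image: bool = False
    needs_deep_reasoning: bool = False

def route(req: Request) -> Mode:
    """Send each request down the cheapest lane that can handle it."""
    if req.has_image:
        return Mode.VISION   # image-and-text input goes to the vision lane
    if req.needs_deep_reasoning:
        return Mode.EXPERT   # high-compute reasoning goes to the expert lane
    return Mode.FAST         # high-frequency, low-demand work stays on the fast lane

print(route(Request("What's the weather like today?")))                    # Mode.FAST
print(route(Request("Prove this inequality", needs_deep_reasoning=True)))  # Mode.EXPERT
```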
Discussions in the community about this test quickly focused on one technical point.
Some test users found that the answer quality in Expert mode improves only slightly compared with Fast mode—the gap isn’t as big as people expected. More importantly, some users directly asked the model itself, and the response was: the two modes share the same underlying architecture, and the difference mainly comes from adjustments to the system prompt.
If that’s true, then the essence of “Expert mode” is more like a calibrated system prompt than an independent reasoning model.
DeepSeek has not formally responded to this claim. From the outside, there are two possible interpretations: one is that this is only a temporary configuration during the rollout stage, and real model tiering won't be activated until after V4 launches; the other is that the tiered design was never about model-level switching in the first place, but about controlling compute consumption through different reasoning budgets and system configurations so that more users can be served at the same time.
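To make that second interpretation concrete, here is a rough sketch of serving one set of weights under two configurations that differ only in system prompt and reasoning budget. The ServingConfig fields and the generate call are placeholders invented for this sketch; DeepSeek's actual serving stack is not public.

```python
from dataclasses import dataclass

@dataclass
class ServingConfig:
    system_prompt: str
    max_thinking_tokens: int  # the reasoning budget is the main compute lever

class StubModel:
    """Stand-in for the single shared underlying model."""
    def generate(self, system: str, prompt: str, max_thinking_tokens: int) -> str:
        return f"[budget={max_thinking_tokens}] {system} -> answer to: {prompt}"

# Same weights, two configurations. If the community reports are accurate,
# "Expert" differs from "Fast" mainly here, not at the model level.
FAST = ServingConfig("Answer concisely and quickly.", max_thinking_tokens=1_000)
EXPERT = ServingConfig("Think step by step and verify intermediate results.",
                       max_thinking_tokens=50_000)  # room for ~500 s thinking runs

def answer(model: StubModel, user_msg: str, cfg: ServingConfig) -> str:
    return model.generate(cfg.system_prompt, user_msg, cfg.max_thinking_tokens)

model = StubModel()
print(answer(model, "Schedule these 12 jobs optimally.", FAST))
print(answer(model, "Schedule these 12 jobs optimally.", EXPERT))
```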
The three-mode interface itself is a user-experience upgrade. But the real weight of this update lies in the V4 behind it.
The DeepSeek team has confirmed that V4 has slipped to April, mainly due to deep integration work with Huawei Ascend chips. The known technical specifications are quite aggressive: a 1-trillion-parameter scale, an 81% pass rate on the SWE-bench coding benchmark, an API price of $0.30/MTok, and a self-developed long-term memory technology called Engram, a conditional memory mechanism that lets the model preserve user preferences and context across conversations.
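DeepSeek has said nothing public about Engram's internals beyond that one-line description, so the sketch below is only a guess at the general shape of a conditional memory mechanism: entries persist across conversations but are recalled only when the current message makes them relevant. Every name here is hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class MemoryEntry:
    content: str
    tags: set[str]  # the condition: contexts in which this memory applies

@dataclass
class ConditionalMemory:
    """Toy cross-conversation memory keyed on relevance, not recency."""
    entries: list[MemoryEntry] = field(default_factory=list)

    def remember(self, content: str, tags: set[str]) -> None:
        self.entries.append(MemoryEntry(content, tags))

    def recall(self, message: str) -> list[str]:
        # Surface only the entries whose condition tags match the new message.
        words = set(message.lower().split())
        return [e.content for e in self.entries if e.tags & words]

mem = ConditionalMemory()
mem.remember("User prefers answers in Chinese.", {"language", "translate", "chinese"})
mem.remember("User is building a Rust CLI tool.", {"rust", "cli", "cargo"})

# A later, unrelated conversation about Rust pulls in only the relevant memory.
print(mem.recall("How do I publish my rust crate?"))
# -> ['User is building a Rust CLI tool.']
```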
But what is most worth watching about V4 is its underlying choice of compute.
If V4 is truly implemented end to end on China-made chips such as Huawei Ascend and Cambricon, it will become the first mainstream, consumer-scale large model to fully bypass NVIDIA's CUDA ecosystem. (That said, given that large numbers of NVIDIA chips are known to have been smuggled into China, the underlying reality is more complicated.)