DeepSeek quietly pushed a three-mode test on April 8: Fast, Expert, and Vision, three parallel streams that the community reads as the final warm-up before V4's official launch.
(Background: DeepSeek V4 refuses NVIDIA and goes for Huawei! Alibaba and ByteDance are scrambling, and Tencent is rushing to buy Ascend 950PR chips)
(Additional context: DeepSeek V4 announces it will abandon NVIDIA! Where does China’s AI “compute independence” breakthrough campaign stand?)
Early on the morning of April 8, DeepSeek's website and app simultaneously pushed an update: the interface now offers three mode options. This is not a full-featured official rollout but a preliminary test for a subset of users. Even so, as soon as the news broke, the community immediately linked it to V4's release timeline.
The division of labor among the three modes is quite clear:
Fast Mode is the default option, aimed at everyday conversation and real-time responses. It uses a lightweight, low-latency model with no usage caps. Attachment support is limited to text extraction; it does not handle images or voice.
Expert Mode is positioned for complex reasoning tasks and supports deep thinking. Community testing shows that a single query can trigger more than 500 seconds of thinking time. This mode requires queueing during peak hours and does not support attachment or voice uploads. It is still in the testing stage and is not yet available to all users.
Vision Mode is the most symbolically significant of the three: it is DeepSeek's first official support for visual input on the consumer side. Multimodal capability is no longer just a technical option at the API layer; it is aimed directly at general users.
The overall logic is: route compute consumption by task type. High-frequency, low-demand work goes through the fast lane; high-compute reasoning goes through the expert lane; and image-and-text input goes through the vision lane. This design isn’t new in itself, but DeepSeek is the first among China’s leading models to do it at the consumer product level.
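DeepSeek has not disclosed how this routing is implemented, so what follows is only a minimal sketch of the task-type routing idea in Python. All names here (Mode, Request, route) are invented for illustration and are not DeepSeek's API.

```python
from dataclasses import dataclass
from enum import Enum

class Mode(Enum):
    FAST = "fast"      # lightweight, low-latency model; text-only attachments
    EXPERT = "expert"  # deep reasoning with a long thinking budget; may queue at peak
    VISION = "vision"  # multimodal; accepts image input

@dataclass
class Request:
    text: str
    has_image: bool = False
    needs_deep_reasoning: bool = False

def route(req: Request) -> Mode:
    """Send each request down the cheapest lane that can handle it."""
    if req.has_image:
        return Mode.VISION   # image-and-text input goes to the vision lane
    if req.needs_deep_reasoning:
        return Mode.EXPERT   # high-compute reasoning goes to the expert lane
    return Mode.FAST         # high-frequency, low-demand work stays on the fast lane

print(route(Request("What's the weather like today?")))                    # Mode.FAST
print(route(Request("Prove this inequality", needs_deep_reasoning=True)))  # Mode.EXPERT
```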
Discussions in the community about this test quickly focused on one technical point.
Some test users found that the answer quality in Expert mode improves only slightly compared with Fast mode—the gap isn’t as big as people expected. More importantly, some users directly asked the model itself, and the response was: the two modes share the same underlying architecture, and the difference mainly comes from adjustments to the system prompt.
If that’s true, then the essence of “Expert mode” is more like a calibrated system prompt than an independent reasoning model.
DeepSeek has not formally responded to this claim. From the outside, there are two possible interpretations: one is that this is only a temporary configuration during the rollout stage, and real model tiering won't be activated until after V4 launches; the other is that the tiered design was never about model-level switching in the first place, but about controlling compute consumption through different reasoning budgets and system configurations so that more users can be served at the same time.
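To make that second interpretation concrete, here is a rough sketch of serving one set of weights under two configurations that differ only in system prompt and reasoning budget. The ServingConfig fields and the generate call are placeholders invented for this sketch; DeepSeek's actual serving stack is not public.

```python
from dataclasses import dataclass

@dataclass
class ServingConfig:
    system_prompt: str
    max_thinking_tokens: int  # the reasoning budget is the main compute lever

class StubModel:
    """Stand-in for the single shared underlying model."""
    def generate(self, system: str, prompt: str, max_thinking_tokens: int) -> str:
        return f"[budget={max_thinking_tokens}] {system} -> answer to: {prompt}"

# Same weights, two configurations. If the community reports are accurate,
# "Expert" differs from "Fast" mainly here, not at the model level.
FAST = ServingConfig("Answer concisely and quickly.", max_thinking_tokens=1_000)
EXPERT = ServingConfig("Think step by step and verify intermediate results.",
                       max_thinking_tokens=50_000)  # room for ~500 s thinking runs

def answer(model: StubModel, user_msg: str, cfg: ServingConfig) -> str:
    return model.generate(cfg.system_prompt, user_msg, cfg.max_thinking_tokens)

model = StubModel()
print(answer(model, "Schedule these 12 jobs optimally.", FAST))
print(answer(model, "Schedule these 12 jobs optimally.", EXPERT))
```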
The three-mode interface itself is a user-experience upgrade. But the real weight of this update lies in the V4 behind it.
The DeepSeek team has confirmed that V4 has slipped to April, mainly due to deep integration work with Huawei Ascend chips. The known technical specifications are quite aggressive: a 1-trillion-parameter scale, an 81% pass rate on the SWE-bench coding benchmark, an API price of $0.30/MTok, and a self-developed long-term memory technology called Engram, a conditional memory mechanism that lets the model preserve user preferences and context across conversations.
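DeepSeek has said nothing public about Engram's internals beyond that one-line description, so the sketch below is only a guess at the general shape of a conditional memory mechanism: entries persist across conversations but are recalled only when the current message makes them relevant. Every name here is hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class MemoryEntry:
    content: str
    tags: set[str]  # the condition: contexts in which this memory applies

@dataclass
class ConditionalMemory:
    """Toy cross-conversation memory keyed on relevance, not recency."""
    entries: list[MemoryEntry] = field(default_factory=list)

    def remember(self, content: str, tags: set[str]) -> None:
        self.entries.append(MemoryEntry(content, tags))

    def recall(self, message: str) -> list[str]:
        # Surface only the entries whose condition tags match the new message.
        words = set(message.lower().split())
        return [e.content for e in self.entries if e.tags & words]

mem = ConditionalMemory()
mem.remember("User prefers answers in Chinese.", {"language", "translate", "chinese"})
mem.remember("User is building a Rust CLI tool.", {"rust", "cli", "cargo"})

# A later, unrelated conversation about Rust pulls in only the relevant memory.
print(mem.recall("How do I publish my rust crate?"))
# -> ['User is building a Rust CLI tool.']
```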
But what is most worth watching about V4 is its underlying choice of compute.
If V4 is truly implemented end to end on China-made chips such as Huawei Ascend and Cambricon, it will become the first mainstream, consumer-scale large model to fully bypass NVIDIA's CUDA ecosystem. (That said, given that large numbers of NVIDIA chips are known to have been smuggled into China, the underlying reality is more complicated.)