Local AI model running tool Ollama, announced on the X platform on 4/24, will add DeepSeek’s V4-Flash model released the day before by Chinese AI startup DeepSeek into the Ollama Cloud service. The inference host is located in the United States, and it provides three sets of one-click commands so developers can directly plug V4-Flash into mainstream AI software development workflows such as Claude Code, OpenClaw, and Hermes.

deepseek-v4-flash is now available on Ollama’s cloud! Hosted in the US. Try it with Claude Code: ollama launch claude –model deepseek-v4-flash:cloud Try it with OpenClaw: ollama launch openclaw –model deepseek-v4-flash:cloud Try it with Hermes: ollama launch hermes…

— ollama (@ollama) April 24, 2026

DeepSeek V4 Preview: two sizes, 1M context

According to a release announcement from DeepSeek’s official API documentation on 4/24, the DeepSeek-V4 Preview is being open-sourced in two sizes simultaneously:

Model Total parameters Active parameters Positioning DeepSeek-V4-Pro 1.6 trillion 49 billion Targeting a closed-source flagship DeepSeek-V4-Flash 1M 130 billion Fast, efficient, and low-cost

Both use a Mixture-of-Experts (MoE) architecture and natively support a 1 million tokens long context. In its announcement, DeepSeek stated: “1M context is now the default value for all DeepSeek official services.”

Architecture innovation: DSA sparse attention + token-wise compression

The core architectural improvements in the V4 series include:

Token-wise compression together with DSA (DeepSeek Sparse Attention)—significantly reducing the cost of inference computation and KV cache memory under ultra-long context

Compared with V3.2, in a 1 million tokens context scenario, V4-Pro requires only 27% of FLOPs for per-token inference, and the KV cache requires only 10%

Supports switching between Thinking and Non-Thinking dual modes, corresponding to different task-depth reasoning needs

At the API level, it is compatible with both OpenAI ChatCompletions and Anthropic APIs specifications, reducing migration costs for existing Claude/GPT clients.

Ollama Cloud’s three sets of one-click startup commands

Ollama’s official model page provides a cloud inference service using the model identifier deepseek-v4-flash:cloud. Developers can use the following three sets of commands to directly connect V4-Flash into existing AI software development workflows:

Workflow Command Claude Code ollama launch claude --model deepseek-v4-flash:cloud OpenClaw ollama launch openclaw --model deepseek-v4-flash:cloud Hermes ollama launch hermes

Worth noting is the signal of “US-hosted.” For enterprises and Western developers, the biggest concern when using Chinese open-source models is data being sent back to China. Ollama chooses to place the inference layer of V4-Flash in the United States, meaning the prompt and code content do not leave US legal jurisdiction, reducing friction in compliance and data sovereignty.

Why this matters to the AI industry

By connecting three ecosystems that were previously independent—DeepSeek V4-Flash, Ollama Cloud, and Claude Code—three layers of meaning are created:

Cost pathway: With V4-Flash’s 13 billion active parameters far smaller than GPT-5.5 (input $5, output $30 per million tokens) and flagship models like Claude Opus 4.7, for use cases such as small- and medium-sized agent tasks, batch summarization, and test automation, unit costs are expected to drop significantly

A geopolitical-risk intermediary layer: With Ollama as a US-registered intermediary inference layer, it enables enterprise users of Chinese-native models to avoid the concern of “sending data directly to DeepSeek’s Beijing servers,” which is a practical solution for the international spread of open-source models

Instant developer switching: Users of Claude Code and OpenClaw can switch models with a single line in the command line, without changing prompt structure or IDE settings. For scenarios like “multi-model regression testing” and “cost-sensitive batch tasks,” this is a genuine boost to productivity

Tied in with earlier DeepSeek news

This V4 release and the rapid integration with Ollama Cloud occur amid a backdrop where DeepSeek is currently negotiating its first round of external financing, with a valuation of $20 billion. V4 is a key product proof during DeepSeek’s capitalization process; using an open-source strategy plus fast diffusion with international hosting partners is its speed strategy before it establishes an overwhelming developer ecosystem. For OpenAI and Anthropic, an open-source replacement model that can be switched with one line inside Claude Code is a new variable in the race for control of agent workflows.

This article DeepSeek V4-Flash lands on Ollama Cloud, US-hosted: Claude Code, OpenClaw one-click integration first appeared on 链新闻 ABMedia.

Disclaimer: The information on this page may come from third parties and does not represent the views or opinions of Gate. The content displayed on this page is for reference only and does not constitute any financial, investment, or legal advice. Gate does not guarantee the accuracy or completeness of the information and shall not be liable for any losses arising from the use of this information. Virtual asset investments carry high risks and are subject to significant price volatility. You may lose all of your invested principal. Please fully understand the relevant risks and make prudent decisions based on your own financial situation and risk tolerance. For details, please refer to Disclaimer.

AI Trading Agent Platform Fere AI Raises $1.3M, Led by Ethereal Ventures

AI Agent AI Industry News

Gate News message, April 25 — AI-powered digital asset trading agent platform Fere AI announced the completion of a $1.3 million funding round, led by Ethereal Ventures, with Galaxy Vision Hill and Kosmos Ventures participating. The platform supports cross-chain networks including Ethereum,

GateNews40m ago

China's NDRC Directs AI Firms Including Moonshot and StepFun to Reject U.S. Capital Without Approval

AI Industry News

Gate News message, April 25 — China's National Development and Reform Commission (NDRC) has directed multiple AI companies to reject U.S. capital in recent weeks unless they obtain explicit government approval, according to Bloomberg citing informed sources. Moonshot AI and StepFun, both preparing f

GateNews1h ago

U.S. Judge Dismisses Musk's Fraud Claims Against OpenAI and Altman

AI Industry News

Gate News message, April 25 — A U.S. judge has dismissed fraud allegations filed by Elon Musk against OpenAI and OpenAI co-founder Sam Altman in his ongoing lawsuit against the company. The court has ruled that the fraud claims will not proceed, though the judge has scheduled additional hearings

GateNews1h ago

Why did Intel’s good news drive Nvidia’s stock to surge?

Stocks AI Industry News

Intel’s quarterly earnings EPS came in at $0.29 and revenue at $13.6 billion, both beating expectations. Momentum in data center and computing equipment updates has rebounded, boosting confidence in semiconductor and AI demand. This positive news lifted Nvidia’s stock price by about 4.9% at midday. The market believes underlying compute demand remains strong, reducing concerns about AI becoming overhyped, and supporting Nvidia’s long-term growth outlook. Positive sector linkages are appearing at the same time as the two companies’ competition.

ChainNewsAbmedia6h ago

China and US Face AI Showdown Over Model Distillation Accusations and Investment Restrictions

AI Industry News

Gate News message, April 24 — China has rejected U.S. accusations that its tech giants are exploiting American AI technology through industrial-scale distillation, as both countries set up for a major collision over AI development and investment control. The Trump administration is preparing to

GateNews8h ago

Alphabet to Invest Up to $40 Billion in Anthropic, Boosting AI Competition

Stocks AI Industry News

Gate News message, April 24 — Alphabet, Google's parent company, plans to invest up to $40 billion in AI developer Anthropic, comprising $10 billion upfront and $30 billion in additional funding contingent on achieving certain performance milestones. This investment follows Alphabet's previous 14% s

GateNews10h ago

Comment

0/400

No comments