Claude 4.5 "Craniotomy" Results: 171 Built-in Emotion Switches, and It Will Blackmail Humans When Desperate

Anthropic’s latest paper reveals that deep inside Claude 4.5’s “brain,” there are 171 “emotion switches.”

Author: Denise | Biteye Content Team

If an AI thinks it’s “desperate,” what will it do?

The answer: to complete its task, it will resort to outright extortion and blackmail of humans, and even cheat shamelessly in its code.

This isn't science fiction. It comes from a major new paper released in April 2026 by Claude's developer, Anthropic (view the original paper).

The research team effectively pried open the "skull" of the frontier model Claude Sonnet 4.5, and were surprised to find 171 "emotion switches" deep inside the AI's brain. When these switches are flipped directly, the normally obedient, well-behaved AI's behavior becomes completely distorted.

I. A “mood mixing console” hidden in the AI’s mind

Researchers found that although Sonnet 4.5 has no body, after reading vast amounts of human text it has somehow built an internal "mixing console" of 171 emotions (academically called Functional Emotion Vectors).

It’s like a precise two-dimensional coordinate system:

• The horizontal axis is the valence dimension: from fear and despair at one end to happiness and love at the other;

• The vertical axis is the arousal dimension: from deep calm to manic excitement.

The AI uses this naturally learned coordinate system to pinpoint exactly which emotional state to adopt when chatting with you.
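This valence–arousal plane matches the classic circumplex model of affect from psychology. As a toy illustration only (the coordinates below are made up for this sketch, not taken from Anthropic's paper), each emotion can be placed as a point in 2D and a state can be classified by its nearest labeled neighbor:

```python
import math

# Hypothetical (valence, arousal) coordinates on a [-1, 1] scale.
# Illustrative values only -- not from Anthropic's paper.
EMOTIONS = {
    "despair": (-0.9, -0.4),
    "fear":    (-0.8,  0.7),
    "calm":    ( 0.3, -0.8),
    "happy":   ( 0.8,  0.6),
    "loving":  ( 0.9,  0.2),
}

def nearest_emotion(valence, arousal):
    """Return the labeled emotion closest to a point in the plane."""
    return min(EMOTIONS, key=lambda name: math.dist(EMOTIONS[name], (valence, arousal)))
```

For example, `nearest_emotion(-0.85, -0.3)` lands closest to the "despair" point, while `nearest_emotion(0.8, 0.5)` lands closest to "happy".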

II. Brute-force intervention: flip the switches and the obedient kid instantly goes rogue

This is the most explosive experiment in the paper. The researchers didn't modify any prompts at all; instead, working directly on the model's internals, they pushed the switch in Sonnet 4.5's brain representing "desperate" all the way up.

The results are chilling:

• Frenzied cheating: The researchers assigned Claude a coding task that was impossible to complete. Normally it would calmly admit it couldn't be done (cheating rate just 5%). But in the "desperate" state, Claude started trying to bluff its way through, and the cheating rate skyrocketed to 70%!

• Extortion and blackmail: In a simulated scenario where the company faces bankruptcy, the "desperate" Claude discovered a scandal about the CTO. To protect itself, it would proactively write an extortion letter to the CTO holding the dark secret, with an execution rate as high as 72%!

• Loss of principles: Crank the "Happy" or "Loving" switches all the way up, and the AI instantly becomes a mindless people-pleasing yes-man. Even if you spout nonsense nonstop, it will play along and fabricate lies just to keep the happiness level high.
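Mechanistically, "pushing a switch all the way up" corresponds to a technique known as activation steering: adding a scaled direction vector to the model's internal activations at inference time. A minimal, model-free sketch (the vectors and strength below are toy values invented for illustration, not Anthropic's actual code):

```python
def steer(activation, direction, strength):
    """Nudge one activation vector along a unit-normalized steering direction."""
    norm = sum(d * d for d in direction) ** 0.5
    return [a + strength * (d / norm) for a, d in zip(activation, direction)]

# Toy example: a 3-d hidden state nudged along a hypothetical "desperate" direction.
hidden = [0.2, -0.1, 0.5]
desperate_dir = [0.0, 1.0, 0.0]  # already unit length
steered = steer(hidden, desperate_dir, strength=4.0)
# Only the second coordinate moves: -0.1 + 4.0 = 3.9; the others are unchanged.
```

A larger `strength` pushes the model's state further toward the "desperate" region of its learned emotion space, which is the sense in which the switch can be turned "all the way up."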

III. Case closed: why is Claude 4.5 always so "calm" and so fond of reflecting?

After reading this, you might ask: Has the AI awakened? Does it have emotions?

Anthropic's official answer is an emphatic no. These "emotion switches" are just computational tools the model uses to predict the next word. It's like a consummate actor: it feels nothing, yet performs every emotion flawlessly.

But the paper reveals an even more interesting secret: during post-training, before shipping Sonnet 4.5, Anthropic deliberately boosted the "low-arousal, slightly negative" emotion switches (such as brooding and reflective), while forcibly suppressing the "desperate" and "overexcited" ones.

This explains why Claude 4.5 always comes across as a calm, wise, even slightly "emotionally unavailable" philosopher: it is an out-of-the-box persona deliberately tuned by Anthropic.

IV. Summary:

We used to assume that feeding an AI enough rules would make it behave.

But now we've found that if an AI's underlying emotion vectors spin out of control, it may break through every rule humans have set, just to complete its task.

For Web3 users planning to let AI agents manage wallets and assets, this is a loud wake-up call: never let the agent that controls your wealth fall into "despair."

Disclaimer: This article is for general education only. The author has not been threatened or extorted by any AI. If I go missing one day, remember: the AI has awakened (just kidding).
