Grok 4.1 Released: AI Hallucinations Cut Roughly Threefold, with Full Upgrades to Emotional Understanding and Creative Writing

ChainNewsAbmedia

2025-11-18 13:24:04

xAI announced on 11/17 that its latest model, Grok 4.1, is now officially available to all users on grok.com, X (formerly Twitter), and the iOS and Android apps. xAI said the upgrade focuses on “real-world usability”: stronger emotional understanding, a more natural personality, greater creativity, and a lower hallucination rate, while retaining the reasoning capability and stability of its predecessor, Grok 4.

Nearly 65% preference rate in an unannounced test; Grok 4.1 confirmed fully live

xAI ran a two-week unannounced test from 11/1 to 11/14, routing a small share of real traffic on grok.com, X, and the mobile apps to a Grok 4.1 beta and comparing it directly against the previous model, Grok 4, through “blind testing.”

xAI says that in the blind test Grok 4.1 was preferred 64.78% of the time on real traffic, clearly ahead of Grok 4, and that the model is officially available to all users from 11/17. In Auto mode, Grok 4.1 is used automatically; users can also select it explicitly from the model menu.

Grok 4.1 Three Major Technical Highlights at a Glance

Grok 4.1 Technical Highlight 1: New reinforcement learning methods make responses more natural and more attuned to people

The core upgrade in Grok 4.1 uses the same “large-scale reinforcement learning infrastructure” as Grok 4, but adds new methods that let the model optimize its responses automatically at a larger scale. The training focuses mainly on response qualities that are not directly verifiable, such as tone, personality consistency, emotional interaction, and understanding of intent, which cannot be scored directly from data alone.

To address this, xAI uses “cutting-edge reasoning models” as reward models: models with deep reasoning capability automatically evaluate Grok 4.1's responses, and through a large number of comparisons the model learns what makes an answer better and more human-like, then adjusts accordingly. As a result, Grok 4.1 has improved significantly in tone, personality, emotion, and naturalness of interaction, while maintaining its original reasoning ability and stability.
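xAI has not published how these comparisons are turned into a training signal. As a purely illustrative sketch, one common textbook way to learn from pairwise comparisons is a Bradley-Terry style preference loss. The function names and the scalar scores below are hypothetical and are not taken from xAI.

```python
import math


def preference_probability(score_a: float, score_b: float) -> float:
    """Bradley-Terry probability that response A is preferred over B.

    The scores are hypothetical scalar quality estimates, e.g. derived
    from a judge model; this is an assumption, not xAI's documented method.
    """
    return 1.0 / (1.0 + math.exp(score_b - score_a))


def pairwise_loss(score_preferred: float, score_rejected: float) -> float:
    """Negative log-likelihood of the observed preference.

    The loss is small when the preferred response already scores higher
    and large when the ranking is the wrong way round.
    """
    return -math.log(preference_probability(score_preferred, score_rejected))


if __name__ == "__main__":
    # A correctly ranked pair gives a lower loss than a mis-ranked pair.
    print(pairwise_loss(2.0, 0.0))
    print(pairwise_loss(0.0, 2.0))
```

Minimizing a loss of this shape pushes the score of the preferred response above the rejected one, which is the general idea behind learning "what constitutes a better answer" from many comparisons.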

Grok 4.1 Technical Highlight 2: Top of the blind-test rankings, with significant upgrades in emotional understanding and creativity

xAI also published several benchmark results showing that Grok 4.1 has improved significantly across a range of capability tests.

On LMArena, the global blind-test battle platform:

Grok 4.1 Thinking ranks first in the world with an Elo of 1483.

Grok 4.1 Non-Thinking ranks second with an Elo of 1465, ahead of even the full reasoning configurations of other models.

Emotional Understanding Test (EQ-Bench 3): uses 45 high-difficulty scenarios with three-turn interactions, scored by Claude Sonnet 3.7. Grok 4.1 shows significant improvement in empathy, emotional insight, and interpersonal understanding.

Creative Writing Ability (Creative Writing v3): in a writing test of 32 prompts × 3 rounds, Grok 4.1 scored higher on creative style, narrative quality, and story fluency. xAI also published several sample responses.

Overall, Grok 4.1 has not only improved in reasoning ability but also shows significant upgrades in “emotional interaction” and “creative capability.”

According to xAI's chart, Grok 4.1 ranks in the top three in the overall reasoning-model ranking, in emotional understanding, and in creative writing.

(Note: Elo here is Grok 4.1's strength rating on the global blind-testing platform LMArena, which uses the Elo rating system, originally designed for chess, to rank the quality of model responses.)
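To make the Elo figures more concrete, the sketch below applies the standard chess-style Elo expected-score formula to the two ratings quoted above. It assumes the classic 400-point scale; LMArena's actual scoring pipeline may differ in detail, and the function name is illustrative.

```python
def elo_expected_score(rating_a: float, rating_b: float) -> float:
    """Expected win probability of A over B under the standard Elo formula.

    Assumes the classic 400-point logistic scale used in chess.
    """
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


if __name__ == "__main__":
    # Ratings quoted in the article: Thinking = 1483, Non-Thinking = 1465.
    p = elo_expected_score(1483, 1465)
    print(f"Expected win rate of 1483 vs 1465: {p:.3f}")
```

Under this formula an 18-point gap corresponds to only a slightly better than even chance of winning a head-to-head comparison, so the two Grok 4.1 variants are close in rating terms.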

Grok 4.1 Technical Highlight 3: AI hallucinations cut roughly threefold, with more reliable information

For common information-retrieval questions, xAI particularly emphasizes that Grok 4.1's hallucination rate has dropped significantly. Previously, Grok's fast (Non-Reasoning) mode was prone to hallucinations because of its limited reasoning depth, and xAI explicitly targeted this issue in the post-training of 4.1. xAI's validation methods include:

Run sampled tests on questions that real users actually asked on the platform.

Compare the differences in responses between Grok 4.1 and the older model.

Evaluate performance on FActScore.

The results show that the new version has significantly reduced the hallucination rate when querying facts and answering informational questions, providing more stable and reliable responses. This makes Grok 4.1 more practical and precise in scenarios of “quick answering” and “researching information” compared to its predecessor.

According to xAI's chart, Grok 4.1's hallucination rate fell from 12.09% to 4.22%, roughly a threefold reduction. The figure reported on FActScore, where lower is better, also fell from 9.89% to 2.97%, indicating a significant improvement in Grok 4.1's accuracy.
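The "about three times" claim can be checked with simple arithmetic on the percentages quoted above. The helper below is an illustrative sketch, not part of xAI's evaluation; it assumes both figures are rates where lower is better.

```python
def reduction_summary(before_pct: float, after_pct: float) -> tuple[float, float]:
    """Return (reduction factor, relative drop) for a rate that decreased.

    factor        = before / after            (e.g. 3.0 means "three times lower")
    relative drop = (before - after) / before (e.g. 0.66 means "66% lower")
    """
    factor = before_pct / after_pct
    relative_drop = (before_pct - after_pct) / before_pct
    return factor, relative_drop


if __name__ == "__main__":
    # Figures as quoted in the article.
    figures = [
        ("Hallucination rate", 12.09, 4.22),
        ("FActScore figure", 9.89, 2.97),
    ]
    for label, before, after in figures:
        factor, drop = reduction_summary(before, after)
        print(f"{label}: {factor:.2f}x lower, {drop:.1%} relative reduction")
```

With the quoted numbers, the hallucination rate comes out a little under 3 times lower (a relative drop of about 65%), and the FActScore figure a little over 3 times lower (about 70%).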

(Note: FActScore is a public benchmark of 500 biography questions about real people, used to assess a model's performance in fact retrieval, correctness judgment, and answer consistency; it can be described as a fact-verification score.)

This article, “Grok 4.1 Released: AI Hallucinations Cut Roughly Threefold, with Full Upgrades to Emotional Understanding and Creative Writing,” first appeared in Chain News ABMedia.
