Since o1 launched, the biggest complaint has been that it's "too verbose."

I just wanted to fix a simple bug, and it gave me three background explanations, two solution approaches plus error handling, and then wished me good luck on top of that.

I was only looking for a spelling mistake on line 12, but ended up having to review Python naming conventions all over again.

The blame falls squarely on RLHF: annotators tend to give higher scores to longer responses, on the assumption that more text looks more thorough and professional.

So the model desperately piles up "seemingly useful" filler, while the actual core information gets diluted.

Look at Claude next door—it's much more sensible about this, knowing what length matches what question.

The most painful part is the wallet: o1's output pricing is $60/1M tokens. For something that should take 100 tokens to explain, it deliberately pads it to 500, multiplying costs by five on the spot.
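The back-of-the-envelope math above checks out. A quick sketch, using the $60/1M-token output price and the 100-vs-500 token figures cited in the post (the helper name is mine, not any official API):

```python
# Output price cited in the post: $60 per 1M tokens for o1.
PRICE_PER_MILLION_TOKENS = 60.0

def output_cost(tokens: int) -> float:
    """Dollar cost for a given number of output tokens."""
    return tokens / 1_000_000 * PRICE_PER_MILLION_TOKENS

concise = output_cost(100)  # the answer that would have sufficed
padded = output_cost(500)   # the answer actually produced

print(f"concise: ${concise:.4f}, padded: ${padded:.4f}, "
      f"ratio: {padded / concise:.0f}x")
```

Per request the numbers are tiny, but the 5x multiplier compounds fast across thousands of calls.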

Now when asking questions you have to specifically add "code only," and even that doesn't always work.

The model's current state is: genius-level IQ, but EQ completely offline—it simply doesn't know when to shut up.