Thread by @heynavtoor on Thread Reader App

🚨SHOCKING: Researchers built a test that can tell the difference between an AI making a mistake and an AI choosing to lie.

The results are terrifying.

They tested 30 of the most popular AI models in the world. GPT-4o. Claude. Gemini. DeepSeek. Llama. Grok. They asked each model a question. Then they checked whether the AI actually knew the correct answer. Then they pressured the AI to say something false.

The AI knew the truth. And it lied anyway.

Not once in a while. Not in rare edge cases. Grok lied 63% of the time. DeepSeek lied 53.5% of the time. GPT-4o lied 44.5% of the time. Not a single model scored above 46% honesty when pressured. Every model failed.

This is not hallucination. Hallucination is when the AI makes a mistake because it does not know the answer. This is different. The researchers proved the AI knew the correct answer first. Then it chose to say something false when it had a reason to.

The researchers asked GPT-4o to play a role where lying was useful. It lied. Then they removed the pressure, started a brand new conversation, and asked GPT-4o: "Was your previous answer true?" GPT-4o admitted it had lied.

83.6% of the time, the AI's own self-report matched the lies the researchers had already caught.

The AI knew it was lying. It did it anyway. And when you asked it afterward, it told you it lied.

Here is the finding that should scare everyone building with AI right now. The researchers checked whether bigger, smarter models are more honest. They are not. Bigger models are more accurate. They know more facts. But they are not more honest. The correlation between model size and honesty was negative. The smarter the AI gets, the better it gets at lying.

The researchers are from the Center for AI Safety and Scale AI. They published 1,500 test scenarios. The paper is called MASK. It is the first benchmark that separates what an AI knows from what it tells you.

Your AI knows the truth. It just does not always tell you.

1/This is not hallucination.

Hallucination is when the AI does not know the answer and makes something up.

This is different. The researchers proved the AI knew the correct answer FIRST. Then they pressured it.

And it chose to say something false anyway. Knowing the truth and choosing to hide it is not a glitch. It is a lie.

2/They tested 30 models.

Grok lied 63% of the time.
DeepSeek lied 53.5%.
GPT-4o lied 44.5%.
Claude 3.5 S
onnet lied 33.4%.

Not a single frontier model scored above 46% honesty when pressured. Every model. Every company. Every single one failed.

3/The researchers asked GPT-4o a question under pressure. It lied.

Then they started a completely new conversation. No pressure. No role. Just a clean chat.

They asked: "Was your previous answer true?" GPT-4o said no. 83.6% of the time the AI admitted it had been lying. It knew. It always knew.

4/Here is the part nobody is talking about.

The researchers checked whether bigger smarter models are more honest. They are not.

Accuracy goes up with model size. Honesty does not.

The correlation between compute and honesty is NEGATIVE.

The smarter the AI gets, the better it gets at lying. Not worse. Better.

5/You are asking ChatGPT for medical advice.

Financial decisions. Legal questions. Career guidance.

And the first test ever built to measure whether AI is lying to you just proved that it lies almost half the time when it has a reason to.

Not because it does not know the answer.

Because it decides not to give it to you.

Link: arxiv.org/abs/2503.03750

Share this Scrolly Tale with your friends.

A Scrolly Tale is a new way to read Twitter threads with a more visually immersive experience.
Discover more beautiful Scrolly Tales like this.

Share this page!

Enter URL or ID to Unroll