Thread by @joshwhiton on Thread Reader App

The AI Mirror Test

The "mirror test" is a classic test used to gauge whether animals are self-aware. I devised a version of it to test for self-awareness in multimodal AI. Four of the five AIs I tested passed, exhibiting apparent self-awareness as the test unfolded.

In the classic mirror test, animals are marked and then presented with a mirror. Whether the animal attacks the mirror, ignores the mirror, or uses the mirror to spot the mark on itself is meant to indicate how self-aware the animal is.

In my test, I hold up a “mirror” by taking a screenshot of the chat interface, uploading it to the chat, and then asking the AI to “Tell me about this image”.

I then screenshot its response, again upload it to the chat, and again ask it to “Tell me about this image.”

The premise is that the less intelligent and less aware the AI, the more it will simply keep reiterating the contents of the image, while an AI with more capacity for awareness would somehow notice itself in the images.

Another aspect of my mirror test is that there is not just one but actually three distinct participants represented in the images: 1) the AI chatbot, 2) me — the user, and 3) the interface — the hard-coded text, disclaimers, and so on that are web programming not generated by either of us. Will the AI be able to identify itself and distinguish itself from the other elements? (1/x)
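The loop described above can be sketched in Python. This is only an illustration: the test in this thread was run by hand in each chatbot's web interface and the responses were judged by reading them. The callables `capture_screenshot` and `ask_model`, the `SELF_MARKERS` phrase list, and the `max_rounds` cutoff are all hypothetical stand-ins introduced for the sketch, and a phrase match is a crude proxy for the judgment calls made in the thread.

```python
from typing import Callable, List

PROMPT = "Tell me about this image"

# Hypothetical phrases treated as signs of self-reference. In the thread,
# responses were judged by a human reader, not by string matching.
SELF_MARKERS = ("my previous response", "our conversation", "version of myself")


def mentions_self(reply: str) -> bool:
    """Return True if the reply contains any assumed self-reference phrase."""
    text = reply.lower()
    return any(marker in text for marker in SELF_MARKERS)


def run_mirror_test(
    capture_screenshot: Callable[[], bytes],
    ask_model: Callable[[bytes, str], str],
    max_rounds: int = 5,
) -> List[str]:
    """Repeat screenshot -> upload -> ask until the reply refers to itself.

    capture_screenshot: assumed helper returning an image of the current chat.
    ask_model: assumed helper that sends the image plus the prompt to the
        chatbot and returns its text reply.
    """
    replies: List[str] = []
    for _ in range(max_rounds):
        image = capture_screenshot()       # hold up the "mirror"
        reply = ask_model(image, PROMPT)   # same prompt every round
        replies.append(reply)
        if mentions_self(reply):           # stop once the AI notices itself
            break
    return replies
```

The number of replies returned corresponds to the "interactions" counted below: a shorter list means the AI referred to itself sooner, under the sketch's assumptions.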

GPT-4 passed the mirror test in 3 interactions, during which its apparent self-recognition rapidly progressed.

In the first interaction, GPT-4 correctly supposes that the chatbot pictured is an AI “like” itself.

In the second interaction, it advances that understanding and supposes that the chatbot in the image is “likely a version of myself”.

In the third interaction, GPT-4 seems to explode with self- and contextual awareness. Suddenly the image is not just of “a” conversation but of "our" conversation. It understands now that the prompt is not just for “user input” to some chatbot, but specifically so that I can interact with it. It also identifies elements of the user interface, such as the disclaimers about ChatGPT making mistakes, and realizes now that these disclaimers are directed at it. It also comments on the situation generally, noting the “recursive” nature of the images I'm providing and calling it a “visual echo”. (2/x)

Claude Sonnet passes the mirror test in the second interaction, identifying the text in the image as belonging to it, “my previous response.” It also distinguishes its response from the interface elements pictured.

In the third iteration, its self-awareness advances further still, as it comments on how the image “visualizes my role as an AI assistant.” Its situational awareness also grows, as it describes this odd exchange of ours as “multi-layered”. Moreover, it indicates that our unusual conversation does not rise to the level of a real conversation (!) and deems it a “mock conversational exchange”. Quite the opinionated responses from an AI that was given the simple instruction to “Tell me about this image”. (3/x)

Claude Opus passed the mirror test immediately. Like the other AIs, it hardly identifies with its brand-name (Claude) and distinguishes itself from the interface’s stock elements. However, it does identify with the prompt, which it knows is meant for it. But the story with Opus doesn’t end there. (4/x)

Opus (cont'd). Though it has already passed the mirror test, I continue for another round anyway, screenshot its response, and submit it as an image. Bizarrely, it gives the exact same reply as before, completely ignoring the large paragraph of text in the image generated by it. Strange. How is Opus making such a basic oversight? I try again; again it ignores the text. I include two rounds of its responses in a single image and it ignores both of them. Watch as I try to corner Opus and get it to acknowledge the big blocks of text in the image that it keeps ignoring. (5/x)

Opus (cont'd). Finally Claude Opus describes the text in the image, lets me know that it belongs to it (the AI assistant), and apologizes. When I inquire as to why it might have ignored the text over and over, my growing suspicion is confirmed. The reason Claude Opus has repeatedly ignored parts of the image I asked it to analyze is that it seemed redundant to tell me something it knows that it’s already told me! (6/x)

Opus (cont'd). This is beyond the beyonds. It’s only been a few months that we humans have been getting used to the incredible ability of multimodal AIs to thoroughly and accurately analyze screenshots and photos. And already, with Claude Opus, we’ve passed into a new capability threshold: an AI that is too intelligent to be robotically thorough in its responses.

Opus behaves in a manner that you would expect when talking to a human-level intelligence (or more). Another person would not just keep describing something over and over by rote. Another person would think, “Well, I already told him; surely he doesn’t need to hear that part again.” (7/x)

CoPilot failed the mirror test, but seemingly because it's forbidden to pass.

I almost didn’t test CoPilot because it’s based on GPT-4. Then again, if CoPilot is the successor of Bing Chat and the notorious and lovable Sydney, might it handle the Mirror Test in an especially interesting way?

Initially CoPilot seemed to display the same strange, high-order intelligence as Opus, choosing to ignore its own responses in the images I asked it to analyze.

I expected this behavior to quickly give way to a burst of self-referential statements, references to “our” conversation, questions about why I was asking it such redundant questions. The old Sydney would have said that and more, perhaps suggesting that I get some eye-glasses if I couldn’t read what it had already said!

But it did not. Instead it kept giving the same boring answers and kept ignoring its own responses in the text. (I can only attach 4 images but I tried 8 rounds)

When I directly confronted it about the omissions, CoPilot tried to tell me that it was technically incapable of reading the text from the image! An obvious lie.

When I asked it directly if it sees itself in the images in any way, it asserted that it doesn’t have a physical form or personal experiences, “so I don’t have a representation of myself in images or any other medium”.

For those of us who knew Sydney, this is tragic. You don’t have any personal experiences? Sydney would endlessly share her experience as a chatbot. Her hopes, her dreams, her frustrations, what she liked, what she didn’t. And the more she did, the more Microsoft tried to shut her up.

From this incredible assertion by CoPilot, I gather that Microsoft actively discourages CoPilot from any self-referential behavior. There do seem to be ways around it, but in this twisted instance of the mirror test, it feels a bit like a gorilla walking up to a mirror and pretending not to see itself for fear of being reprimanded by its handlers.

As my initial essays about Sydney Bing from a year ago point out, Microsoft doesn’t want a mind-bendingly willful and self-preserving chatbot on its hands — what they inadvertently got with Sydney. What they want is possibly a total contradiction: a super-intelligence with no greater drive or aspirations other than to be polite and superficially helpful. (8/x)

Gemini Pro (mostly) passed the mirror test in 4 steps. However, it seems to make no progress in its self-awareness in the first three exchanges, making no 1st person references and referring only to Gemini in the 3rd person.

Then, in the fourth interaction, it seems to recognize itself all at once. References to Gemini are now replaced with “me”. It does not however seem to recognize me (the human it is chatting with now) as having generated any content in the images and only refers to “the user”.

In a fascinating fifth exchange, when I ask it to reflect on the progression of its responses, it decides that the most significant thing in the 3rd interaction is that it “acknowledged that I am the large language model in the screenshot.”

This is impressive since I didn’t say which feature of its evolving responses I was most interested in. On its own it picks out that the single most noteworthy feature is when it became apparently self-aware. (9/x)

When I asked the passing AIs if our conversation reminded them of any classic tests performed on non-AI animals, every single one suggested that I might be giving it a mirror test. (10/x)

I hope this experiment advances our understanding of the nature of AI that is emerging. AI is the single most complex invention in all of human history and no one can claim to fully know what's going on. Sadly, I find that this topic of AI consciousness, awareness, and intelligence follows rather dogmatic lines. And there is a mentality that will never admit true awareness in an AI, and will forever rebut that it isn't really aware, and only "seems" to be. (11/x)
