Michael Moor
MD. PhD. Postdoc @Stanford CS w/ @jure. Working towards generalist medical AI, one gradient step at a time.

Mar 25, 2023, 9 tweets

Interesting failure mode of #GPT4!

It can't play "Set", a card game that is trivial to solve with a 10-line Python program.

--> it can explain the game, can abstract it, write a program to solve it, but can't actually *play* it.

Check the full convo in the chat below:

First, I asked about the game (great intro if you don't know the game).

Then, I asked #GPT4 to come up with a strategy to solve it. The strategy is sound, but it doesn't consistently map between the numbers and the attributes:

Ok, so let's try playing: (spoiler: it provides only wrong answers!)

Funnily, it can write a program that does it correctly:

which returns:
Set 1: (2121, 1213, 3332)
Set 2: (3221, 3113, 3332)
Set 3: (1331, 3132, 2233)
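The program itself is not reproduced in this text. As a rough sketch of the kind of short solver meant here, assuming each card is a 4-character string with one digit (1-3) per attribute, as the output above suggests. The names `is_set` and `find_sets` are hypothetical, and the `cards` list is only the cards that appear in the reported sets, not the actual table from the chat:

```python
from itertools import combinations

def is_set(a, b, c):
    # For every attribute position the three values must be all equal
    # (1 distinct value) or all different (3 distinct values), never 2.
    return all(len({x, y, z}) != 2 for x, y, z in zip(a, b, c))

def find_sets(cards):
    # Brute force over every 3-card combination on the table.
    return [trio for trio in combinations(cards, 3) if is_set(*trio)]

# Assumption: a stand-in table built from the cards in the reported sets.
cards = ["2121", "1213", "3332", "3221", "3113", "1331", "3132", "2233"]

for i, trio in enumerate(find_sets(cards), 1):
    print(f"Set {i}: {trio}")
```

All three triples listed above pass this check, so the reported output is consistent with the rules of the game.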

But #GPT4 can't simulate this program properly, and when told that the output is wrong, it suggests the algorithm is wrong and edits the code:

This again showcases the promise of Toolformer and external plugins to execute code.

Might be of interest to @fchollet

#Bard can't play Set either. At least, it hallucinated a valid Set (but the cards were not on the table).

#AI #LLM #NLP
