Researcher @withsecure. Prompt Engineering. Machine Learning. Artificial Life. Disinformation Analysis. https://t.co/JKFqDzzs3S @r0zetta@freeradical.zone

Mar 17, 10 tweets

Is GPT-4 intelligent enough to solve ARC (github.com/fchollet/ARC), a collection of intelligence tests devised by @fchollet? I ran a few quick experiments to check.
#gpt4 #gpt #ai #ml

In this test, the solution is to output the portion of the grid that contains the differently coloured cell. Thus the answer to the right-hand side question would be a 6x6 blue grid containing a red cell at the second position in the fourth row down.

GPT-4 did provide a smaller grid containing the correct colours (1 and 2). However, it generated a 5x5 blue grid rather than a 6x6 one, and its single red cell was in the wrong place. Not correct, but the gist of its answer was along the right lines.

To solve this next task, extend each dot's colour horizontally towards the centre, from left to right and from right to left, and place a grey pixel at the midpoint. A pretty easy puzzle for humans to solve.

GPT-4's solution did use the correct grid size. It did fill in the correct rows. However, it didn't choose the correct sequence of colours. The correct solution would be:

00000000000
44444588888
00000000000
00000000000
66666599999
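
For reference, here is a minimal sketch of the rule in Python. It assumes the dots sit in the first and last cell of a row, that the grid width is odd, that 0 is the empty background and that 5 is grey. The input grid is my reconstruction from the solution above, not a copy of the original task file.

```python
GREY = 5        # assumed colour code for grey
BACKGROUND = 0  # assumed colour code for empty cells


def fill_row(row):
    """Extend the end colours towards the centre, with grey at the midpoint."""
    left, right = row[0], row[-1]
    if left == BACKGROUND or right == BACKGROUND:
        return list(row)  # no pair of dots on this row: leave it unchanged
    mid = len(row) // 2
    return [left] * mid + [GREY] + [right] * (len(row) - mid - 1)


def solve(grid):
    return [fill_row(row) for row in grid]


def parse(text):
    """Turn lines of digits into a list of integer rows."""
    return [[int(ch) for ch in line] for line in text.split()]


def render(grid):
    return "\n".join("".join(str(cell) for cell in row) for row in grid)


# Reconstructed (hypothetical) input: one dot at each end of rows 2 and 5.
example_input = parse("""
00000000000
40000000008
00000000000
00000000000
60000000009
""")

print(render(solve(example_input)))
```

Run on that reconstructed input, it prints the five rows shown above.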

In the third and final test I tried, the solution is to find the background colour of the area of the picture that contains the most off-coloured cells. The solution is always a single number. In this case, one that represents red.
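
A rough sketch of that rule, assuming the picture has already been split into its areas (the hard part, which I skip here) and that an area's background is simply its most common colour. The three small areas below are made up for illustration and are not taken from the ARC task.

```python
from collections import Counter


def background_and_noise(region):
    """Return (background colour, number of off-coloured cells) for one area."""
    counts = Counter(cell for row in region for cell in row)
    background, background_count = counts.most_common(1)[0]
    return background, sum(counts.values()) - background_count


def solve(regions):
    """Background colour of the area with the most off-coloured cells."""
    background, _ = max(
        (background_and_noise(region) for region in regions),
        key=lambda pair: pair[1],
    )
    return background


# Hypothetical pre-segmented areas: blue (1), red (2) and light blue (8)
# backgrounds, each sprinkled with a few yellow (4) cells.
regions = [
    [[1, 1, 1], [1, 4, 1], [1, 1, 1]],  # 1 off-coloured cell
    [[2, 4, 2], [4, 2, 2], [2, 2, 4]],  # 3 off-coloured cells
    [[8, 8, 8], [8, 8, 4], [4, 8, 8]],  # 2 off-coloured cells
]

print(solve(regions))
```

With these made-up areas the red one has the most off-coloured cells, so the answer is 2.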

GPT-4 came up with its own theory that makes no sense when you read it.

I provided tasks to GPT-4 in a textual manner as illustrated here. Each colour was represented by a number. Input and output examples were labelled as such. The final output was left empty and I ran inference from there.
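
Something along these lines would produce that kind of prompt from an ARC task. The `train`/`test` and `input`/`output` keys follow the JSON layout used in the ARC repository; the label wording ("Example 1 input:" and so on) and the tiny task are my own placeholders for this sketch, not the exact text I sent to the model.

```python
import json


def grid_to_text(grid):
    """One line per row, one digit per cell."""
    return "\n".join("".join(str(cell) for cell in row) for row in grid)


def build_prompt(task):
    """Label each example pair, then leave the final output empty."""
    parts = []
    for i, pair in enumerate(task["train"], start=1):
        parts.append(f"Example {i} input:\n{grid_to_text(pair['input'])}")
        parts.append(f"Example {i} output:\n{grid_to_text(pair['output'])}")
    parts.append(f"Test input:\n{grid_to_text(task['test'][0]['input'])}")
    parts.append("Test output:\n")  # the model continues from here
    return "\n\n".join(parts)


# Made-up toy task (swap colours 0 and 1), only to show the layout.
task = json.loads("""
{
  "train": [
    {"input": [[1, 0], [0, 1]], "output": [[0, 1], [1, 0]]},
    {"input": [[1, 1], [0, 0]], "output": [[0, 0], [1, 1]]}
  ],
  "test": [
    {"input": [[0, 1], [1, 1]]}
  ]
}
""")

prompt = build_prompt(task)
print(prompt)
```

The prompt ends with an empty "Test output:" label, so whatever the model generates next is treated as its answer grid.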

We've seen examples of GPT-4 doing rather amazing things when asked to understand images. So it will be interesting to retry these tests against the version of the model that accepts images as input, to see whether it does better.

Bottom line: we not ded yet.