Let's start with a simple example, drawn from my 2001 book The Algebraic Mind, that anyone can try at home: (2/9)
Train a basic multilayer perceptron on the identity function (i.e., multiplying the input by one) on a random subset of 10% of the even numbers from 2 to 1024, representing each number as a standard distributed representation of nodes encoding binary digits. (3/9)
What you will find is that the network can often interpolate, generalizing from some set of even numbers to other even numbers, but it never reliably extrapolates to odd numbers, which lie outside the training space. (4/9)
There are any number of workarounds to this. But adding more dimensions per se does not help. Try, for example, training the neural net on a randomly drawn subset of the even numbers up to 2^16, or 2^256 if you like. (5/9)
Again, you will find some interpolation to withheld even numbers, but you still won't get reliable extrapolation to odd numbers. (6/9)
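Here is a minimal sketch of the experiment described above, written in PyTorch. It is not the original Algebraic Mind code; the hidden size, optimizer, and epoch count are illustrative choices, and only the setup (identity function, binary-digit encoding, training on 10% of the even numbers) follows the description in this thread.

```python
# Sketch of the identity-function experiment: an MLP trained to copy a
# binary-encoded number, trained only on a 10% subset of the even numbers.
import torch
import torch.nn as nn

def to_bits(n, width):
    """Encode an integer as a vector of binary digits, most significant bit first."""
    return torch.tensor([(n >> i) & 1 for i in reversed(range(width))], dtype=torch.float32)

def run_experiment(n_bits, seed=0):
    torch.manual_seed(seed)
    evens = list(range(2, 2 ** n_bits, 2))
    odds = list(range(1, 2 ** n_bits, 2))

    # Train on a random 10% subset of the even numbers only.
    perm = torch.randperm(len(evens))
    cut = max(1, len(evens) // 10)
    train_nums = [evens[i] for i in perm[:cut]]
    held_out_evens = [evens[i] for i in perm[cut:]]

    X = torch.stack([to_bits(n, n_bits) for n in train_nums])
    # Identity function: the target is the input itself.
    model = nn.Sequential(nn.Linear(n_bits, 32), nn.Sigmoid(), nn.Linear(32, n_bits))
    opt = torch.optim.Adam(model.parameters(), lr=1e-2)
    loss_fn = nn.BCEWithLogitsLoss()
    for _ in range(2000):
        opt.zero_grad()
        loss_fn(model(X), X).backward()
        opt.step()

    def exact_match(nums):
        """Fraction of numbers whose output bits all match the input bits."""
        inputs = torch.stack([to_bits(n, n_bits) for n in nums])
        preds = (torch.sigmoid(model(inputs)) > 0.5).float()
        return (preds == inputs).all(dim=1).float().mean().item()

    print(f"{n_bits}-bit numbers:")
    print("  interpolation (held-out evens):", exact_match(held_out_evens))
    print("  extrapolation (odd numbers):   ", exact_match(odds))

run_experiment(10)   # numbers below 2**10 = 1024
run_experiment(16)   # numbers below 2**16; more bits, same extrapolation failure
```

Note that the lowest-order bit is 0 in every training example, so nothing pushes the network to ever turn that bit on; odd inputs, which all require a 1 there, fall outside what it has learned.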
Systems like GPT-3 behave erratically, I conjecture, precisely because some test items are far in n-dimensional space from the training set, hence not solvable with interpolation; failures occur when extrapolation is required. (7/9)
*Because the training set is not publicly available, it is not possible to test this conjecture directly. @eleuther or @huggingface might want to give it a shot with their GPT-like models. (8/9) cc @LakeBrenden, who has valuable, related evidence.
The good news is that Yoshua Bengio has begun to recognize the foundational nature of this challenge of extrapolation (he calls it "out of distribution generalization"). The rest of the field would do well to follow his lead. (9/9)