What's the logic behind DeepMind's universal Perceiver-IO block?
Perhaps we want to compare this with the original Perceiver architecture:
Note how the input array (in green) is fed back into multiple layers as a 'cross attention' in the previous diagram. That cross attention is similar to how you would tie an encoder to a decoder in the standard transformer model:
The Perceiver-IO block differs from the Perceiver block in that there is an additional "output query array" that feeds into another cross attention block. The innovation here is that this additional block maintains the richness of the outputs.
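To make those two cross-attention roles concrete, here is a minimal NumPy sketch under toy assumptions; the names (`cross_attention`, `latent`, `output_query`, the random `W` projections) are illustrative and not DeepMind's actual code. The encode step lets a small latent array query the large input array, and the decode step lets the output query array query the latent.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def cross_attention(queries, context, Wq, Wk, Wv):
    # Queries come from one array, keys/values from another: the same
    # pattern that ties a decoder to an encoder in a standard transformer.
    Q, K, V = queries @ Wq, context @ Wk, context @ Wv
    attn = softmax(Q @ K.T / np.sqrt(Q.shape[-1]))
    return attn @ V

rng = np.random.default_rng(0)
d = 64
inputs = rng.normal(size=(5000, d))        # large input array (e.g. flattened pixels)
latent = rng.normal(size=(128, d))         # small learned latent array
output_query = rng.normal(size=(10, d))    # one query per desired output element
W = [rng.normal(size=(d, d)) / np.sqrt(d) for _ in range(6)]

# Encode: the latent array attends to the raw inputs (input -> latent).
latent = cross_attention(latent, inputs, *W[:3])

# Decode: the output query array attends to the latent (latent -> output).
outputs = cross_attention(output_query, latent, *W[3:])

print(latent.shape, outputs.shape)  # (128, 64) (10, 64)
```

The thing to notice in the sketch is that the shape of the output is set entirely by the output query array, not by the input; that is how the decoding cross attention preserves the richness of the outputs.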
Let me explain this in semiotic terms. In simple semiotics, there are three kinds of signs (i.e. icons, indexes and symbols). In the conventional feedforward network, similarity is computed between the inputs and the weights. That is, an object is compared to its icon.
A transformer block is more complex. The key and query inputs undergo an iconic transformation. This is followed by a correlation between these two iconic representations. It is this correlation that is compared to internal weights for the final output.
In semiotic logic, a transformer block is an implementation of a natural proposition. More specifically, it is a collection of parallel natural propositions that leads to a final argument.
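As a stripped-down illustration of that mapping (single head, no biases or normalization; the semiotic labels in the comments are my gloss on the tensor operations, and all names are illustrative):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(1)
n, d = 16, 32
x = rng.normal(size=(n, d))       # the 'object' (token embeddings)
Wq, Wk, Wv, Wff = (rng.normal(size=(d, d)) / np.sqrt(d) for _ in range(4))

# Feedforward layer: the object is compared directly to its icon (the weights).
ff_out = x @ Wff

# Attention: queries and keys are iconic transformations of the object...
q, k, v = x @ Wq, x @ Wk, x @ Wv
# ...a correlation between the two iconic representations is computed...
corr = softmax(q @ k.T / np.sqrt(d))     # (n, n) correlation matrix
# ...and that correlation, combined with a third transform of the object
# through internal weights (the values), gives the final output.
attn_out = corr @ v

print(ff_out.shape, attn_out.shape)      # (16, 32) (16, 32)
```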
But how do we explain what is going on in the Perceiver-IO in terms of semiotic terminology?
The implementation advantage of the Perceiver architecture is that it has a lower-dimensional latent space. This allows for deeper pipelines and hence more semiotic transformations. It maintains its integrity by syncing back via cross attention with the original object.
The consequence of this is that each layer of the Perceiver can attend to different aspects of the original object without having to carry features across the entire semiotic pipeline.
Expressing this differently, a Perceiver network attends back to the original object to capture more information that it may not have captured in earlier layers. It is an interactive form of perception quite reminiscent of saccades.
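Here is a rough sketch of that loop, again with illustrative names and random weights; in the actual Perceiver each cross attention is interleaved with latent self-attention and MLP blocks, which the sketch compresses into a single self-attention step:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def attention(queries, context, Wq, Wk, Wv):
    Q, K, V = queries @ Wq, context @ Wk, context @ Wv
    return softmax(Q @ K.T / np.sqrt(Q.shape[-1])) @ V

rng = np.random.default_rng(2)
d, depth = 64, 8
inputs = rng.normal(size=(5000, d))   # the original object, kept around unchanged
latent = rng.normal(size=(128, d))    # low-dimensional latent array

for layer in range(depth):
    W = [rng.normal(size=(d, d)) / np.sqrt(d) for _ in range(6)]
    # Each layer 'saccades' back to the original inputs via cross attention...
    latent = latent + attention(latent, inputs, *W[:3])
    # ...then refines what it has gathered via self attention within the latent.
    latent = latent + attention(latent, latent, *W[3:])

print(latent.shape)   # (128, 64): deep processing stays in the cheap latent space
```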
The utility of the Perceiver network is that it works universally across all kinds of sensory input. Unlike specialized architectures such as CNNs, it does not need a hardwired network structure to exploit invariances in the input data.
The Perceiver-IO network inherits all these features but adds a twist in that, instead of only iteratively syncing back to an external object, it syncs with an internal sign.
In other words, it is performing a semiotic process that is driven by an internal symbolic language.
I do recommend that you go through the talk before proceeding in this tweet storm.
The conclusion comes out at the very end. It is the same observation that Christopher Alexander employs in his multi-volume series 'The Nature of Order': amazon.com/Nature-Order-P…
In IT you can be employed for life because the technical debt keeps piling up. The garbage collector analogy is apt. Nobody wants to do it, so they'll pay people to do it.
The most predictable path to profitability is to create a business collecting other people's garbage. The riskiest business model is to do only the cool things that everyone else wants to do.
Too many startups are fixated only on doing the next cool thing. There's a survivorship bias in the perception that big companies are doing all the cool things. Doing the cool thing is profitable only when you are first to pick the low-hanging fruit.
Here's a map of the delta variant across the US as of Aug 5. Is there a firewall that can contain the rapid rise of infections coming from the south?
Here's the covid vaccination rate as of July 1. The northeast is bordered by well-vaccinated states (i.e. IL, KY, VA). It will be problematic if there is an outbreak in IN, OH and WV. Reminds me of playing the game of Risk.
So what's going on in FL? It's in the middle of a battle. There's no firewall protecting it from lowvax states like AL. If you look at the map, the intensity of the outbreak in AL is at the border with FL.
Von Neumann once told a student who was troubled by the counter-intuitiveness of quantum mechanics:
There are many deep learning practitioners who are also lost in mathematics. The difference of course is that DL folks don't actually have to perform the computations; the network does that for them.
In fact, with methods like architecture search, they can just use the machine to discover the optimal neural architecture.
This is also why money-printing thingamajigs have so much persuasive value. Hence the appeal of cryptocurrencies.
People are more easily persuaded to pay for something if they perceive that it is an investment. An investment is anything that makes more money than what you originally put in.