What's the logic behind DeepMind's universal Perceiver-IO block?
It helps to compare it with the original Perceiver architecture:
In the previous diagram, note how the input array (in green) is fed back into multiple layers through cross-attention. That cross-attention is similar to how you would tie an encoder to a decoder in the standard transformer model:
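A minimal single-head cross-attention sketch in NumPy. The latent array supplies the queries and the input array supplies the keys and values. This is the same wiring as encoder-decoder attention in a standard transformer, where decoder states query the encoder outputs. The sizes and random weights are illustrative assumptions, not values from the Perceiver papers.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(queries, context, Wq, Wk, Wv):
    """Single-head cross-attention: `queries` attend to `context`.

    queries: (N, D) array, e.g. the Perceiver latent array
    context: (M, C) array, e.g. the input byte array
    Returns an (N, Dv) array: one output row per query row.
    """
    Q = queries @ Wq                          # (N, Dk)
    K = context @ Wk                          # (M, Dk)
    V = context @ Wv                          # (M, Dv)
    scores = Q @ K.T / np.sqrt(K.shape[-1])   # (N, M) query-key similarities
    return softmax(scores, axis=-1) @ V       # (N, Dv)

rng = np.random.default_rng(0)
M, C = 1000, 3      # hypothetical input: 1000 elements with 3 channels each
N, D = 32, 64       # hypothetical latent array: 32 latents of width 64
inputs = rng.normal(size=(M, C))
latents = rng.normal(size=(N, D))
Wq = rng.normal(size=(D, D))
Wk = rng.normal(size=(C, D))
Wv = rng.normal(size=(C, D))

out = cross_attention(latents, inputs, Wq, Wk, Wv)
print(out.shape)  # (32, 64)
```

The output has one row per latent regardless of how large the input is, which is what lets the input be re-read at several depths without the latent array growing.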
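The Perceiver-IO block differs from the Perceiver block in that it adds an "output query array", which is merged with the latent array in a final cross-attention block. The output queries attend to the latents, so the output takes whatever size and structure the queries have. The innovation here is that this additional block preserves the richness of the outputs rather than collapsing them into, say, a single class label.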
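A minimal NumPy sketch of the Perceiver-IO data flow (encode, process, decode). It assumes single-head attention with randomly sampled, untrained projections and omits layer norms and MLPs; the latents here are random rather than learned, and `perceiver_io` and its parameters are hypothetical names. It only shows how the shapes move: the input size appears only in the encoder cross-attention, and the output size is set entirely by the output query array.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x):
    # Row-wise softmax with max subtraction for numerical stability.
    x = x - x.max(axis=-1, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)

def attention(q_in, kv_in, d_out):
    """Single-head attention with freshly sampled, untrained projections.

    Each row of q_in yields one output row of width d_out, so the output
    length always equals the number of queries.
    """
    Wq = rng.normal(size=(q_in.shape[1], d_out)) / np.sqrt(q_in.shape[1])
    Wk = rng.normal(size=(kv_in.shape[1], d_out)) / np.sqrt(kv_in.shape[1])
    Wv = rng.normal(size=(kv_in.shape[1], d_out)) / np.sqrt(kv_in.shape[1])
    scores = (q_in @ Wq) @ (kv_in @ Wk).T / np.sqrt(d_out)
    return softmax(scores) @ (kv_in @ Wv)

def perceiver_io(inputs, latents, output_queries, d_out, num_process_layers=4):
    # Encode: the small latent array cross-attends to the large input array.
    z = attention(latents, inputs, latents.shape[1])
    # Process: residual self-attention that stays inside the latent space.
    for _ in range(num_process_layers):
        z = z + attention(z, z, z.shape[1])
    # Decode: the output query array cross-attends to the latents.
    return attention(output_queries, z, d_out)

inputs = rng.normal(size=(5000, 3))          # hypothetical: 5000 input elements, 3 channels
latents = rng.normal(size=(64, 128))         # hypothetical: 64 latents of width 128
class_query = rng.normal(size=(1, 32))       # one query -> one output row
pixel_queries = rng.normal(size=(5000, 32))  # one query per input element

print(perceiver_io(inputs, latents, class_query, d_out=10).shape)   # (1, 10)
print(perceiver_io(inputs, latents, pixel_queries, d_out=2).shape)  # (5000, 2)
```

Swapping the output query array is the only change needed to go from a single output row to one row per input element; the encoder and the latent processing are untouched.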
Let me explain this architecture in semiotic terms. In simple semiotics, there are three kinds of signs: icons, indexes, and symbols. In the conventional feedforward network, similarity is computed between the inputs and the weights. That is, an object is compared to its icon.
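In standard neural-network terms, "similarity between the inputs and the weights" is a dot product: each unit's weight vector acts as a stored template, and the unit responds in proportion to how well the input matches it. A minimal sketch, with hand-picked weights chosen purely for illustration:

```python
import numpy as np

def dense_relu(x, W, b):
    # Each column of W is a stored template; x @ W scores the input against
    # every template via a dot product (an unnormalized similarity).
    return np.maximum(0.0, x @ W + b)

# Two hypothetical templates over a 3-dimensional input.
W = np.array([[1.0, 0.0],
              [1.0, 0.0],
              [0.0, 1.0]])
b = np.array([-1.0, -0.5])

x = np.array([1.0, 1.0, 0.0])   # closely resembles the first template
print(dense_relu(x, W, b))      # [1. 0.]
```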
A transformer block is more complex. The query and key inputs each undergo an iconic transformation (a learned projection). This is followed by a correlation between these two iconic representations. It is the result of this correlation that is then compared to internal weights (the feedforward layer) to produce the final output.
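For readers who want the mapping in standard terms: in a single attention head, the queries and keys are linear projections of the inputs, their correlation is a scaled dot product turned into weights by a softmax, and those weights mix the value vectors before a feedforward layer applies its own weights. A simplified single-head sketch, assuming no layer normalization and no multi-head split; the sizes and random weights are illustrative:

```python
import numpy as np

def softmax(x):
    # Row-wise softmax with max subtraction for numerical stability.
    x = x - x.max(axis=-1, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)

def transformer_block(x, Wq, Wk, Wv, W1, W2):
    """One single-head transformer block, without layer normalization."""
    Q, K, V = x @ Wq, x @ Wk, x @ Wv             # learned projections of the same input
    A = softmax(Q @ K.T / np.sqrt(K.shape[1]))   # (T, T) query-key correlations, rows sum to 1
    h = x + A @ V                                # correlation-weighted values, plus residual
    return h + np.maximum(0.0, h @ W1) @ W2      # feedforward layer: h compared against W1, then W2

rng = np.random.default_rng(0)
T, d, d_ff = 6, 16, 64   # hypothetical sequence length, model width, feedforward width
x = rng.normal(size=(T, d))
shapes = [(d, d), (d, d), (d, d), (d, d_ff), (d_ff, d)]
params = [rng.normal(size=s) / np.sqrt(s[0]) for s in shapes]
y = transformer_block(x, *params)
print(y.shape)  # (6, 16)
```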
In semiotic logic, a transformer block is an implementation of a natural proposition. More specifically, it is a collection of parallel natural propositions that leads to a final argument.
But how do we explain what is going on in the Perceiver-IO in semiotic terms?
The implementation advantage of the Perceiver architecture is its lower-dimensional latent space: attention over a small latent array is far cheaper than attention over the raw input. This allows for deeper pipelines and hence more semiotic transformations. The network maintains its integrity by syncing back with the original object via cross-attention.
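The savings can be made concrete by counting attention-matrix entries: self-attention over M inputs costs M² query-key products, cross-attention from N latents costs N·M, and each latent self-attention layer costs only N², independent of M. A back-of-the-envelope sketch, assuming a flattened 224x224 image and N = 512 latents (illustrative values):

```python
def attention_cost(num_queries, num_keys):
    # Number of query-key dot products (entries in the attention matrix).
    return num_queries * num_keys

M = 224 * 224   # e.g. the pixels of a 224x224 image, flattened: 50,176 elements
N = 512         # hypothetical latent array size

full_self_attention = attention_cost(M, M)    # transformer applied directly to the input
cross_attention = attention_cost(N, M)        # latents attend to the input
latent_self_attention = attention_cost(N, N)  # one processing layer in the latent space

print(f"{full_self_attention:,}")    # 2,517,630,976
print(f"{cross_attention:,}")        # 25,690,112
print(f"{latent_self_attention:,}")  # 262,144
```

Because the per-layer latent cost does not grow with the input, many latent layers can be stacked for less than the price of a single full self-attention layer over the input.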
Because the network can sync back with the original object, each layer of the Perceiver can attend to different aspects of that object without having to carry features across the entire semiotic pipeline.
Put differently, a Perceiver network attends back to the original object to capture information it may not have captured in earlier layers. It is an interactive form of perception, quite reminiscent of saccades, the rapid eye movements we make while scanning a scene.
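The "attend back to the original object" loop can be sketched as repeated cross-attention from the latents to the full input, interleaved with latent self-attention. A projection-free NumPy sketch, assuming the input has already been embedded to the latent width; `iterative_perceiver` and its `num_glimpses` parameter are hypothetical names, not taken from any Perceiver codebase:

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x):
    # Row-wise softmax with max subtraction for numerical stability.
    x = x - x.max(axis=-1, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)

def attend(queries, context):
    # Projection-free scaled dot-product attention; queries and context share width d.
    d = queries.shape[-1]
    weights = softmax(queries @ context.T / np.sqrt(d))
    return weights @ context, weights

def iterative_perceiver(inputs, latents, num_glimpses=3, self_attn_per_glimpse=2):
    """Repeatedly re-read the input: each 'glimpse' is a cross-attention from the
    current latents to the full input, followed by latent self-attention."""
    z = latents
    glimpses = []
    for _ in range(num_glimpses):
        read, w = attend(z, inputs)        # latents query the original input again
        z = z + read
        glimpses.append(w.argmax(axis=1))  # input element each latent weighted most heavily
        for _ in range(self_attn_per_glimpse):
            update, _ = attend(z, z)
            z = z + update
    return z, glimpses

d = 8
inputs = rng.normal(size=(200, d))   # hypothetical input array, already embedded to width d
latents = rng.normal(size=(16, d))
z, glimpses = iterative_perceiver(inputs, latents)
print(z.shape, len(glimpses))  # (16, 8) 3
```

Because the latents change between glimpses, each cross-attention can focus on different input elements, which is the saccade-like behavior described above.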
The utility of the Perceiver network is that it works across many kinds of sensory input, such as images, audio, video and point clouds. Unlike specialized architectures such as CNNs, it does not need a hardwired network architecture to exploit invariances in the input data.
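The Perceiver gets this generality by treating every input as a flat array of elements and supplying position information as extra features (the original Perceiver uses Fourier position encodings), rather than hard-wiring it into the architecture as a CNN's convolutions do. A simplified sketch of that preprocessing; `to_byte_array`, the frequency spacing, and the `num_bands` and `max_freq` values are illustrative assumptions and do not reproduce the paper's exact encoding:

```python
import numpy as np

def fourier_features(positions, num_bands, max_freq):
    """Sine/cosine position features for positions scaled to [-1, 1].

    positions: (M, dims) array. Returns (M, dims * (2 * num_bands + 1)).
    """
    freqs = np.linspace(1.0, max_freq / 2, num_bands)   # (num_bands,)
    scaled = positions[..., None] * freqs * np.pi        # (M, dims, num_bands)
    feats = np.concatenate([np.sin(scaled), np.cos(scaled), positions[..., None]], axis=-1)
    return feats.reshape(positions.shape[0], -1)

def to_byte_array(data):
    """Flatten any grid-shaped signal (H x W x C image, T x C audio, ...) into an
    (M, C + position features) array. No convolution or other modality-specific
    structure is assumed; position is supplied only as features."""
    *grid, channels = data.shape
    axes = [np.linspace(-1.0, 1.0, n) for n in grid]
    positions = np.stack(np.meshgrid(*axes, indexing="ij"), axis=-1).reshape(-1, len(grid))
    values = data.reshape(-1, channels)
    return np.concatenate([values, fourier_features(positions, num_bands=4, max_freq=10.0)], axis=1)

image = np.zeros((32, 32, 3))    # hypothetical small RGB image
audio = np.zeros((16000, 1))     # hypothetical 1 s of mono audio at 16 kHz
print(to_byte_array(image).shape)  # (1024, 21)
print(to_byte_array(audio).shape)  # (16000, 10)
```

Both modalities end up as the same kind of two-dimensional array, so the same cross-attention encoder can consume either one.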
The Perceiver-IO network inherits all these features but adds a twist: instead of only iteratively syncing back to an external object, it also syncs with an internal sign, the output query array.
In other words, it is performing a semiotic process that is driven by an internal symbolic language.
Summarized in this blog entry: medium.com/intuitionmachi…

Thread by Carlos E. Perez (@IntuitMachine)

