IMHO, diffusion models are as big a breakthrough as transformer models. It's a rare development when an architecture requires fewer compute resources than previous proposals. lilianweng.github.io/posts/2021-07-…
The intriguing bit about diffusion models is how they employ a numerical solver to compute the reverse flow. It's rare to have this level of computational control. yang-song.net/blog/2021/scor…
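To make the "numerical solver" point concrete, here is a minimal sketch of reversing a diffusion with an Euler ODE solver, in the spirit of Song's probability-flow ODE. Everything is a toy assumption: the data distribution is taken to be N(0, 1) under a variance-exploding SDE with sigma(t) = t, so the score function is known in closed form instead of coming from a trained network.

```python
import math
import random

# Toy probability-flow ODE sampler (score-SDE view). Assumptions:
# - data distribution is N(0, 1); under a variance-exploding SDE with
#   sigma(t) = t, the perturbed marginal at time t is N(0, 1 + t^2)
# - the score is therefore known exactly: d/dx log p_t(x) = -x / (1 + t^2)
# In a real diffusion model, the score comes from a trained neural network.

def score(x, t):
    return -x / (1.0 + t * t)

def sample(n_steps=1000, T=10.0, seed=0):
    rng = random.Random(seed)
    # start from the wide prior N(0, 1 + T^2)
    x = rng.gauss(0.0, math.sqrt(1.0 + T * T))
    dt = T / n_steps
    t = T
    for _ in range(n_steps):
        # Euler step backward in time along dx/dt = -0.5 * g(t)^2 * score,
        # where g(t)^2 = d(sigma^2)/dt = 2t for this SDE
        x -= dt * (-t * score(x, t))
        t -= dt
    return x

samples = [sample(seed=s) for s in range(2000)]
var = sum(v * v for v in samples) / len(samples)
print(round(var, 2))  # sample variance lands near 1.0: the noising is reversed
```

Because the sampler is a deterministic ODE integration, the same solver machinery (step size, higher-order methods) can be swapped in, which is exactly the computational control being described.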
Stable Diffusion takes this to a new level by employing ideas from StyleGAN to control each layer of the reconstruction process. ommer-lab.com/research/laten…
This is next-level because the control variables are not non-parametric distributions but rather raw text. The richness of semantics is immensely greater in textual models. It's mind-boggling that we can control these massively parallel systems.
It's serendipitous that the immense capability of diffusion models was revealed through their fusion with transformer models. It appears that the utility of deep learning hinges on its ability to employ language models as input.
Transformer models have been around since 2017, but diffusion models are a little over a year old. Their uptake will be much faster, given they can leverage all the new computing and software innovations from the last 5 years. arxiv.org/abs/1706.03762
But are diffusion models something that brains do? Is this how brains retrieve context for subsequent cognition?
Diffusion models are more effective than Generative Adversarial Networks. They can be applied to any kind of modality and can transform any space into any other space.
It's the ease of reversibility that is shocking. It seems to defy the 2nd law of thermodynamics. It's Maxwell's demon instantiated in software. en.wikipedia.org/wiki/Maxwell%2…
This reversibility has been perplexing to me since it was demonstrated in 2015 by @jaschasd and @SuryaGanguli arxiv.org/abs/1503.03585
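The forward half of that 2015 construction is easy to show in a few lines. Below is a DDPM-style noising process in closed form; the linear beta schedule is illustrative, not the paper's exact values.

```python
import math
import random

# Forward (noising) process in closed form, DDPM-style:
#   x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps,  eps ~ N(0, 1)
# The linear beta schedule here is illustrative, not any paper's exact values.

def alpha_bar(t, T=1000, beta_min=1e-4, beta_max=0.02):
    prod = 1.0
    for s in range(1, t + 1):
        beta = beta_min + (beta_max - beta_min) * (s - 1) / (T - 1)
        prod *= 1.0 - beta
    return prod

rng = random.Random(0)
x0 = 5.0  # any data point, however structured
ab = alpha_bar(1000)
xT = math.sqrt(ab) * x0 + math.sqrt(1.0 - ab) * rng.gauss(0.0, 1.0)
print(round(ab, 4))  # essentially 0.0: nearly all signal destroyed by step 1000
```

The perplexing part is the other direction: a learned model undoes this walk into noise, step by step, recovering structure that looks irreversibly lost.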
In fact, this method highlights the flaw of employing non-parametric distributions as the mechanism for latent representation (see: VAE). Latent representations can be anything! autodesk.com/research/publi…
The mind-bending insight behind all of this is that it's an error to believe that latent representations can objectively mean something. It is the process that renders meaning to a representation, not the other way around!
It is analogous to biology. How is it that DNA evolved to its present encoding to render life? Is there an objective universal machine code, or is this code a consequence of a subjective process? Said differently, it's the evolutionary process that renders an interpretation.
Just as diffusion models like #stablediffusion can recreate images from just a seed (and the original prompt), DNA functions the same way. The only requirement for repeatability is that the seed can be stored in a robust form.
Ever since DNA was discovered, it has been perplexing how biology could render its code into DNA. Yet with a diffusion process, we have a reproducible account of how that may come about. Is this not, in fact, revolutionary from an abstract framing?
In this framing, let's review deep learning history again. DL is, at its core, curve-fitting. Backpropagation (i.e. the chain rule) solved the problem of constructing covariant representations across many layers of related representations.
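The chain-rule mechanics behind backpropagation fit in a few lines. A hand-rolled toy for f(x) = (w2 · tanh(w1 · x))², with the gradient checked against a finite difference; the function and weights are made up for illustration.

```python
import math

# Backpropagation is repeated application of the chain rule, layer by layer.
# Toy example (illustrative only): loss = (w2 * tanh(w1 * x))^2.

def forward_backward(x, w1, w2):
    h = math.tanh(w1 * x)      # layer 1
    y = w2 * h                 # layer 2
    loss = y * y
    # backward pass: chain rule applied from the loss down to each weight
    dloss_dy = 2.0 * y
    dloss_dw2 = dloss_dy * h
    dloss_dh = dloss_dy * w2
    dloss_dw1 = dloss_dh * (1.0 - h * h) * x   # d tanh(u)/du = 1 - tanh(u)^2
    return loss, dloss_dw1, dloss_dw2

loss, g1, g2 = forward_backward(x=0.5, w1=1.0, w2=2.0)

# sanity check the w1 gradient against a finite difference
eps = 1e-6
loss_p, _, _ = forward_backward(0.5, 1.0 + eps, 2.0)
print(abs((loss_p - loss) / eps - g1) < 1e-3)  # True
```

Stacking this recipe across many layers is what lets deep networks construct covariant representations: every layer's parameters receive a gradient expressed in terms of the layers above it.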
Covariant representations make possible complex iconicity. This sets the stage for recognizing complex indexical relationships. This was discovered, oddly enough, through research in the symbolic space. Transformers were the breakthrough needed for indexicality.
In parallel, through the development of GANs, deep learning was discovered to be extremely competent in recreating images with uncanny precision. This is known not because we can measure it mathematically but because we can see the images. medium.com/intuitionmachi…
From here, the combination of the two (i.e. Transformers and StyleGANs) and the scalable diffusion models led to today's image generators controlled by human language textual prompting. We know this works because we see the results.
Progress is being made not because we have good mathematical measures to tell us how good one network is over another. It is good because we see with our eyes that it is better. In the old days, it was believed that rendering detail out of VAEs meant lowering the variance.
In today's image renderers, you simply write a prompt that asks for more detail (i.e. "high detail 4K"). The encoding in the latent space is simply irrelevant. This reveals that the latent encoding and the outputs do not carry any discernible objective (or mathematical) meaning.
This is a strong argument for the anti-representation stance of enactivist psychology. The intuition is correct, but it is incomplete. It is the semiotic process that is critical. medium.com/intuitionmachi…
Now that we have many pieces of the puzzle resulting from DL, transformers, and diffusion models, what is next? If scale is all we need (see: "Bitter Lesson"), then why is there still a need for yet another innovation?
Many problems have yet to be solved: (1) symbol grounding, (2) the frame problem, and (3) abduction. These remain unreachable despite DL innovation.