Will scale pay dividends in diffusion models? Here is Alibaba's Composer with 5B parameters. DALL-E 2 has 3.5B parameters, and StableDiffusion has 890M parameters.
Browsing through the examples, there are only a few things that #stablediffusion with #controlnet cannot do. There aren't any ControlNet networks for color palette or intensity, but these can certainly be trained.
It is still too early to tell if Composer reveals new emergent phenomena. But it is indeed exciting that there's a 5B parameter image AI that will soon be available publicly to test.
But if larger diffusion models are not significantly superior to smaller ones, then we must ask if we have reached the limits of a specific kind of architecture and need to invest in a new kind. Does Composer do anything differently enough from StableDiffusion?
This is an absolutely fascinating development because we can now contrast a large model like Composer, which is trained to compose multiple input sources, with #ControlNet, where you can manually achieve the same thing through the right sequencing!
Absent access to Composer, I'm guessing here that a larger network affords the parallel satisfaction of multiple constraints. This is in contrast with multiple #controlnet modules, which do well via sequential application.
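To make the parallel versus sequential distinction concrete, here is a toy sketch. It is not how Composer or ControlNet work internally; it only assumes that each constraint can be written as a quadratic penalty pulling a 2-D point toward a target. The names `shape` and `color` and their targets are hypothetical, chosen so that the two constraints conflict.

```python
# Toy illustration only: NOT the internals of Composer or ControlNet.
# Each "constraint" is a quadratic penalty pulling a 2-D point toward a target.

def grad(point, target, weights):
    """Gradient of sum_i weights[i] * (point[i] - target[i]) ** 2."""
    return [2.0 * w * (p - t) for p, t, w in zip(point, target, weights)]

def descend(point, constraints, steps=200, lr=0.05):
    """Plain gradient descent on the SUM of all given constraints at once."""
    point = list(point)
    for _ in range(steps):
        total = [0.0] * len(point)
        for target, weights in constraints:
            g = grad(point, target, weights)
            total = [a + b for a, b in zip(total, g)]
        point = [p - lr * g for p, g in zip(point, total)]
    return point

# Hypothetical constraints as (target, per-axis weight); they pull in
# different directions, so they cannot both be fully satisfied.
shape = ([1.0, 0.0], [1.0, 1.0])
color = ([0.0, 1.0], [1.0, 1.0])

start = [0.0, 0.0]

# Parallel: both constraints shape every update step.
parallel = descend(start, [shape, color])

# Sequential: satisfy the first constraint, then apply the second to the result.
sequential = descend(descend(start, [shape]), [color])

print("parallel  :", parallel)
print("sequential:", sequential)
```

In this toy setup the parallel run settles on a compromise near the midpoint of the two targets, while the sequential run ends at the second target and forgets the first. Sequential application loses nothing when the constraints touch independent aspects of the output; the open question is how often real image constraints interact.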
The question here is: is the 5x larger network worth the parallel constraint satisfaction? What new emergent capabilities does this make possible that are absent in a more modular network like #controlnet? We will discover this quite soon!
I expect the next advance to involve the replacement of the UNet convolutional network with a ViT (vision transformer) network. There are semantics not captured in UNet that are essential in image composition. I'm just waiting for a firm to foot the training bill!
This question of scale versus modularity is critical because it will determine whether AI monopolies can build defensive moats around their cathedrals or whether a cottage industry of bespoke solutions will generate markets of diversity.
Human cognition is iconic, metaphoric, analogical, and hence ultimately empathetic. Indexicality and symbolism both germinate from discovering patterns of similarity.
We understand the unfamiliar when it is expressed through a familiar similarity. We cannot understand a new abstraction if it is never framed in an existing metaphor. We can understand a particle or a wave, but never both being the same.
All agency is structured relative to a reference frame of the self. The self of agents varies in scope. Higher intelligences see themselves as belonging to a larger self that includes all of humanity. A narcissistic self cares only for "number one" (i.e., himself).
It's occurred to me that when discussing general intelligence, you must dumb it down! Use metaphors that appeal to the reductionist and noun-centric thinking of the masses. Appeal to the IQ of 100!
The concept of an IQ score is exactly this reductionist and noun-centric thinking. It assumes that all intelligence can be distilled into a single number and that this number remains fixed and permanent (like a thing) throughout one's entire life. This is the consensus notion of an IQ score.
Intelligence is a multifaceted thing. There are many kinds of cognitive competencies (see: Howard Gardner), just as humans have different personality traits (see: Big 5). Society doesn't have a measure for this multidimensional nature. en.wikipedia.org/wiki/Theory_of…
People haven't read enough books to understand AI and the complexity of reality. My past self from two years ago may need help understanding my ideas of today. But those ideas came from elsewhere; they came from focusing on a specific line of thinking and reading books.
But which books does one find to read? Interestingly enough, I discover them from recommendations I find on Twitter! You can't know what you don't know without interacting with different minds. There's a ton of stuff you don't know because you were never exposed to it!
Books are extremely important because they take a deeper look at a subject matter and they build a narrative that is easier to digest and remember, one that stitches together a complex subject. You cannot get this from short-form renditions like podcasts and interviews.
1/n How do @deepmind's RETRO and @MetaAI's Toolformer inform how modular deep learning systems should be architected?
2/n Previously, I tweeted about how, in the Stable Diffusion space, the mix-and-match modularity of deep learning components is leading to an explosion of capabilities. medium.com/intuitionmachi…
3/n I noted that language models were critical in how image models are stitched together to perform image transformation workflows. Surprisingly, it's not as obvious how this is done when the medium that you transform is also language (and not images).
We need robots because we don't want to sacrifice the rest of our lives caring for those dying Boomers who destroyed the climate and our future on their watch.
We need VR and the metaverse because we have become a futureless civilization. Where entrepreneurial hustle is faking till you make it. Where follower growth is the ultimate metric.
We need AI and crypto because VCs and investors love that automation and the securitization of everything permits businesses to wash themselves of all accountability.
1/n Modularity is essential for any disruptive technology. For years deep learning lacked sufficient remix functionality to quickly customize solutions. Everything had to either be trained from scratch or fine-tuned. The latest innovations are tearing down these restrictions.
2/n Modularity allows a developer to combine an existing module with others to generate a bespoke solution. Years ago, this was difficult to do with deep learning. medium.com/intuitionmachi…
3/n Transformer and diffusion models have radically changed the methods. Transformers are an essential building block for specifying constraints. Diffusion models serve as a model of parallel constraint satisfaction. This has led to the powerful generative AI of today.