Playing with diffusion models makes you aware of concepts for which we have no words. Today, words don't precisely control image generation, but I expect this to improve over time. Soon, we might be expanding our vocabulary by generating #aiart!
Our interaction changes with every new version of an AI product. Prompting #dalle2 is different from prompting #midjourney; each version of MidJourney requires different prompting, and each finetuned #stablediffusion model prompts differently again. Prompting has no standards.
The constraints that drive the generation of an image depend on the relationship between many variables: the relationship of words in a prompt, the strength of the knobs, the iteration between images, the iteration between models, and the overall sequence of the process itself (sketched below).
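To make those variables concrete, here is a minimal, hypothetical sketch. The names are illustrative only and not taken from any particular library; it simply makes explicit the knobs a single generation step depends on and how steps chain into a sequence.

```python
from dataclasses import dataclass
from typing import Optional, List

# Hypothetical field names, for illustration only -- not any library's real API.
@dataclass
class GenerationStep:
    prompt: str                       # the words and their relationship to each other
    negative_prompt: str              # what the image should steer away from
    guidance_scale: float             # strength of the prompt "knob"
    seed: int                         # fixes the starting noise across iterations
    model: str                        # which base or finetuned checkpoint is used
    init_image: Optional[str] = None  # a previous output, for image-to-image iteration

# A session is not one call but a sequence: each step can reuse the previous
# image, swap the model, or nudge a knob, and the order itself matters.
session: List[GenerationStep] = [
    GenerationStep("a lighthouse in fog", "blurry", 7.5, 42, "base-v1"),
    GenerationStep("a lighthouse in fog, oil painting", "blurry", 9.0, 42,
                   "finetuned-painterly", init_image="step_0.png"),
]
```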
Stuart Kauffman argues that what lies beyond the adjacent possible cannot be prestated. His example is a screwdriver: can we enumerate (i.e., prestate) all the uses of a screwdriver? A screwdriver may be designed for a specific purpose, but we can't know everything it might be used for.
Deep learning research groups that invent new networks every day do not have the capacity to know the future impact of their discoveries. At best, they provide some metrics comparing them against standard benchmarks.
Methods may appear to do the same thing, and methods are usually compared against other methods. But in practice, different methods have different strengths. Textual inversion, DreamBooth, and LoRA all appear to do the same thing, yet they behave subtly differently (see the sketch below).
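One way to see the difference is to ask which parameters each method actually trains. The sketch below is a rough illustration in plain PyTorch, not any library's real implementation: textual inversion learns only new token embeddings, DreamBooth fine-tunes the denoising network itself, and LoRA freezes the network and trains small low-rank adapters.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen linear layer plus a trainable low-rank update: y = Wx + (BA)x * scale."""
    def __init__(self, base: nn.Linear, rank: int = 4, alpha: float = 4.0):
        super().__init__()
        self.base = base.requires_grad_(False)  # original weights stay frozen
        self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))
        self.scale = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + (x @ self.A.T @ self.B.T) * self.scale

# Trainable parameters under each method (illustrative, not exhaustive):
#   Textual inversion: only a new token embedding vector; the model is untouched.
#   DreamBooth:        the denoising network's own weights (sometimes the text encoder too).
#   LoRA:              only the small A/B adapter matrices injected into chosen layers.
layer = LoRALinear(nn.Linear(64, 64))
print(sum(p.numel() for p in layer.parameters() if p.requires_grad))  # adapters only
```

Because each method touches a different set of parameters, the results merge, transfer, and generalize differently, which is exactly where the subtle behavioral differences come from.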
The same can be said about #dalle, #midjourney, and #stablediffusion. There is no universal best tool. They execute differently, and there is value in knowing which specific tool is better in which specific context.
#midjourney and #stablediffusion are smaller networks than #dalle, but they are more capable in their own distinct ways. This observation will also translate to language models: smaller networks like @anthropicAI Claude may be better than @openai ChatGPT.
Every new technology leads to an extreme variety of alternative solutions. At the dawn of the PC revolution, Apple had many competitors (e.g., PET, TRS-80, Sinclair, Commodore, Atari). What differs today in AI is that you can mix and match. en.wikipedia.org/wiki/History_o…
Today's deep learning has a peculiar kind of modularity that wasn't as obvious a few years ago. This is because generative AI fundamentally offers greater modularity than classification AI, as the sketch below suggests. medium.com/intuitionmachi…
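Here is a toy illustration of that modularity, with purely hypothetical function types: because each generative stage maps one representation to another (text to embedding, embedding to image, image back to text), independently trained pieces can be swapped or chained, whereas a fixed label-predicting classifier offers no such seams.

```python
from typing import Callable, List

# Hypothetical stage signatures, for illustration only.
TextEncoder = Callable[[str], List[float]]      # prompt    -> embedding
ImageGenerator = Callable[[List[float]], bytes]  # embedding -> image
Captioner = Callable[[bytes], str]               # image     -> text again

def text_to_image_to_text(encode: TextEncoder,
                          generate: ImageGenerator,
                          caption: Captioner,
                          prompt: str) -> str:
    """Any stage can be replaced by a different model without touching the others."""
    return caption(generate(encode(prompt)))
```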
The predominant error in the evolution of deep learning is that, for years, the emphasis on good classifiers dominated the landscape. This is because classification is easier to measure and easier to turn into a leaderboard competition. medium.com/intuitionmachi…
Language, evolution, and technology are all generative processes. DALL-E-like and GPT-like systems are generative processes too. These processes are not siloed and thus allow unexpected new inventions; they are open-ended by their nature.

