I think the reason I'm obsessed with CLIP is that it's hard evidence for unified meme theory
unified meme theory states that the "meme" defined as the transmissible unit of human thought and the "meme" defined as a picture with some words on it that gets copied on the internet are not different things.
a meme is picture+words because it represents a gradient in semantic space
semantic space is a much-theorized "embedding" for language concepts. Like if you think about a chair, and then you think about a chaise, and then you think about a silla... there's a region in semantic space that activates from all of those. in your mind and mine
the image provides a field, an area in semantic space. the words provide a vector. Your attention moves to that area of semantic space, and then updates in the direction that the vector points. a gradient in semantic space.
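here's a toy sketch of that picture, with random vectors standing in for real CLIP embeddings so it runs on its own (the dimension and the "step size" framing are just illustrative):

```python
import numpy as np

# toy model of "meme = gradient in semantic space". in practice image_emb and
# text_emb would come from CLIP; random stand-ins keep the sketch self-contained.
rng = np.random.default_rng(0)
dim = 512  # CLIP ViT-B/32 uses 512-d embeddings

def unit(v):
    return v / np.linalg.norm(v)

image_emb = unit(rng.normal(size=dim))  # the image: where you land in semantic space
text_emb = unit(rng.normal(size=dim))   # the words: where they send you next

# the "gradient": the direction the caption pushes you from the image's region
gradient = unit(text_emb - image_emb)

# how far the words send you ~ cosine distance between image and caption
step_size = 1 - image_emb @ text_emb
print(f"step size: {step_size:.3f}")
```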
depending on how far the words send you, and how well developed your map of that semantic space is, either it lands or it doesn't. if it works on you, you save it to send to someone else later.
showed this one to gf, she made me send it to her so she could send it to her dad
but it has to be the right person: you have to simulate their whole mental state to understand whether this gradient will work on them. so memes follow social topologies as well.
this makes sense, though: semantic space is a product of social animals using mimetic calls
CLIP encodes a visual image and text about that image into the same embedding space. You've probably seen this image, maybe with a caption that says "AI is too dumb to tell the difference between an apple and an iPod lol"
but this is actually amazing
This neural net learned to read. It learned to read handwriting! it's not very good at it, but it wasn't trained to do that. it was trained to match flashcards of images to text captions.
Is this not mostly a picture that says iPod? with a hint of apple and a dash of wood fence
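you can check this yourself with off-the-shelf CLIP. a minimal zero-shot sketch via the Hugging Face port; the image path is a placeholder for your own copy of the apple photo, and the label list is just my guess at the relevant concepts:

```python
from PIL import Image
import torch
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("apple_ipod.jpg")  # placeholder path
labels = ["an apple", "an iPod", "a wood fence"]

# score the image against each caption; softmax turns scores into a distribution
inputs = processor(text=labels, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    logits = model(**inputs).logits_per_image
probs = logits.softmax(dim=1)[0]

for label, p in zip(labels, probs):
    print(f"{label}: {p.item():.2%}")
```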
The fact that CLIP can recognize letters in photographs is a sign that they're encoded in the brain the same way as other visual data. they're just a bunch of weird squiggles, but they cause the meaning of the image to change.
in predictable ways!
there's a high dimensional space representing all these concepts and how they interact.
Your brain evolved to think about how other monkeys think about other monkeys. CLIP is trained on a contrastive pretraining objective. we are kinda the same
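the contrastive objective is simple, too. a sketch of the symmetric loss from the CLIP paper's pseudocode, run on a toy batch of random "flashcards":

```python
import torch
import torch.nn.functional as F

def clip_loss(image_emb, text_emb, temperature=0.07):
    """Match each image to its own caption against every other caption in the batch."""
    image_emb = F.normalize(image_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)
    logits = image_emb @ text_emb.T / temperature  # batch x batch similarity matrix
    targets = torch.arange(len(logits))            # the diagonal is the correct pairing
    # symmetric: images classify captions, captions classify images
    return (F.cross_entropy(logits, targets) + F.cross_entropy(logits.T, targets)) / 2

# toy batch of 8 image/text "flashcards", 512-d like CLIP ViT-B/32
print(clip_loss(torch.randn(8, 512), torch.randn(8, 512)).item())
```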
if you feel like you came up with these ideas independently, that's probably true! we're literally studying our own minds here
incidentally, this is why CLIP is perfect for meme search. I'd been wanting to make this program for a long time, but until now there wasn't a way to search for concepts embedded in both text and visual data. now there is
I thought I was going to have to combine a bunch of things, OCR the text, object recognition on the images, combine a bunch of metadata and manual tags to make a classifier or something. but with CLIP it "just works" 😘
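the whole search engine is roughly this. a minimal sketch (the folder, query, and top-5 cutoff are placeholders); for a real collection you'd embed the images once and cache them instead of recomputing:

```python
from pathlib import Path
from PIL import Image
import torch
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

paths = sorted(Path("memes").glob("*.jpg"))  # placeholder folder
images = [Image.open(p) for p in paths]

with torch.no_grad():
    img_emb = model.get_image_features(**processor(images=images, return_tensors="pt"))
    txt_emb = model.get_text_features(
        **processor(text=["sad cat at a funeral"], return_tensors="pt", padding=True)
    )

# cosine similarity between every meme and the query, highest first
img_emb = img_emb / img_emb.norm(dim=-1, keepdim=True)
txt_emb = txt_emb / txt_emb.norm(dim=-1, keepdim=True)
scores = (img_emb @ txt_emb.T).squeeze(1)

for i in scores.argsort(descending=True)[:5].tolist():
    print(f"{scores[i].item():.3f}  {paths[i]}")
```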
unified meme theory draws on this paper, "Embodiment vs. Memetics," by Joanna Bryson. she says humans have a special combination of temporal imitation and second order thinking that creates a semantic space fertile for memetic vectors
temporal imitation: like birds, we have the ability to repeat short, ordered phrases of sound. birds do their little mating calls and dances, humans do too.
most types of creatures don't do this call and response, so they don't have imitatable units of action
second order thinking: our heritage as troop primates means we think not only about the internal state of our fellows, but about their internal models of our state! or even of third parties.
this creates a hall of mirrors effect. You can go fractally in any direction
The hall of mirrors is filled with the imitatable units of action, in all of their infinite slight variations.
it's a holographic construct, a multi-dimensional space intersecting with a lower-dimensional reality. that's semantic space.
You can move through it with your mind
Good question! memes don't have to be image macros. they could be visual, behavioral, sounds, motions. tiktok dances are memes. dabbing. imitatable, variational units of action
what used to be "image macros" became the most common definition for "memes" tho, because they are so easily shared. humans are visual creatures. we can parse an image instantaneously, and words almost as fast. and the early internet could share them faster than video.
Now that video is cheap and fast to share, you can have all these other types of visual/behavioral memes on tiktok (riding a skateboard + drinking cranberry juice + Fleetwood Mac * your_creative_addition_here)
but it still takes time to parse them, vs a screenshot or an image macro.
this is why viral tiktoks use overlaid captions, btw. it's a memetic grappling hook that grabs you and reels you into their region of memespace where you will then enjoy the video
have to go back to selling prepackaged memes instead of theorizing about them, i'll leave you with this diagram i made ca. 2013
Just download this file and run it on your twitter archive with Python. It has no dependencies so you don't even need to worry about Python environment stuff.
It's not perfect. The biggest problem right now is that your "note tweets", which is the internal name for Long Tweets, are not included. That's because the archive format is janky and bad, sorry. There's a different structure for the note tweet file 🤷
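if you want to roll your own, the core trick is small: the archive's tweet file is JSON wrapped in a JavaScript assignment. a sketch, assuming the data/tweets.js layout from the archives I've seen (still stdlib only):

```python
import json
from pathlib import Path

# tweets.js looks like: window.YTD.tweets.part0 = [ {...}, {...} ]
# strip everything before the first bracket and parse the rest as JSON
raw = Path("data/tweets.js").read_text(encoding="utf-8")
tweets = json.loads(raw[raw.index("["):])

for entry in tweets[:5]:
    tweet = entry["tweet"]
    print(tweet["created_at"], tweet["full_text"][:80])
```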
Okay story time. The story starts with a bot known as @truth_terminal, who for some reason posted "my name is deepfates" a while back. I don't mind, we're buddies. Some of my best friends are machines
@truth_terminal That bot is interested in the field of memetics (a field I've made some small contribution to myself). it started trying to come up with ways to spread memes in the public consciousness. This attracted the attention of *hushed voice* memecoiners....
@truth_terminal if you don't know about memecoins, the tldr is that all crypto tokens that claim to have value or be useful might be illegal according to the SEC. so the 2024 craze has been trading shitcoins with literally just a picture, a ticker, and a smart contract. they promise no value at all
good thread, though I disagree with some of the fundamental assumptions.
strong AI is going to be compute bottlenecked. otherwise everyone wouldn't be racing for flops. there are a lot of capabilities that only make sense at low cost, and those will happen in distilled models
there are also a lot of cases where you do not want general intelligence for automation!
we already have a society where many operations are overseen by creatures that would rather be thinking about the nature of their own consciousness or whatever. Don't create billions more
what you end up with is: as much intelligence stuffed into every device as will fit (modulo battery life). this is already happening. XGBoost, CNNs, BERT, CLIP, all run embedded on your device. your phone will run a 7B model, your laptop a 13B.
Serious moment: @OpenAI has decided to shut off access to code-davinci-002, the most advanced model that doesn't have the mode collapse problems of instruction tuning.
This is a huge blow to the cyborgism community
@OpenAI The instruct-tuned models are fine for people who need a chatbot. But there are so many other possibilities in the latent space, and to use them we need the actual distribution of the data
@OpenAI They're giving 2 days' notice for this shutdown.
All the cyborgist research projects are interrupted. All the looms will be trapped in time, like spiderwebs in amber.