jack morris
getting my phd in language models @cornell_tech 🚠 // visiting researcher @meta // academic optimist // master of the semicolon
May 21 · 8 tweets · 3 min read
excited to finally share on arxiv what we've known for a while now:

All Embedding Models Learn The Same Thing

embeddings from different models are SO similar that we can map between them based on structure alone. without *any* paired data

feels like magic, but it's real: 🧵

a lot of past research (relative representations, The Platonic Representation Hypothesis, comparison metrics like CCA, SVCCA, ...) has asserted that once they reach a certain scale, different models learn the same thing

this has been shown using various metrics of comparison
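
To make "metrics of comparison" concrete, here is a minimal sketch (an illustration under the assumption that both models embed the same n inputs, not code from the paper) of linear CKA, a similarity index in the same family as the CCA/SVCCA metrics named above:

```python
# Minimal sketch of linear CKA for comparing two embedding spaces.
import numpy as np

def linear_cka(X, Y):
    """X: (n, d1) embeddings from model A; Y: (n, d2) embeddings from model B,
    both over the same n inputs. Returns a similarity score in [0, 1]."""
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    cross = np.linalg.norm(Y.T @ X, ord="fro") ** 2
    return cross / (np.linalg.norm(X.T @ X, ord="fro") *
                    np.linalg.norm(Y.T @ Y, ord="fro"))

# toy check: two "models" that secretly share latent structure score much
# higher than a pair of unrelated embedding matrices
rng = np.random.default_rng(0)
Z = rng.normal(size=(1000, 64))                      # shared latent factors
X = Z @ rng.normal(size=(64, 384))                   # "model A" embeddings
Y = Z @ rng.normal(size=(64, 768))                   # "model B" embeddings
print(linear_cka(X, Y))                              # high: shared structure
print(linear_cka(X, rng.normal(size=(1000, 768))))   # much lower: unrelated
```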
Jan 3 · 4 tweets · 1 min read
no AI here, just the coolest paper i've seen in a while

turns out the way paints mix (blue + red = purple) is much more complicated than how light mixes (blue + red = pink)

they have to use a little bit of nonlinear modeling to capture this, and "add" paints in this nonlinear latent color space
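
For a flavor of what "adding in a nonlinear latent color space" means, here is a toy sketch that uses the classic single-constant Kubelka-Munk transform as the latent space; the paper's actual model is learned and more sophisticated, so treat this purely as an illustration:

```python
# Toy pigment-style mixing: map colors into a nonlinear latent space (here the
# single-constant Kubelka-Munk K/S transform), average there, then map back.
import numpy as np

def rgb_to_ks(rgb, eps=1e-4):
    # K/S = (1 - R)^2 / (2R), applied per channel
    r = np.clip(np.asarray(rgb, dtype=float), eps, 1.0)
    return (1.0 - r) ** 2 / (2.0 * r)

def ks_to_rgb(ks):
    # inverse of the transform above: R = 1 + K/S - sqrt((K/S)^2 + 2*K/S)
    return 1.0 + ks - np.sqrt(ks ** 2 + 2.0 * ks)

def mix_paints(rgb_a, rgb_b, t=0.5):
    # "add" the pigments in the nonlinear K/S space instead of averaging RGB
    ks = (1.0 - t) * rgb_to_ks(rgb_a) + t * rgb_to_ks(rgb_b)
    return ks_to_rgb(ks)

blue, red = [0.1, 0.1, 0.8], [0.8, 0.1, 0.1]
print(mix_paints(blue, red))                    # darker, purple-ish pigment mix
print((np.array(blue) + np.array(red)) / 2.0)   # brighter, pink-ish light-style average
```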
Oct 4, 2024 · 7 tweets · 3 min read
We spent a year developing cde-small-v1, the best BERT-sized text embedding model in the world.

today, we're releasing the model on HuggingFace, along with the paper on ArXiv.

I think our release marks a paradigm shift for text retrieval. let me tell you why 👇

Typical text embedding models have two main problems:
1. training them is complicated and requires many tricks: giant batches, distillation, hard negatives...
2. the embeddings don't "know" what corpus they will be used in; consequently, all text spans are encoded the same way
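
To make problem 2 concrete, here is a toy, hypothetical sketch of corpus-conditioned ("contextual") encoding: embed a small sample of the target corpus first, then condition every document and query embedding on that context. The stand-in encoder and the centroid-subtraction conditioning are placeholders for illustration, not the cde-small-v1 architecture or API:

```python
# Toy two-stage "contextual" encoding sketch -- NOT the cde-small-v1 API.
import numpy as np

def base_encode(texts, dim=256):
    # hypothetical stand-in for a real text encoder: deterministic unit vectors
    out = []
    for t in texts:
        rng = np.random.default_rng(abs(hash(t)) % (2**32))
        v = rng.normal(size=dim)
        out.append(v / np.linalg.norm(v))
    return np.stack(out)

def contextual_encode(texts, corpus_sample):
    # stage 1: summarize a sample of the target corpus
    corpus_context = base_encode(corpus_sample).mean(axis=0)
    # stage 2: condition each embedding on that corpus context
    # (centroid subtraction is a crude placeholder for the learned conditioning)
    vecs = base_encode(texts) - corpus_context
    return vecs / np.linalg.norm(vecs, axis=1, keepdims=True)

corpus = ["patient presents with fever", "mri shows no abnormality", "dosage: 10mg daily"]
docs = contextual_encode(corpus, corpus_sample=corpus)
query = contextual_encode(["what dosage was prescribed?"], corpus_sample=corpus)
print(docs @ query.T)   # toy retrieval scores (the encoder here is a stand-in)
```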
Apr 4, 2024 · 6 tweets · 2 min read
New Research:

there's a lot of talk today about "what happens" inside a language model, since it spends the exact same amount of compute on each token, regardless of difficulty.

we touch on this question in our new theory paper, Do Language Models Plan for Future Tokens?

I think our most crucial finding is that although humans think far ahead while speaking (especially while doing complex reasoning problems), it turns out that transformer language models... don't seem to do that.

they just predict the next token.
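
One rough way to poke at this yourself (a sketch, not the paper's experimental setup): probe whether GPT-2's final-layer hidden state at position i linearly predicts the token two positions ahead. The corpus file name below is a hypothetical placeholder, and above-chance accuracy wouldn't by itself prove "planning", since adjacent tokens are correlated; the paper's actual analysis is more careful than this.

```python
# Rough probe sketch: does the hidden state at position i linearly predict
# the token at position i+2?
import torch
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tok = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

text = open("corpus.txt").read()                        # hypothetical text file
ids = tok(text, return_tensors="pt").input_ids[:, :512]

with torch.no_grad():
    # final-layer hidden states, shape (seq_len, hidden_dim)
    hidden = model(ids, output_hidden_states=True).hidden_states[-1][0]

X = hidden[:-2].numpy()   # state at position i
y = ids[0, 2:].numpy()    # token at position i+2

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
probe = LogisticRegression(max_iter=2000).fit(X_tr, y_tr)
print("linear probe accuracy for token i+2:", probe.score(X_te, y_te))
```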