The Diffusers docs have a great section on schedulers, one of the most important hyperparameters of diffusion models: huggingface.co/docs/diffusers…
As you can see in the screencast above, the Hub dataset viewer (developed by @severo_dev) is absolutely amazing
1/ At a high level, "textual inversion" is a technique for introducing a new "concept" to text2img diffusion models.
In this example, the diffusion model learns what this specific "<cat-toy>" is (1st img), and when prompted with "<cat-toy> in NYC", it produces a coherent result (2nd img)
2/ Technically, it is a process of:
I. adding one additional token, let's call it tkn99, to the model's vocab
II. freezing all weights except tkn99's embedding
III. running training on a few example imgs paired with tkn99
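The three steps above can be sketched in plain PyTorch (a toy setup, not the real Stable Diffusion training loop; the tiny vocab and gradient-masking trick here are just for illustration):

```python
import torch
import torch.nn as nn

# Toy embedding table: vocab of 10 tokens, dim 8.
vocab_size, dim = 10, 8

# I. add one extra row for the new token tkn99
emb = nn.Embedding(vocab_size + 1, dim)
tkn99_id = vocab_size  # id of the newly added token

# II. "freeze" everything except tkn99's row: one simple trick is to
# zero out the gradients of all other rows after backward().
ids = torch.tensor([1, 2, tkn99_id])  # a training example containing tkn99
loss = emb(ids).sum()                 # stand-in for the real diffusion loss
loss.backward()

with torch.no_grad():
    mask = torch.zeros_like(emb.weight)
    mask[tkn99_id] = 1.0
    emb.weight.grad *= mask  # III. only tkn99's embedding gets updated
```

In the real recipe the loss comes from the frozen diffusion model's denoising objective, but the optimizer still only ever moves that single embedding row.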
3/ Intuitively, it is finding a point in a high-dimensional embedding space (most modern models use embedding dims on the order of 100s) that nudges the model to produce imgs of the tkn99 concept.
It is called a "concept" because abstract things like style, not just objects, can be represented
How do language models (like BERT or GPT) "see" words?
TLDR: whereas we see "Welcome to the 🤗 Tokenizers", language models see [101, 6160, 2000, 1996, 100, 19204, 17629, 2015, 102]
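The mapping above can be sketched with a toy lookup (hypothetical mini-vocab; a real BERT tokenizer uses WordPiece with ~30k entries and splits unknown words into subwords like "token", "##izer", "##s" — hence the 19204, 17629, 2015 in the example, while the 🤗 emoji falls back to [UNK]):

```python
# Toy sketch of how a BERT-style tokenizer turns text into ids.
# The ids are borrowed from the example; the whitespace split is a
# simplification of the real WordPiece algorithm.
vocab = {"[CLS]": 101, "[SEP]": 102, "[UNK]": 100,
         "welcome": 6160, "to": 2000, "the": 1996}

def encode(text):
    ids = [vocab["[CLS]"]]                       # special start token
    for word in text.lower().split():
        ids.append(vocab.get(word, vocab["[UNK]"]))  # unknowns -> [UNK]
    ids.append(vocab["[SEP]"])                   # special end token
    return ids

print(encode("Welcome to the 🤗"))  # → [101, 6160, 2000, 1996, 100, 102]
```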
🧵 on Tokenization by examples
1/