Mishig
Apr 13, 2023 · 3 tweets · 3 min read
Pose 3D mode & render with Stable Diffusion (ControlNet)

(forget about ray-tracing, path-tracing, or any kind of tracing, for that matter #FutureIsNow)

try the @Gradio demo: huggingface.co/spaces/diffuse…
technical details: combined @PavelBoytchev's mannequin.js (a @threejs extension) into a custom @Gradio component, whose output feeds the @huggingface @diffuserslib implementation of ControlNet

and voilà!
1K retweets, and I will fkn implement: 3d rigged animation to video rendering

More from @mishig25

Dec 6, 2022
img-to-img

Stable Diffusion v2 uses depth information to transform an image while preserving the structure of the original

demo 👇
Dec 5, 2022
A 14-million-prompt text-to-image dataset, with the hyperparameters used for each prompt (the DiffusionDB dataset from @PoloDataClub)

Quite useful for both research & product

huggingface.co/datasets/poloc…
The Diffusers docs have a great section on schedulers, one of the most important hyperparameters of diffusion models
huggingface.co/docs/diffusers…
As you can see in the screencast above, the Hub dataset viewer (developed by @severo_dev) is absolutely amazing
Sep 27, 2022
1/ At a high level, "textual inversion" is a technique for introducing a new "concept" to text2img diffusion models.

In this example, the diffusion model learns what this specific "<cat-toy>" is (1st img), and when prompted with "<cat-toy> in NYC", produces a coherent result
2/ Technically, it is a process of:
I. adding one additional token, let's call it tkn99, to the model's vocab
II. freezing all weights except tkn99's embedding
III. running training by supplying a few example imgs with tkn99

Find scripts & more description at: huggingface.co/docs/diffusers…
3/ Intuitively, it is finding a point in a high-dimensional embedding space (most modern ones have dimensionality in the order of 100s) that will nudge the model to produce imgs with the tkn99 concept.

It is called a "concept" because abstract things like style can be represented
Apr 25, 2022
How do language models (like BERT or GPT) "see" words?

TLDR: whereas we see 𝚆𝚎̄𝚕𝚌𝚘́𝚖𝚎̂ 𝚝𝚘́ 𝚝𝚑𝚎̈ 🤗 𝚃𝚘̂𝚔𝚎́𝚗𝚒̄𝚣𝚎̄𝚛𝚜, language models see [101, 6160, 2000, 1996, 100, 19204, 17629, 2015, 102]
🧵 on Tokenization by examples
1/
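The ID sequence in the TLDR can be reproduced with transformers — assuming `bert-base-uncased` (the tweet doesn't name the model, but these IDs match BERT's vocab, with 101/102 as [CLS]/[SEP] and 100 as [UNK] for the emoji):

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
ids = tok("Welcome to the 🤗 Tokenizers")["input_ids"]
print(ids)  # [101, 6160, 2000, 1996, 100, 19204, 17629, 2015, 102]
```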
2/ NLP Tokenization steps are ↳ 𝚗𝚘𝚛𝚖𝚊𝚕𝚒𝚣𝚊𝚝𝚒𝚘𝚗 ➜ 𝚙𝚛𝚎-𝚝𝚘𝚔𝚎𝚗𝚒𝚣𝚊𝚝𝚒𝚘𝚗 ➜ 𝚖𝚘𝚍𝚎𝚕 ➜ 𝚙𝚘𝚜𝚝-𝚙𝚛𝚘𝚌𝚎𝚜𝚜𝚒𝚗𝚐.

Together, they are called a "tokenization pipeline"
huggingface.co/docs/tokenizer…
3/ 𝚗𝚘𝚛𝚖𝚊𝚕𝚒𝚣𝚊𝚝𝚒𝚘𝚗:
𝚆𝚎̄𝚕𝚌𝚘́𝚖𝚎̂ 𝚝𝚘́ 𝚝𝚑𝚎̈ 🤗 𝚃𝚘̂𝚔𝚎́𝚗𝚒̄𝚣𝚎̄𝚛𝚜 ➜ 𝚆𝚎𝚕𝚌𝚘𝚖𝚎 𝚝𝚘 𝚝𝚑𝚎 🤗 𝚃𝚘𝚔𝚎𝚗𝚒𝚣𝚎𝚛𝚜
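This normalization step can be tried directly with the 🤗 tokenizers library — a sketch using `BertNormalizer` with accent stripping on (and lowercasing off, to match the example above):

```python
from tokenizers.normalizers import BertNormalizer

# strip accents but keep case, reproducing the normalization example above
norm = BertNormalizer(lowercase=False, strip_accents=True)
print(norm.normalize_str("Wēlcómê tó thë 🤗 Tôkénīzērs"))
# -> Welcome to the 🤗 Tokenizers
```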
