I finally got around to playing with @RiversHaveWings's VQGAN+CLIP notebooks!
The first order of business was to try to reproduce @ak92501's beautiful samples. You can see the results of my journey below (seeds=0 and 123456)

1/5
To generate these samples in a reasonable time, I optimized the model by JIT-compiling it with TorchScript. After countless failed attempts, it now runs 5x as fast as the baseline. (If you're using PyTorch, try JIT. You might want to follow my notebook for further optimizations.)
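For reference, scripting a PyTorch module is a one-liner. A minimal sketch below; `Residual` is a toy stand-in I made up, not the notebook's actual VQGAN model:

```python
import torch


class Residual(torch.nn.Module):
    """Tiny stand-in for a VQGAN-style block (illustrative only)."""

    def __init__(self, width: int):
        super().__init__()
        self.conv = torch.nn.Conv2d(width, width, 3, padding=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + torch.relu(self.conv(x))


model = Residual(8).eval()
jitted = torch.jit.script(model)  # compile the module to TorchScript

x = torch.randn(1, 8, 16, 16)
with torch.no_grad():
    eager_out = model(x)
    jit_out = jitted(x)
# The scripted module should produce the same result as eager mode.
```

For modules without data-dependent control flow, `torch.jit.trace` is an alternative entry point.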

2/5
I also added new features, such as Gaussian dropout and noise, which immediately improved the samples.
Below you can see the same prompt with different levels of sample-wide noise (S) and per-item noise (I).

1) S=0.05, I=0.01
2) S=0.25, I=0.10
3) S=0.10, I=0.153
4) S=0.25, I=0.125
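A minimal sketch of how two such noise scales could be applied, under my reading that S is one shared offset per sample and I is drawn independently per element; the notebook's actual implementation may differ:

```python
import torch


def noisy_latent(z: torch.Tensor, sample_std: float, item_std: float) -> torch.Tensor:
    """Add two Gaussian noise sources to a latent of shape (batch, ch, h, w).

    sample_std (my reading of S): one shared offset per sample in the batch,
    broadcast over all of that sample's elements.
    item_std (my reading of I): independent noise for every single element.
    """
    sample_noise = torch.randn(z.shape[0], 1, 1, 1) * sample_std  # per sample
    item_noise = torch.randn_like(z) * item_std                   # per element
    return z + sample_noise + item_noise


z = torch.zeros(2, 4, 8, 8)
out = noisy_latent(z, sample_std=0.05, item_std=0.01)
```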

3/5
I'm surprised by the incredible diversity of the generated images. Some have hovering houses, and some have people. Others have rain or even plants.
None of this is part of the prompt, and CLIP/VQGAN fully hallucinated all of it, which is remarkable, in my opinion.

4/5
Thank you, @advadnoun, @RiversHaveWings and @ak92501, for making this possible.

It'd be an honour if you'd give my modified notebook a try:
colab.research.google.com/gist/ClashLuke…

5/5
I'm sorry, I accidentally deleted the wrong gist. The link below works. Thank you, @pollinations_ai, for pointing it out.

This might be a good time to say that I added efficient support for multiple prompts, which significantly improves image quality.
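One way multiple prompts can be supported is by averaging a cosine-distance loss over per-prompt text embeddings. This is a sketch of that idea only; the function name and the averaging rule are my assumptions, not necessarily what the notebook does:

```python
import torch
import torch.nn.functional as F


def multi_prompt_loss(image_embed: torch.Tensor, text_embeds: torch.Tensor) -> torch.Tensor:
    """Average a cosine-distance loss over several prompt embeddings.

    image_embed: (batch, dim) image features; text_embeds: (n_prompts, dim)
    text features (e.g. from CLIP's encoders). Averaging per-prompt losses is
    the simplest way to optimize toward several prompts at once.
    """
    image_embed = F.normalize(image_embed, dim=-1)
    text_embeds = F.normalize(text_embeds, dim=-1)
    cosine = image_embed @ text_embeds.T  # (batch, n_prompts) similarities
    return (1.0 - cosine).mean()


img = torch.randn(1, 512)
txts = torch.randn(3, 512)  # three prompts
loss = multi_prompt_loss(img, txts)
```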

colab.research.google.com/gist/ClashLuke…

Keep Current with Lucas Nestler


Did Thread Reader help you today?

Support us! We are indie developers!


This site is made by just two indie developers on a laptop doing marketing, support and development! Read more about the story.

Become a Premium Member ($3/month or $30/year) and get exclusive features!

Become Premium

Too expensive? Make a small donation by buying us coffee ($5) or help with server cost ($10)

Donate via Paypal Become our Patreon

Thank you for your support!

Follow Us on Twitter!

:(