Tweet

Lucas Nestler

15 Jul, 6 tweets, 4 min read

@RiversHaveWings

I finally got around to playing with @RiversHaveWings's VQGAN+CLIP notebooks!
The first order of business was to try to reproduce @ak92501's beautiful samples. You can see the results of my journey below (seeds=0 and 123456)

1/5

https://twitter.com/ak92501/status/1414020174357934086

To reasonably create these samples, I attempted to optimize the model by jitting it with TorchScript. After countless wrong attempts, it's finally 5x as fast as the baseline. (If you're using PyTorch, try JIT. You might want to follow my notebook for further optimizations.)

2/5

I also added new features, such as gaussian dropout and noise, which immediately improved the samples.
Below you can see the same prompt with different sample-wide noise (S) and per-item noise (I).

1) S=0.05, I=0.01
2) S=0.25, I=0.10
3) S=0.10, I=0.153
4) S=0.25, I=0.125

3/5

I'm surprised by the incredible diversity of the generated images. Some have hovering houses, and some have people. Others have rain or even plants.
None of this is part of the prompt, and CLIP/VQGAN fully hallucinated all of it, which is remarkable, in my opinion.

4/5

@advadnoun

Thank you, @advadnoun, @RiversHaveWings and @ak92501, for making this possible.

It'd be an honour if you'd give my modified notebook a try:
colab.research.google.com/gist/ClashLuke…

5/5

@pollinations_ai

I'm sorry, I accidentally deleted the wrong gist. The link below works. Thank you, @pollinations_ai, for pointing it out.

This might be a good time to say that I added efficient support for multiple prompts, which significantly improves image quality.

colab.research.google.com/gist/ClashLuke…

• • •

Missing some Tweet in this thread? You can try to force a refresh

This Thread may be Removed Anytime!

Twitter may remove this content at anytime! Save it as PDF for later use!

More from @_clashluke

Lucas Nestler

@_clashluke

18 May

https://twitter.com/Hanxiao_6/status/1394742841033641985

This is major breakthrough 👇
We're now using only seq^2 (4Mi) elements for each attention tensor instead of batch*heads*seq^2 (128Gi) for a PanGu-Alpha-200B-sized model, without reducing the performance or ability to scale.

https://twitter.com/Hanxiao_6/status/1394742841033641985

@Hanxiao_6

I'll implement it immediately in our GPT codebase and share its performance on 2B-equivalent models.

@Hanxiao_6, is the split across channels necessary? You briefly describe it as "effective". Is that on TPU?

I can't figure out what "small initialization" means.
I finally arrived at 0.02 / context_size, which gives the blue curve (500M body + 400M embedding).
It looks very promising, but still NaNs after just 3000 steps with lr=1e-5.

Read 16 tweets

Support us! We are indie developers!

This site is made by just two indie developers on a laptop doing marketing, support and development! Read more about the story.

Become a Premium Member ($3/month or $30/year) and get exclusive features!

Become Premium

Too expensive? Make a small donation by buying us coffee ($5) or help with server cost ($10)

Donate via Paypal Become our Patreon

Thank you for your support!

Share this page!

Lucas Nestler

Try unrolling a thread yourself!

More from @_clashluke

Lucas Nestler

Did Thread Reader help you today?

Like this author's thread?