Alex J. Champandard @alexjc
Neural approaches to style transfer struggle with certain types of art, e.g. crisp yet smooth brush-strokes 🖋️. It's likely a combination of factors, including using models pre-trained on natural images. 📷

In this thread I'll experiment to learn more! 👇 #neuralimagen #procjam
Ah. Here's what happens visually when the optimization algorithm (L-BFGS) "explodes" 💥. I have not tracked down the cause yet; extending #PyTorch to understand parameter ranges would likely help.

(Left: previous iteration looks OK, Right: pixels go far out of range → clamped)
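
A minimal sketch of how such an explosion could be flagged before clamping hides it, assuming NumPy arrays with pixel values expected in [0, 1]. The helper names and the simulated runaway step are hypothetical and not taken from the thread's code.

```python
import numpy as np

def out_of_range_fraction(image, lo=0.0, hi=1.0):
    """Fraction of pixel values outside [lo, hi], measured before clamping."""
    image = np.asarray(image, dtype=np.float64)
    return float(np.mean((image < lo) | (image > hi)))

def clamp(image, lo=0.0, hi=1.0):
    """Clamp pixel values into the displayable range."""
    return np.clip(image, lo, hi)

rng = np.random.default_rng(0)
previous = rng.uniform(0.2, 0.8, size=(3, 8, 8))   # a well-behaved iterate
exploded = previous * 50.0 - 20.0                  # hypothetical runaway step

print(out_of_range_fraction(previous))   # nothing out of range
print(out_of_range_fraction(exploded))   # most pixels out of range
displayed = clamp(exploded)              # what ends up on screen
```

Logging this fraction at every iteration would show the step at which the optimizer leaves the valid range, which the clamped image alone does not reveal.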
Example outputs of the Picasso drawing with more bias towards upper layers of the convnet (4x), then equalized, then bias towards lower layers (4x).

Interesting, but nothing like the original. Some insights forming!
Notice how everything is smoothed out even though there's no code to do this explicitly. The patterns look like water erosion on a terrain...
When I don't understand the results, I like to break things down into components. Here are textures synthesized from the Picasso using only layers conv4_1 (8x downsampling total), 3_1 (4x), 2_1 (2x) and 1_1 (1x).

Colors captured best by 1_1, but I'm happily surprised by 3_1.
(All layers together now, with tweaked weights...)

What makes neural style interesting is that it generalizes based on the patterns it sees. Here, the colors it picks are strange blends from the original image, but I love the diversity nonetheless! ✨
First "HD" render at 1024x1024 worked great! Rotated the hue procedurally for an amazing effect... and also completely sidestepping the color blending bug I found!

I still need to dig into that issue now though ;-)
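
A small sketch of what a procedural hue rotation can look like. The thread does not say how the rotation was done, so the HSV round trip through Python's standard `colorsys` module is an assumption, and `rotate_hue` is a hypothetical helper.

```python
import colorsys

def rotate_hue(pixels, shift):
    """Rotate the hue of RGB pixels (floats in [0, 1]) by `shift` turns (1.0 = 360 degrees)."""
    rotated = []
    for r, g, b in pixels:
        h, s, v = colorsys.rgb_to_hsv(r, g, b)
        # Hue is cyclic, so wrap it back into [0, 1).
        rotated.append(colorsys.hsv_to_rgb((h + shift) % 1.0, s, v))
    return rotated

pixels = [(1.0, 0.0, 0.0), (0.5, 0.5, 0.5)]   # pure red and mid gray
shifted = rotate_hue(pixels, 1.0 / 3.0)       # a third of a turn
```

Saturation and value are untouched, so gray pixels stay gray and only the colored regions change.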
Plotting the statistics for the 64 channels of the convolution network (first layer). Each is a "feature" of the image.

Blue is original histogram, orange is reproduced by optimization. The differences don't seem like much, but I'm not sure how that looks visually yet.
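
A sketch of how per-channel histograms like these could be computed and compared numerically, assuming features stored as a NumPy array of shape (channels, height, width). The random arrays stand in for real activations, and the total variation distance is just one possible way to put a number on "the differences don't seem like much".

```python
import numpy as np

def channel_histograms(features, value_range, bins=32):
    """Histogram counts for each channel of a (C, H, W) feature array."""
    features = np.asarray(features)
    flat = features.reshape(features.shape[0], -1)
    return np.stack([np.histogram(ch, bins=bins, range=value_range)[0] for ch in flat])

def histogram_distance(a, b):
    """Per-channel total variation distance between two sets of histogram counts."""
    pa = a / a.sum(axis=1, keepdims=True)
    pb = b / b.sum(axis=1, keepdims=True)
    return 0.5 * np.abs(pa - pb).sum(axis=1)

rng = np.random.default_rng(0)
# Stand-ins for the 64 first-layer channels: original vs. optimized image.
original = np.maximum(0.0, rng.normal(size=(64, 16, 16)))
reproduced = np.maximum(0.0, 0.9 * rng.normal(size=(64, 16, 16)))

# Use one shared range so the bins line up between the two images.
shared = (0.0, float(max(original.max(), reproduced.max())))
h_orig = channel_histograms(original, shared)
h_repr = channel_histograms(reproduced, shared)
gaps = histogram_distance(h_orig, h_repr)   # one value in [0, 1] per channel
```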
The more I push the boundaries of quality, the more I find myself digging into @DeepForger's code. I just `grep` my various experiments on disk for keywords and that's what consistently comes up.

It was ahead of its time: first SaaS for neural style, first with true HD support!
In particular, the code was doing histogram matching upfront on the content and style images to help improve the quality. Now I need to port it to GPU and do it every iteration ;-)…
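
For reference, a minimal CPU sketch of histogram matching on a single channel, assuming NumPy and a rank-based quantile lookup. This is not @DeepForger's code and not the GPU port mentioned above; `match_histogram` is a hypothetical helper.

```python
import numpy as np

def match_histogram(source, target):
    """Remap `source` values so their distribution follows that of `target`."""
    source = np.asarray(source, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)
    # Rank of every source value (0 = smallest).
    order = np.argsort(source.ravel(), kind="stable")
    ranks = np.empty_like(order)
    ranks[order] = np.arange(order.size)
    # Quantile of each source value in [0, 1].
    quantiles = ranks / max(order.size - 1, 1)
    # Look up the target value found at the same quantile.
    target_sorted = np.sort(target.ravel())
    target_q = np.linspace(0.0, 1.0, target_sorted.size)
    matched = np.interp(quantiles, target_q, target_sorted)
    return matched.reshape(source.shape)

rng = np.random.default_rng(0)
style_channel = np.maximum(0.0, rng.normal(size=(32, 32)))      # stand-in for style data
content_channel = rng.normal(loc=2.0, scale=3.0, size=(32, 32))
matched = match_histogram(content_channel, style_channel)
```

The output keeps the spatial ordering of the source (the brightest pixel stays the brightest) while taking on the value distribution of the target. For multi-channel data the same function would be applied per channel.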
These are supposedly visualizations of the channels inside the neural network, but I got something terribly wrong. Still looking cool ;-)
Now converting float32 to uint8 and complaining about silent implicit type conversions. It's looking much better!

These are 3 of 64 channels of the convnet. They show features that seem to be a] smooth surfaces, b] a very sensitive noise detector (?), c] left-side edges / shadows.
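
A sketch of an explicit float32 to uint8 conversion for visualizing one channel, assuming NumPy. Rescaling to the channel's own min/max is an assumption; the thread does not say which normalization was used.

```python
import numpy as np

def to_uint8(channel):
    """Explicitly rescale a float feature map to [0, 255] and convert to uint8."""
    channel = np.asarray(channel, dtype=np.float32)
    lo, hi = float(channel.min()), float(channel.max())
    scale = hi - lo if hi > lo else 1.0   # avoid dividing by zero on flat maps
    normalized = (channel - lo) / scale   # now in [0, 1]
    # Round and clip before casting, so nothing wraps around silently.
    return np.clip(np.rint(normalized * 255.0), 0, 255).astype(np.uint8)

activations = np.array([[0.0, 2.5], [5.0, 10.0]], dtype=np.float32)
image = to_uint8(activations)
```

Casting unnormalized activations straight to uint8 truncates the fractional part and can wrap or saturate out-of-range values, which is one way to end up with a visualization that looks "terribly wrong".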
Visualization of 64 channels of the convnet processing the Picasso painting (above). Yellow means the 3x3 filter / neuron responds strongly, blue means it's a weaker response.

It helps provide a sense of what the model can understand! #procjam #neuralimagen
[Q&A]📋 Using synthetic or procedurally generated images in training the convnet would likely help generalization (for unseen styles), but reduce the average quality. The ideal is a completely custom network per style or image pair!
[Q&A]📋 Any annotation you can provide to the algorithm is potentially useful (subject to minor tweaks). You can either use that as the seed for the optimization search, or as a "semantic" constraint like #NeuralDoodle does.
Matching the mean and standard-deviation for each channel doesn't make much difference to the histograms it seems.

The features are clamped to zero with max(0.0, x) so many values are zero. I wonder how histogram normalization would manage...
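
A sketch of per-channel mean and standard-deviation matching, assuming NumPy arrays of shape (channels, height, width) and random stand-ins for real features. It also shows one reason these two statistics cannot describe clamped features: the matched result contains negative values that the target never has.

```python
import numpy as np

def match_mean_std(source, target, eps=1e-8):
    """Shift and scale each channel of `source` to the mean/std of `target`."""
    source = np.asarray(source, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)
    axes = (1, 2)   # statistics over all spatial positions, per channel
    s_mean = source.mean(axis=axes, keepdims=True)
    s_std = source.std(axis=axes, keepdims=True)
    t_mean = target.mean(axis=axes, keepdims=True)
    t_std = target.std(axis=axes, keepdims=True)
    return (source - s_mean) / (s_std + eps) * t_std + t_mean

rng = np.random.default_rng(0)
target = np.maximum(0.0, rng.normal(size=(4, 32, 32)))   # clamped, never negative
source = rng.normal(size=(4, 32, 32))
matched = match_mean_std(source, target)

print(target.min(), matched.min())   # target stops at zero, matched does not
```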
It seems like histogram matching would work much better on the raw features — before clamping (a.k.a. nonlinearity). From this subset of features, it's also clearer why they can't entirely be captured or fixed by a Gaussian distribution.
Notice how everything becomes washed out now that it's trying to approximate the histogram matching as a Gaussian distribution? All features get pushed towards the average and the result loses its crispness...
Building a reliable histogram matching implementation that works on the GPU is taking longer than I thought — and I haven't even started optimizing yet!

Learning lots though, thanks to property testing. It's a "you'll thank me later" kinda work...
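
A hand-rolled sketch of what property tests for histogram matching can look like, assuming NumPy and a simple sort-based matcher for equally sized arrays. Both the matcher and the chosen properties are illustrative assumptions, not the tests used in the thread, and no particular property-testing library is implied.

```python
import numpy as np

def match_exact(source, target):
    """Give `source` the exact sorted values of an equally sized `target`."""
    source = np.asarray(source, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)
    out = np.empty(source.size, dtype=np.float64)
    # The k-th smallest source position receives the k-th smallest target value.
    out[np.argsort(source.ravel(), kind="stable")] = np.sort(target.ravel())
    return out.reshape(source.shape)

def check_properties(trials=100, seed=0):
    """Run randomized checks; raises AssertionError if any property fails."""
    rng = np.random.default_rng(seed)
    for _ in range(trials):
        n = int(rng.integers(1, 200))
        source = rng.normal(size=n) * rng.uniform(0.1, 10.0)
        target = np.maximum(0.0, rng.normal(size=n))   # clamped, with many ties at zero
        matched = match_exact(source, target)
        # Property 1: the output has exactly the target's values.
        assert np.array_equal(np.sort(matched), np.sort(target))
        # Property 2: the ordering of the source is preserved.
        order = np.argsort(source, kind="stable")
        assert np.all(np.diff(matched[order]) >= 0)
        # Property 3: matching an already matched array changes nothing.
        assert np.array_equal(match_exact(matched, target), matched)
    return trials

check_properties()
```

Properties like these hold for any input, so the same checks can be pointed at a GPU implementation and compared against the CPU reference on random data.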
My histograms are matching pretty well on the GPU, now to figure out how to make the images look better than they did before.

**pops hood open**
Thank you, @PiotrZelasko! I feel it's the most fun way to do R&D as well... it started out as "baseline" code but anywhere off the beaten path feels like an open research topic.
Mixed results for histogram matching, haven't found the right parameters. Needs an overnight run to explore more? 🌃

From worst to best, matching on: 1) logarithmic scale, 2) linear pre-activation, 3) rectified linear, 4) disabled.
My take-away is that histograms help: variety improves and convergence is easier — at a cost. However, they are not the cause of this problem, nor is histogram matching the solution for these styles of Art.

Other tweaks I made diminished the benefits of this approach. 💡🤔
Next theory is that the real cause is how the problem is specified. Multiple layers each have their own loss and they compete against each other, so the optimizer can't find a compromise.

Here, blending weights between conv3_1 and conv1_1. Nice patterns or correct colors, pick one:
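
A sketch of how a blend between two per-layer losses can be written, assuming NumPy and precomputed per-layer statistics keyed by layer name. The array sizes, the random stand-in data, and the one-parameter `blend` helper are hypothetical.

```python
import numpy as np

def weighted_style_loss(current_stats, style_stats, weights):
    """Sum of per-layer mean squared differences, scaled by per-layer weights."""
    return sum(
        w * float(np.mean((current_stats[name] - style_stats[name]) ** 2))
        for name, w in weights.items()
    )

def blend(alpha):
    """Hypothetical blend: alpha=0 uses only conv1_1, alpha=1 only conv3_1."""
    return {"conv1_1": 1.0 - alpha, "conv3_1": alpha}

rng = np.random.default_rng(0)
# Random stand-ins for per-layer statistics; sizes are arbitrary.
style = {"conv1_1": rng.normal(size=(64, 64)), "conv3_1": rng.normal(size=(128, 128))}
current = {name: value + rng.normal(scale=0.1, size=value.shape)
           for name, value in style.items()}

losses = {alpha: weighted_style_loss(current, style, blend(alpha))
          for alpha in (0.0, 0.5, 1.0)}
```

Because the total is a plain weighted sum, the optimizer only ever sees one number. Lowering one layer's term at the expense of the other can leave the total unchanged, which is one way to read the "pick one" behaviour.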
[Q&A]📋 I expect to see patches of texture (the bigger the better) that could be reasonably confused with the original Picasso. Right now the results can look OK, but it's a different "style".
OK. I have figured it out—and the results are pretty INSANE. 🤯 (More in-depth analysis below after I celebrate a bit ;-) 🎉 #procjam #NeuralArt

These images are #generative, and it's beyond my expectations:
The source image is the Picasso at the very top of this thread. ☝️

There are clearly image sections that are reproduced from location-independent statistics, but the convnet does a great job of mashing up the elements in new ways—and that's what #NeuralStyle does the best.
[Q&A]📋 The source texture is encoded as multiple levels of statistics: "this pixel feature occurs alongside this other feature" (correlation matrix).

Until now, it was unclear to me whether this representation was sufficient for such complex styles.
[Q&A]📋 Yes, there's an input image at the same resolution, but it's transformed into coordinate-independent statistics and then thrown out.

I don't know how well it does as a compression mechanism yet! 🗜️
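
A sketch of the "correlation matrix" statistics described in the two answers above, assuming NumPy and features of shape (channels, height, width). Random data stands in for real convnet features. Shuffling the pixel positions leaves the matrix unchanged, which is what makes the representation coordinate-independent.

```python
import numpy as np

def correlation_matrix(features):
    """Channel-by-channel correlation (Gram) matrix of a (C, H, W) feature array."""
    features = np.asarray(features, dtype=np.float64)
    flat = features.reshape(features.shape[0], -1)   # (C, H*W): one column per pixel
    return flat @ flat.T / flat.shape[1]             # average over all positions

rng = np.random.default_rng(0)
features = np.maximum(0.0, rng.normal(size=(8, 16, 16)))

# Move every pixel to a random new position, identically for all channels.
perm = rng.permutation(16 * 16)
shuffled = features.reshape(8, -1)[:, perm].reshape(8, 16, 16)

g_original = correlation_matrix(features)
g_shuffled = correlation_matrix(shuffled)
print(np.allclose(g_original, g_shuffled))
```

Entry (i, j) records how strongly feature i and feature j fire together at the same pixel, averaged over the whole image. Once these matrices are stored, the input image itself is no longer needed.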
One insight is the number of iterations: obviously, with more iterations the result looks better. Much more compute than you'd feel comfortable with by default!

Here are results at 160, 80, 40, and 20 iterations at each scale.
There's lots of room for optimization, in particular using encoder/decoder architectures that are now standard in style transfer. But that part is a more mechanical process than figuring out a loss function that produces good-looking results and lets the optimizer find good solutions reliably!