Alex J. Champandard @alexjc, 31 tweets, 17 min read
Neural approaches to style transfer struggle with certain types of art, e.g. crisp yet smooth brush-strokes 🖋️. It's likely a combination of factors, including using models pre-trained on natural images. 📷

In this thread I'll experiment to learn more! 👇 #neuralimagen #procjam
Ah. Here's what happens visually when the optimization algorithm (L-BFGS) "explodes" 💥. I haven't tracked down the cause yet; extending #PyTorch to understand parameter ranges would likely help.

(Left: previous iteration looks OK, Right: pixels go far out of range → clamped)
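One simple mitigation (a sketch of the general idea, not necessarily what my actual code does) is to clamp the image back into range after every optimizer step; the function name here is mine:

```python
import numpy as np

def clamp_pixels(img, lo=0.0, hi=1.0):
    # Force pixels back into the displayable range after an
    # optimizer step overshoots (the "explosion" above).
    return np.clip(img, lo, hi)

# Hypothetical example: an aggressive L-BFGS step pushes values out of range.
img = np.array([[0.4, 0.6], [0.5, 0.5]])
step = np.array([[-3.0, 5.0], [0.1, -0.2]])
exploded = img + step              # now contains -2.6 and 5.6
repaired = clamp_pixels(exploded)  # everything back in [0, 1]
```

Clamping hides the symptom rather than fixing the cause, which is why I still want to track down where the step size goes wrong.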
Example outputs of the Picasso drawing with more bias towards upper layers of the convnet (4x), then equalized, then bias towards lower layers (4x).

Interesting, but nothing like the original. Some insights forming!
Notice how everything is smoothed out even though there's no code to do this explicitly. The patterns look like water erosion on a terrain...
When I don't understand the results, I like to break things down into components. Here are textures synthesized from the Picasso using only layers conv4_1 (8x downsampling total), 3_1 (4x), 2_1 (2x) and 1_1 (1x).

Colors captured best by 1_1, but I'm happily surprised by 3_1.
(All layers together now, with tweaked weights...)

What makes neural style interesting is that it generalizes based on the patterns it sees. Here, the colors it picks are strange blends from the original image, but I love the diversity nonetheless! ✨
First "HD" render at 1024x1024 worked great! Rotated the hue procedurally for an amazing effect... and also completely side-stepped the color-blending bug I found!

I still need to dig into that issue now though ;-)
Plotting the statistics for the 64 channels of the convolutional network (first layer). Each is a "feature" of the image.

Blue is original histogram, orange is reproduced by optimization. The differences don't seem like much, but I'm not sure how that looks visually yet.
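The comparison plotted above can be sketched in a few lines of numpy: histogram one channel's activations for the original and for the reproduction over shared bins, then summarize the gap (the data here is random and purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(42)

# Stand-ins for one channel of first-layer features (original vs. optimized).
original   = rng.standard_normal(10_000)
reproduced = original + 0.05 * rng.standard_normal(10_000)  # close, not exact

# Shared bin edges so the two histograms are directly comparable.
edges = np.linspace(-4, 4, 65)
h_orig, _ = np.histogram(original, bins=edges, density=True)
h_repr, _ = np.histogram(reproduced, bins=edges, density=True)

# One number summarizing "the differences don't seem like much":
# L1 distance between the two empirical densities.
l1_gap = np.abs(h_orig - h_repr).sum() * np.diff(edges)[0]
```

A small `l1_gap` is exactly the situation above: the statistics nearly match, yet the visual difference is still unknown.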
The more I push the boundaries of quality, the more I find myself digging into @DeepForger's code. I just `grep` my various experiments on disk for keywords and that's what consistently comes up.

It was ahead of its time: first SaaS for neural style, first with true HD support!
In particular, the code was doing histogram matching upfront on the content and style images to help improve the quality. Now I need to port it to GPU and do it every iteration ;-) en.wikipedia.org/wiki/Histogram…
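A GPU-friendly way to do histogram matching (a sketch of the standard sort-based trick, not the @DeepForger code) is to give each source value the target value of equal rank; sorting is fast on GPU, so this can plausibly run every iteration:

```python
import numpy as np

def match_histogram(source, target):
    # Give each source value the target value of equal rank, so the
    # output's empirical distribution matches the target's exactly
    # while preserving the source's ordering. Assumes equal lengths.
    ranks = np.argsort(np.argsort(source))   # rank of each element
    return np.sort(target)[ranks]

rng = np.random.default_rng(0)
src = rng.standard_normal(1_000)            # e.g. a content-image channel
tgt = rng.standard_normal(1_000) * 3 + 2    # style channel, different stats
out = match_histogram(src, tgt)
```

The same idea ports to PyTorch almost verbatim with `torch.sort` / `torch.argsort`.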
These are supposedly visualizations of the channels inside the neural network, but I got something terribly wrong. Still looking cool ;-)
Now converting float32 to uint8 and complaining about silent implicit type conversions. It's looking much better!

These are 3 of 64 channels of the convnet; they show features that seem to be a) smooth surfaces, b) a very sensitive noise detector (?), c) left-side edges / shadows.
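That class of bug is easy to hit: casting float32 straight to uint8 silently truncates and wraps. A minimal sketch of doing the conversion explicitly:

```python
import numpy as np

def to_uint8(img):
    # Explicitly scale [0, 1] floats to [0, 255] and clip BEFORE casting;
    # a bare astype(np.uint8) truncates toward zero and wraps negatives.
    return np.clip(img * 255.0, 0, 255).round().astype(np.uint8)

feat = np.array([-0.1, 0.0, 0.5, 1.0, 1.2], dtype=np.float32)
safe   = to_uint8(feat)         # [0, 0, 128, 255, 255]
unsafe = feat.astype(np.uint8)  # silent: 1.2 -> 1, negatives unpredictable
```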
Visualization of 64 channels of the convnet processing the Picasso painting (above). Yellow means the 3x3 filter / neuron responds strongly, blue means it's a weaker response.

It helps provide a sense of what the model can understand! #procjam #neuralimagen
[Q&A]📋 Using synthetic or procedurally generated images in training the convnet would likely help generalization (for unseen styles), but reduce the average quality. The ideal is a completely custom network per style or image pair!
[Q&A]📋 Any annotation you can provide to the algorithm is potentially useful (subject to minor tweaks). You can either use that as the seed for the optimization search, or as a "semantic" constraint like #NeuralDoodle does.
Matching the mean and standard-deviation for each channel doesn't make much difference to the histograms it seems.

The features are clamped to zero with max(0.0, x) so many values are zero. I wonder how histogram normalization would manage...
It seems like histogram matching would work much better on the raw features, before clamping (i.e. before the ReLU nonlinearity). From this subset of features, it's also clearer why they can't entirely be captured or fixed by a Gaussian distribution.
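A toy numpy sketch of why the post-ReLU statistics resist a Gaussian fit: clamping piles a big spike of mass at exactly zero, so a distribution described only by mean and std can't reproduce it — it inevitably puts mass on negative values that real ReLU outputs never have:

```python
import numpy as np

rng = np.random.default_rng(1)
raw  = rng.standard_normal(100_000) - 0.5   # pre-activation features
relu = np.maximum(0.0, raw)                 # clamped, as in the network

zero_fraction = (relu == 0.0).mean()        # big spike at exactly zero

def match_moments(x, target):
    # Shift/scale x to the target's mean and std -- the "Gaussian view".
    return (x - x.mean()) / x.std() * target.std() + target.mean()

# Gaussian samples matched to the ReLU'd moments go negative, which
# genuine ReLU outputs never do: the spike at zero is lost entirely.
gaussian_fit = match_moments(rng.standard_normal(100_000), relu)
```

The raw pre-activation values, by contrast, are close to Gaussian here, which is why matching before the nonlinearity looks more promising.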
Notice how everything becomes washed out now that it's trying to approximate the histogram matching with a Gaussian distribution? All features get pushed towards the average and the result loses its crispness...
Building a reliable histogram matching implementation that works on the GPU is taking longer than I thought — and I haven't even started optimizing yet!

Learning lots though, thanks to property testing. It's a "you'll thank me later" kinda work...
My histograms are matching pretty well on the GPU, now to figure out how to make the images look better than they did before.

**pops hood open**
Thank you, @PiotrZelasko! I feel it's the most fun way to do R&D as well... it started out as "baseline" code, but anywhere off the beaten path feels like an open research topic.
Mixed results for histogram matching, haven't found the right parameters. Needs an overnight run to explore more? 🌃

From worst to best, matching on: 1) logarithmic scale, 2) linear pre-activation, 3) rectified linear, 4) disabled.
My take-away is that histograms help: variety improves and convergence is easier, at a cost. However, they are not the cause of this problem, nor is histogram matching the solution for these styles of art.

Other tweaks I made diminished the benefits of this approach. 💡🤔
Next theory is that the real cause is how the problem is specified: each layer has its own loss, those losses compete against each other, and the optimizer can't find a compromise.

Here, blending weights between conv3_1 and conv1_1. Nice patterns or correct colors, pick one:
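The competition can be sketched as a weighted sum of per-layer losses; sliding the blend weight just trades one layer's statistics against the other's (toy targets below, purely illustrative):

```python
import numpy as np

def layer_loss(stats, target):
    # Squared error between a layer's summary statistics and its target.
    return float(((stats - target) ** 2).sum())

def total_loss(img_stats, targets, w):
    # Blend conv1_1 (colors) against conv3_1 (patterns): each layer pulls
    # the optimizer toward its own target, and w picks the compromise.
    return (1 - w) * layer_loss(img_stats["conv1_1"], targets["conv1_1"]) \
             + w  * layer_loss(img_stats["conv3_1"], targets["conv3_1"])

# Deliberately conflicting toy targets: no image satisfies both at once.
targets = {"conv1_1": np.array([1.0, 0.0]), "conv3_1": np.array([0.0, 1.0])}
stats   = {"conv1_1": np.array([0.8, 0.1]), "conv3_1": np.array([0.5, 0.5])}

losses = [total_loss(stats, targets, w) for w in (0.0, 0.5, 1.0)]
```

Whatever `w` you pick, one layer's residual stays large — hence "nice patterns or correct colors, pick one".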
[Q&A]📋 I expect to see patches of texture (the bigger the better) that could be reasonably confused with the original Picasso. Right now the results can look OK, but it's a different "style".
OK. I have figured it out—and the results are pretty INSANE. 🤯 (More in-depth analysis below after I celebrate a bit ;-) 🎉 #procjam #NeuralArt

These images are #generative, and it's beyond my expectations:
The source image is the Picasso at the very top of this thread. ☝️

There are clearly image sections that are reproduced from location-independent statistics, but the convnet does a great job of mashing up the elements in new ways—and that's what #NeuralStyle does best.
[Q&A]📋 The source texture is encoded as multiple levels of statistics: "this pixel feature occurs alongside this other feature" (correlation matrix).

Until now, it was unclear to me whether this representation was sufficient for such complex styles.
[Q&A]📋 Yes, there's an input image at the same resolution, but it's transformed into coordinate-independent statistics and then thrown out.

I don't know how well it does as a compression mechanism yet! 🗜️
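Those coordinate-independent statistics are the Gram (correlation) matrix of the features: for a feature map of shape (C, H, W), correlate every channel with every other, summing out spatial position. A numpy sketch, including a check that shuffling pixel locations leaves it unchanged — which is exactly what "thrown out" means:

```python
import numpy as np

def gram_matrix(features):
    # "This feature occurs alongside this other feature": correlate each
    # pair of channels, averaging over all spatial positions.
    C, H, W = features.shape
    F = features.reshape(C, H * W)
    return F @ F.T / (H * W)

rng = np.random.default_rng(7)
feat = rng.standard_normal((8, 16, 16))   # toy (C, H, W) feature map
G = gram_matrix(feat)

# Shuffle spatial positions: the Gram matrix is identical, so all
# location information really is discarded.
perm = rng.permutation(16 * 16)
shuffled = feat.reshape(8, -1)[:, perm].reshape(8, 16, 16)
G_shuffled = gram_matrix(shuffled)
```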
One insight is the number of iterations: with more of them the result simply looks better. It takes much more compute than you'd feel comfortable with by default!

Here are results at 160, 80, 40, and 20 iterations at each scale.
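The coarse-to-fine schedule implied by "at each scale" can be sketched as: synthesize at a small resolution, upsample the result as the seed for the next scale, and run a fixed iteration budget at each. The optimizer is stubbed out here (the real code uses L-BFGS), and `np.kron` stands in as a crude nearest-neighbour upsample:

```python
import numpy as np

def upsample2x(img):
    # Crude nearest-neighbour 2x upsampling: seed for the next scale.
    return np.kron(img, np.ones((2, 2)))

def synthesize(seed, iterations):
    # Stub for the per-scale optimization loop; just tallies the work
    # done at this scale (rows * iterations, a made-up cost metric).
    return seed, seed.shape[0] * iterations

img = np.random.default_rng(3).random((64, 64))   # coarsest scale
total_work = 0
for scale in range(4):                            # 64 -> 128 -> 256 -> 512
    img, work = synthesize(img, iterations=160)
    total_work += work
    if scale < 3:
        img = upsample2x(img)
```

The cost tally makes the trade-off concrete: the finest scale dominates, so cutting iterations there (160 → 20) is where most of the speed-up, and most of the quality loss, comes from.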
There's lots of room for optimization—in particular using encoder/decoder architectures that are now standard in style transfer. But that part is a more mechanical process than figuring out a good loss function that looks good and lets the optimizer find good solutions reliably!