Next in our literature survey on Texture Synthesis: a personal favorite and underrated paper by Li and Wand. 💥

An illustrated review & tutorial in a thread! 👇

📝 Combining Markov Random Fields & Convolutional Neural Networks for Image Synthesis
🔗 arxiv.org/abs/1601.04589 #ai
Here's the nomenclature I'm using.

✏️ Beginner-friendly insight or exercise.
🕳️ Related work that's relevant here!
📖 Open research topic of general interest.
💡 Idea or experiment to explore further...

See this thread for context and other reviews:
🕳️ The paper of Li & Wand is inspired by Gatys et al.'s work from 2015. It explores a different (sometimes better) way to use deep convolutional networks to generate images...
Remember that the work of Gatys et al. extracts "deep features" from images that look something like this:
Then, those features are reduced down into smaller gram matrices that act like the fingerprint of a texture:
✏️ This is called a parametric approach to texture synthesis.

The problem, however, is that it stops looking good as soon as the texture has interesting structure. The output can turn into a spaghetti mess where crisp edges get lost.

[Figure: original texture (1×) vs. generated result (2×)]
Instead, Li & Wand use an approach known as Markov Random Fields — which in this case means matching 3x3 patches of "deep features" for each layer.

Each iteration, every patch of the generated image gets matched with its closest in the source texture.

It can work very well: 👀
The matching of 3x3 patches is done in a brute-force manner in the paper, so it's basically a large matrix multiplication that computes the full similarity matrix between all patches.

✏️ As far as deep learning frameworks go, this code (a single matmul) "should" be among the simplest to write.
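To make that concrete, here's a minimal NumPy sketch of the brute-force matching step (illustrative names and shapes; the paper operates on VGG feature maps, and its implementation is Theano-based):

```python
import numpy as np

def extract_patches(feat, size=3):
    # feat: (C, H, W) feature map -> (N, C*size*size) matrix of flattened patches
    C, H, W = feat.shape
    return np.stack([feat[:, y:y + size, x:x + size].ravel()
                     for y in range(H - size + 1)
                     for x in range(W - size + 1)])

def match_patches(gen_feat, src_feat, size=3):
    # Brute-force nearest neighbours: normalize every patch, then one big
    # matmul gives the full cosine-similarity matrix between all patch pairs.
    A = extract_patches(gen_feat, size)
    B = extract_patches(src_feat, size)
    An = A / (np.linalg.norm(A, axis=1, keepdims=True) + 1e-8)
    Bn = B / (np.linalg.norm(B, axis=1, keepdims=True) + 1e-8)
    sim = An @ Bn.T                # (N_gen, N_src) similarity matrix
    best = sim.argmax(axis=1)      # closest source patch for each generated patch
    return best, B[best]
```

On real feature maps this similarity matrix gets large quickly, hence the memory concerns discussed later in the thread.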
One downside, however, is that the outcome isn't easy to control, and patches may end up being copied too many times!

Here it looks like a dog was sitting on the grass too long: 🐕
💡 There are a bunch of small tricks to inject enough variety:
- normalize features
- inject some noise
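As a rough sketch, those two tricks might look like this on a (C, H, W) feature array (the helper name, noise scale, and placement in the pipeline are my assumptions, not the paper's):

```python
import numpy as np

def add_variety(feat, noise_scale=0.05, rng=None):
    # Hypothetical helper combining two variety tricks:
    # 1) channel-wise normalization, so no single channel dominates matching;
    # 2) a dash of Gaussian noise, so matches don't collapse onto a few patches.
    rng = rng or np.random.default_rng(0)
    mu = feat.mean(axis=(1, 2), keepdims=True)
    sigma = feat.std(axis=(1, 2), keepdims=True) + 1e-8
    normed = (feat - mu) / sigma
    return normed + rng.normal(0.0, noise_scale, size=feat.shape)
```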

But most papers that cite Li & Wand don't showcase the work in good light, and end up with results like this instead:
📖 It's a shame really, because the parametric+neural approaches don't reach anywhere near the photo-realistic quality of this algorithm. I consider this to be not only an open avenue of research — but a very promising one.
Besides this, the algorithm operates very much like Gatys et al.'s, using iterative optimization that incrementally refines an image starting from random noise.

It looks like this:
Li & Wand also reintroduced a coarse-to-fine optimization procedure: it starts at scale 1:16, then proceeds through 1:8, 1:4, and 1:2 before finishing at 1:1.

(This is an old idea in texture synthesis, and you can still find it almost everywhere now.)
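A rough sketch of that coarse-to-fine schedule (the `synthesize` callback and the crude strided downsampling are stand-ins for the paper's actual optimization pass and resampling):

```python
import numpy as np

def coarse_to_fine(texture, synthesize, scales=(16, 8, 4, 2, 1)):
    # Synthesize at 1:16 first, upsample that result to seed the 1:8 pass,
    # and so on down to full resolution. `synthesize(target, init)` stands in
    # for one full optimization pass against the (downsampled) texture.
    result = None
    for s in scales:
        target = texture[::s, ::s]            # crude downsample, for the sketch
        if result is None:
            init = np.random.default_rng(0).random(target.shape)
        else:
            # nearest-neighbour 2x upsample of the coarser result, cropped to fit
            init = np.kron(result, np.ones((2, 2)))[:target.shape[0], :target.shape[1]]
        result = synthesize(target, init)
    return result
```

Seeding each scale with the upsampled previous result is what lets the fine scales focus on detail instead of re-discovering the coarse layout.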
Since I find these visualizations so cool, here's another one — because why not ;-)

This one shows the L-BFGS optimizer blow up and recover. The CUDA implementation of L-BFGS is faster but apparently less reliable (WIP):
The paper has more examples, but it doesn't sell the idea as well as it could. See the @DeepForger output (Feb 2016 onward) for an idea of what this class of algorithms can generally do (incl. custom tweaks): twitter.com/deepforger/lik…
For completeness, I previously reviewed this paper here. I was already very impressed:
The original algorithm is designed for style transfer, so typically multiple losses/criteria are satisfied at the same time. When you remove one criterion (e.g. there's no content image to match), the textures become more repetitive.

Some example texture renders at 640x400: sand, grass, stone, twigs.
Finally, here's what the core loop of the algorithm looks like: matching all patches of one image A with another image B. This code splits up the patches of B so they fit into memory more easily... github.com/photogeniq/neu…
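The repo code is Theano-based; a NumPy approximation of the chunking idea (split B's patches, compute partial similarity matrices, keep a running best match) might look like:

```python
import numpy as np

def match_in_chunks(A, B, chunk=256):
    # A: (N_a, D) generated patches, B: (N_b, D) source patches, rows L2-normalized.
    # Process B in chunks so the (N_a, N_b) similarity matrix never has to
    # exist in memory all at once; keep only the running best match per row of A.
    best_sim = np.full(A.shape[0], -np.inf)
    best_idx = np.zeros(A.shape[0], dtype=int)
    for start in range(0, B.shape[0], chunk):
        sim = A @ B[start:start + chunk].T        # partial similarity matrix
        local = sim.argmax(axis=1)
        local_best = sim[np.arange(A.shape[0]), local]
        better = local_best > best_sim
        best_sim[better] = local_best[better]
        best_idx[better] = start + local[better]
    return best_idx
```

Peak memory drops from O(N_a × N_b) to O(N_a × chunk), at no cost to the result.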
🎙️ [Q&A] Li & Wand use the same approach as Gatys et al.: you start with a random image, then repeat the following: feed it forward through a convnet (VGG), find the best-matching patches at the chosen layers, compute the error, back-propagate the gradients, and update the image.
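That loop is easiest to see on a toy version that skips the convnet entirely and matches raw-pixel patches with an explicit gradient; in the real algorithm the patches come from VGG feature maps and the gradient is back-propagated through the network (all names here are illustrative):

```python
import numpy as np

def patches(img, k=3):
    # all k-by-k patches of a 2-D image, flattened to rows
    H, W = img.shape
    return np.stack([img[y:y + k, x:x + k].ravel()
                     for y in range(H - k + 1) for x in range(W - k + 1)])

def mrf_step(img, src, k=3, lr=0.05):
    # One iteration of the optimize-the-image loop:
    # 1) match every patch of `img` to its nearest patch in `src`
    #    (squared L2 distance, computed with a single matmul),
    # 2) form the gradient of the summed squared patch error,
    # 3) take a gradient step on the image itself.
    A, B = patches(img, k), patches(src, k)
    d = (A * A).sum(1)[:, None] - 2.0 * (A @ B.T) + (B * B).sum(1)[None, :]
    targets = B[d.argmin(axis=1)]
    grad = np.zeros_like(img)
    H, W = img.shape
    i = 0
    for y in range(H - k + 1):
        for x in range(W - k + 1):
            # each patch scatters its error back onto the pixels it covers
            grad[y:y + k, x:x + k] += (A[i] - targets[i]).reshape(k, k)
            i += 1
    return img - lr * grad
```

Iterating `mrf_step` drives every patch of the image toward its nearest source patch, which is exactly the MRF objective, just without the deep features.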
Thread by Alex J. Champandard.