I wanted to imagine how we’d better use #stablediffusion for video content / AR.
A major obstacle (and the reason most such videos are so flickery) is the lack of temporal and viewing-angle consistency, so I experimented with an approach to fix this
See 🧵 for process & examples
Ideally you want to learn a single representation of an object across time and viewing directions, so you only have to run a *single* #img2img generation on it.
For this I used layered-neural-atlases.github.io (2021)
This learns an "atlas" to represent an object and its background across the video.
Regularization losses during training help preserve the original shape, so the result looks like a usable, slightly "unwrapped" version of the object
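To make the idea concrete, here's a minimal sketch (not the authors' code) of the core setup: one coordinate MLP maps a video coordinate (x, y, t) to a shared 2D atlas coordinate (u, v), and a second MLP maps (u, v) to colour. Network sizes and the reconstruction loop are placeholders.

```python
import torch
import torch.nn as nn

class CoordMLP(nn.Module):
    """Simple coordinate MLP, used for both the mapping and the atlas networks."""
    def __init__(self, in_dim, out_dim, hidden=256, depth=6):
        super().__init__()
        layers, dim = [], in_dim
        for _ in range(depth):
            layers += [nn.Linear(dim, hidden), nn.ReLU()]
            dim = hidden
        layers.append(nn.Linear(dim, out_dim))
        self.net = nn.Sequential(*layers)

    def forward(self, x):
        return self.net(x)

# mapping network: video coordinate (x, y, t) -> atlas coordinate (u, v)
mapping = CoordMLP(in_dim=3, out_dim=2)
# atlas network: atlas coordinate (u, v) -> RGB colour
atlas = CoordMLP(in_dim=2, out_dim=3)

def reconstruct(xyt):
    """Reconstruct video colours by looking them up in the shared atlas."""
    uv = torch.tanh(mapping(xyt))      # keep atlas coordinates bounded
    return torch.sigmoid(atlas(uv))    # predicted RGB for each (x, y, t)

# training (not shown) minimises the reconstruction error against the video,
# plus the rigidity/consistency regularisers described in the paper
```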
The authors of the paper recommend using Mask R-CNN for creating a segmentation mask before training, but for this I found it easier (and cleaner) to just create a mask with the Rotobrush in After Effects
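If you'd rather stay in Python than fire up After Effects, a rough sketch of the Mask R-CNN route via torchvision (file names are placeholders; a single highest-scoring mask is assumed):

```python
import torch, torchvision
from torchvision.transforms.functional import to_tensor
from PIL import Image

# pretrained Mask R-CNN from torchvision
model = torchvision.models.detection.maskrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

frame = to_tensor(Image.open("frame_0000.png").convert("RGB"))
with torch.no_grad():
    pred = model([frame])[0]

# keep the highest-scoring instance mask and binarise it
mask = (pred["masks"][0, 0] > 0.5).to(torch.uint8) * 255
Image.fromarray(mask.numpy()).save("mask_0000.png")
```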
Once the "atlas" was learned I could run it through #depth2img, then use the new atlas to reproject the edit across the video.
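A hedged sketch of the depth2img step using the diffusers pipeline (the atlas file name, prompt and strength are just illustrative):

```python
import torch
from diffusers import StableDiffusionDepth2ImgPipeline
from PIL import Image

atlas_img = Image.open("foreground_atlas.png").convert("RGB")  # learned atlas

pipe = StableDiffusionDepth2ImgPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-depth", torch_dtype=torch.float16
).to("cuda")

styled_atlas = pipe(
    prompt="a colourful inflatable bouncy castle",  # example prompt
    image=atlas_img,
    strength=0.7,
).images[0]
styled_atlas.save("styled_atlas.png")
```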
This last remapping step is quick, so you could imagine it being rendered live based on your viewing angle for #AR (for a pre-generated scene)
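The reprojection itself is essentially a per-frame texture lookup: sample the stylised atlas at each frame's (u, v) coordinates. A minimal sketch, assuming the per-frame UV map is already in grid_sample's [-1, 1] convention:

```python
import torch
import torch.nn.functional as F

def reproject(styled_atlas, uv):
    """
    styled_atlas: [1, 3, H_a, W_a] stylised atlas (e.g. the depth2img output)
    uv:           [1, H, W, 2] per-frame atlas coordinates in [-1, 1]
    returns the stylised frame, [1, 3, H, W]
    """
    return F.grid_sample(styled_atlas, uv, mode="bilinear", align_corners=True)
```

Because this is just bilinear sampling, it's cheap enough to run per frame, which is what makes the live-AR idea plausible.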
Here are some more out-there takes, including turning my couch into a jumping castle! 🏰🎈
There are endless possibilities here for content creation. Follow for more creative AI experiments!