🧠 AI experiment comparing #ControlNet and #Gen1. Video goes in ➡ Minecraft comes out.
Results are wild, and it's only a matter of time till this tech runs at 60fps. Then it'll transform 3D and AR.
How soon until we're channel surfing realities layered on top of the world?🧵
First up, ControlNet. Wow, this tool makes it remarkably easy to get photorealistic results. I used the HED (holistically-nested edge detection) method for this scene, then ran EbSynth for smoother interpolation between ControlNet keyframes. Check out my prior posts for the end-to-end workflow.
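For anyone who wants to try the ControlNet pass themselves, here's a minimal sketch of stylizing a single keyframe with the HED method via the diffusers library. The model IDs, prompt, and file names are illustrative placeholders, not my exact settings:

```python
# Minimal sketch: stylize one video keyframe with HED ControlNet (diffusers).
# Model IDs, prompt, and file names are illustrative, not the thread's exact settings.
import torch
from PIL import Image
from controlnet_aux import HEDdetector
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline

hed = HEDdetector.from_pretrained("lllyasviel/Annotators")
controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-hed", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
).to("cuda")

frame = Image.open("keyframe_0001.png")        # extracted video keyframe
edges = hed(frame)                             # HED edge map guides the structure
result = pipe(
    "minecraft voxel world, blocky textures",  # target style prompt (placeholder)
    image=edges,
    num_inference_steps=20,
).images[0]
result.save("keyframe_0001_styled.png")        # styled keyframe for EbSynth
```

The styled keyframes then go into EbSynth, which propagates the look across the in-between frames.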
Next up, Gen-1: impressive is the word. The star of the show is the temporal consistency. Getting photoreal results is harder than with ControlNet, IMO. #Gen1 is almost its own stylized thing, so I advise leaning into that. But why does any of this matter? Can't we just type text to get video?
Text prompts are cool, but control over the details is crucial for artists. These new AI tools turn regular photos/videos into an expressive form of performance capture. Record characters/scenes with your phone and use that footage to guide the generation. My buddy Don shows how it's done:
Of course, the input media can also be *synthetically* generated. Go from a blocked-out 3D scene to a final render in record time. Control the details you care about (e.g. blocking), and let AI handle the rest (e.g. texturing). I cover use cases here: creativetechnologydigest.substack.com/p/depth2image-…
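As a rough illustration, here's one way to run that depth2image idea with a depth ControlNet in diffusers. I'm assuming you've already rendered a depth pass from your 3D engine; the model IDs, file names, and prompt are placeholders:

```python
# Minimal sketch: drive generation with a depth map rendered from a blocked-out
# 3D scene (one way to do depth2image, here via a depth ControlNet).
# Model IDs, file names, and the prompt are placeholders.
import torch
from PIL import Image
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline

controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-depth", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
).to("cuda")

depth = Image.open("blockout_depth.png")  # depth pass rendered from the 3D engine
final = pipe(
    "photoreal sci-fi corridor, detailed textures",  # AI handles texturing/lighting
    image=depth,                                     # blocking stays locked to your layout
    num_inference_steps=20,
).images[0]
final.save("final_render.png")
```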
See where I'm going when I say generative AI is going to disrupt 3D rendering? You could run a lightweight 3D engine in a browser, slap on a generative filter, and transform it into AAA-game-engine quality. No massive team required. And it's not limited to gaming either:
It's wild how fast things are moving in generative AI. Here are my video2minecraft results from just a few months ago, which already look dated next to all these new approaches for taming the chaotic diffusion process:
And that's a wrap! I like sharing my workflows openly with the AI & creator community, so if you enjoyed this thread I'd appreciate it if you: 1. RT the thread below 2. Follow @bilawalsidhu for more 3. Subscribe to get some visual umami right to your inbox: creativetechnologydigest.substack.com
Before/after of Corridor's latest AI video is wild. They shot video on greenscreen, made virtual sets in Unreal, then reskinned it to anime by fine-tuning Stable Diffusion. Net result? 120 VFX shots done by a team of 3 on a dime. Bravo! This is a milestone in creative technology🧵
⚙ Corridor basically made an open source video2anime workflow to pull off this video. Key tools they used:
- Stable Diffusion model + DreamBooth fine-tuning
- Unreal Engine + asset store 3D models
- Img2Img + DeFlickering effect
- Heaps of good ol' fashioned VFX compositing
Now let's deconstruct their creation workflow:
1. Train a model to replicate a specific style
2. Train a model to know a character 🔄
3. Run green screen video through img2img (sketched below)
4. Reduce flicker with the Deflicker plugin
5. Add 3D elements in Unreal 5
6. Final VFX comp/edit in Resolve
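Here's a minimal sketch of what step 3 could look like in diffusers: pushing green-screen frames through img2img with a DreamBooth fine-tuned checkpoint. The checkpoint path, prompt token, and strength are hypothetical placeholders, not Corridor's actual settings:

```python
# Minimal sketch of step 3: run green-screen frames through img2img with a
# DreamBooth fine-tuned checkpoint. Paths, prompt, and strength are illustrative.
import glob
import os
import torch
from PIL import Image
from diffusers import StableDiffusionImg2ImgPipeline

pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "./dreambooth-anime-checkpoint",  # hypothetical fine-tuned SD model
    torch_dtype=torch.float16,
).to("cuda")

os.makedirs("stylized", exist_ok=True)
for path in sorted(glob.glob("frames/*.png")):
    frame = Image.open(path).convert("RGB")
    out = pipe(
        prompt="anime style, sks character",  # token learned via DreamBooth
        image=frame,
        strength=0.5,                         # lower = closer to the source frame
        guidance_scale=7.5,
    ).images[0]
    out.save(path.replace("frames/", "stylized/"))
```

Lower strength values stay closer to the actor's original performance; higher values push further toward the trained anime style.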
Multi-ControlNet is a game changer for building an open source video2video pipeline. I spent some time hacking together this NeRF2Depth2Image workflow using a combination of ControlNet methods + SD 1.5 + EbSynth.
🧵 Full breakdown of my workflow & detailed tips shared in the thread below ⬇
Here's an overview of the workflow we're going to deconstruct! At a high level:
Capture video (used my iPhone) ➡️ Train NeRF (used Luma AI) ➡️ Animate & Render RGB + Depth ➡️ Multi-ControlNet (Depth + HED) ➡️ EbSynth ➡️ Blending & Compositing. Now let's break it down step by step:
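To make the Multi-ControlNet step concrete, here's a minimal sketch of conditioning SD 1.5 on both a depth map and an HED edge map with diffusers. Model IDs, conditioning weights, file names, and the prompt are illustrative assumptions, not my exact settings:

```python
# Minimal sketch of the Multi-ControlNet step: depth + HED conditioning on SD 1.5.
# Model IDs, weights, file names, and the prompt are illustrative placeholders.
import torch
from PIL import Image
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline

controlnets = [
    ControlNetModel.from_pretrained("lllyasviel/sd-controlnet-depth", torch_dtype=torch.float16),
    ControlNetModel.from_pretrained("lllyasviel/sd-controlnet-hed", torch_dtype=torch.float16),
]
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnets, torch_dtype=torch.float16
).to("cuda")

depth = Image.open("nerf_depth_0001.png")  # depth render from the NeRF flythrough
edges = Image.open("nerf_hed_0001.png")    # HED edge map from the RGB render
styled = pipe(
    "stylized keyframe prompt here",
    image=[depth, edges],                      # one conditioning image per ControlNet
    controlnet_conditioning_scale=[1.0, 0.7],  # per-net influence weights
    num_inference_steps=20,
).images[0]
styled.save("keyframe_0001_styled.png")        # hand styled keyframes to EbSynth
```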
For the input, I wanted to see if I could exploit the crispy depth maps you can get out of a Neural Radiance Field (NeRF) 3D scan.
- Left: 3D flythrough rendered from a NeRF (iPhone video ➡️ trained w/ Luma AI)
- Right: The corresponding depth map (notice the immaculate detail!)