Bilawal Sidhu · Mar 25 · 14 tweets · 9 min read
Midjourney v5 has pushed into photorealism, a goal which has eluded the computer graphics industry for decades (!) 🤯

Insane progression, and all that by 11 people with a shared dream.

🧵 Let's explore what these breakthroughs in Generative AI mean for 3D & VFX as we know it...
First off, Midjourney v5 is far more photorealistic out of the box, whereas its predecessor had a more painterly, stylized bent.

Here's a thorough comparison of v5 vs v4 in case you want to go deeper. But let's keep going...
The key point is that @midjourney has crossed the chasm of uncanniness, and is well into photorealistic territory.

And this sentiment is resounding among professionals. Some might even say it's one for the history books!
@midjourney I mean, how can generative AI *not* absolutely disrupt 3D engines like Unreal & Unity, or even renderers like Octane & Redshift?

Just look at the quality of this Midjourney generation by Linus 🤯
@midjourney Like, who knew it'd take generative AI to cross the uncanny valley, particularly for digital humans? No sub-surface scattering required!

You've got all you need to realize your ambitious Bollywood dreams:
@midjourney Virtual sets? Not a problem. These Midjourney generations easily surpass the quality of an Unreal Engine or Octane render.

I mean, just look at the high-frequency detail in the chair, the knitting, the windows -- and good lord (!) the dynamic range is immaculate.
@midjourney Obviously, the meme potential is exceedingly high too :)

Especially given MJ's new approach to prompting, which allows us to compose complex scenes with multiple characters.

Fancy a Hogwarts rave circa 1998? No problemo:
@midjourney Product photography gets a huge boost too. Imagine products before you create them, or fine-tune models with actual product photography to stage virtual shoots on demand.

In the past, doing this required scanning assets or modelling them from scratch, plus hours in 3D tools:
@midjourney It's no surprise that Jensen Huang, CEO of @nvidia, said “Every single pixel will be generated soon. Not rendered: generated”

Obviously, I've been saying this for a hot minute now, especially after playing with ControlNet:
@midjourney @nvidia In the near term, hybrid approaches that fuse the best of classical 3D + generative AI will reign supreme.

Run a lightweight 3D engine for the first pass, then run a generative filter on top to convert it into AAA quality.

Think NVIDIA's "DLSS" on steroids:
@midjourney @nvidia Explicitly modelling reality seems overrated for visualization. A hybrid approach absolutely crushes it!

E.g. throw in an uncanny Unreal model, and get out a much more photorealistic result. Minor temporal inconsistencies aside (which'll be solved!), the result is beyond Unreal:
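Here's a minimal sketch of that render-then-refine idea, assuming Stable Diffusion img2img (via Hugging Face diffusers) stands in for the "generative filter"; the checkpoint, file paths and strength value are illustrative assumptions, not a production pipeline:

```python
# Hypothetical sketch: take a rough first-pass frame from a 3D engine and
# run a generative "filter" over it to add photoreal detail.
import torch
from PIL import Image
from diffusers import StableDiffusionImg2ImgPipeline

pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",  # assumed checkpoint
    torch_dtype=torch.float16,
).to("cuda")

rough_render = Image.open("unreal_first_pass.png").convert("RGB")  # placeholder path

refined = pipe(
    prompt="photorealistic interior, natural light, high dynamic range",
    image=rough_render,
    strength=0.45,       # low strength preserves the engine's layout and camera
    guidance_scale=7.5,
).images[0]

refined.save("refined_frame.png")
```

The low strength value is the key design choice: it keeps the 3D engine's composition intact while letting the model hallucinate photoreal detail on top -- run per frame, that's the "DLSS on steroids" pass.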
@midjourney @nvidia Video is in its infancy, but clearly the next target. Jon made this short film with a freaking iPhone + Midjourney + @runwayml Gen-1

And it's all filmed in his apartment! This is James Cameron-style virtual production ($$$) democratized.

Imagine where we'll be in +6 months...
Obviously, Midjourney founder @DavidSHolz's goal has always been to transform interactive content.

First it'll transform ideation, then asset creation, then 3D engine embellishment, but eventually -- we'll be playing dreams in the cloud 🌥

And I, for one, can't wait!
@DavidSHolz That's a wrap! If you enjoyed this deep dive on AI's impact on real-time 3D & offline VFX:
- RT the thread below to share with your audience
- Follow @bilawalsidhu for more creative tech magic
- Sign up to get these sent neatly into your inbox: creativetechnologydigest.substack.com

More from @bilawalsidhu

Mar 23
If you thought reskinning 2D videos was fun, how about reskinning 3D captures of the world?

That's exactly what you get when you combine NeRFs with InstructPix2Pix in this new paper by @ayaanzhaque et al.

Mini-thread🧵
InstructPix2Pix is applied to the input 2D views used to train the NeRF, in an iterative fashion.

Notice below that the edits are gradually becoming more consistent over time.

I'm impressed with how well it works!
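For intuition, here's a rough pseudocode sketch of that iterative dataset-update loop; the helpers (render_from_pose, edit_with_instruct_pix2pix, train_nerf_step) are placeholders for illustration, not the paper's actual API:

```python
# Pseudocode sketch of iterative dataset updates (Instruct-NeRF2NeRF style).
def edit_nerf(nerf, dataset, instruction, steps=30_000, edit_every=10):
    for step in range(steps):
        if step % edit_every == 0:
            # Re-render one training view from the current NeRF, then replace
            # that view's image with an InstructPix2Pix edit of the render.
            view = dataset.sample_view()
            rendered = render_from_pose(nerf, view.camera_pose)
            view.image = edit_with_instruct_pix2pix(
                image=rendered,
                original=view.original_image,   # keeps edits anchored to the capture
                instruction=instruction,        # e.g. "make it look like autumn"
            )
        # Standard NeRF optimization on the gradually edited dataset.
        train_nerf_step(nerf, dataset)
    return nerf
```

Because only one view is swapped at a time while training continues, the 3D representation keeps pulling all the edited views toward a single consistent scene -- which is why the edits become more consistent over time.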
Of course, it's not just places and objects -- you can totally reskin people to your heart's content.

No makeup or costumes required :)
Read 7 tweets
Mar 21
Been hands-on with the beta of Adobe's cutting-edge Generative AI tool, and I'm impressed! 🤯

Here's a taste of the power of #AdobeFirefly 🎇 and what sets it apart in the increasingly crowded world of #AI art.

Thread 🧵🎨
For starters, Adobe Firefly isn't one thing. It encompasses multiple AI models. It's a portal for testing new capabilities with creators, and eventually graduating them into products like Photoshop & Premiere that creators know and love. Meeting users where they are, if you will.
If you've used any text-to-image product (e.g. Stable Diffusion or DALL-E), Adobe Firefly will be immediately familiar at first glance.
But there are a few unique takes in Adobe's product experience.
Let's dig in...
Read 14 tweets
Mar 19
3D capture is moving so fast - I scanned & animated this completely on an iPhone.

Last summer you'd need to wrangle COLMAP, Instant NGP, and FFmpeg to make NeRFs.

Now you can do it all inside Luma AI's mobile app. Capture anything and reframe infinitely in post!

Thread 🧵
Last summer when NVIDIA's Instant NGP dropped, I went through my entire photogrammetry catalog and re-processed everything. This should give you a teaser for the possibilities of ML-based reality capture:
What's cool about NeRFs is that you don't actually need a bucketload of input images to get great results. Case in point: ~150 images results in:
Read 15 tweets
Mar 16
Midjourney v5 is wildly photorealistic out of the box 🎥

It's starting to hit that threshold for cinematic realism, yet many other attributes also got a boost 🤯

Let's compare the new (v5) & old (v4) model using the same prompts, and explore the visual differences.

🧵 A thread
v5: when generating volumetric effects, the default v5 result (pictured below) feels *less* like stylized digital art, and *more* like a complex multi-layer composite you might make in After Effects or Nuke (pun intended).
v4: the older model (pictured below) feels a bit more cartoony in comparison. v4 has a more painterly bent out of the box. Screams digital art made in Photoshop or Procreate. Still pretty cool, but definitely not photorealistic.
Read 22 tweets
Mar 15
GPT-4 is multi-modal. But why does that matter? 🧵

🖼 Simply put: a picture is worth a thousand words. GPT-4 bridges the communication gap with AI by embracing visual imagery to better express our intent.

TL;DR What ControlNet did for image generation, GPT-4 will do for LLMs...
Conveying intent purely in text form requires an exacting command of the language, but still leaves too much to interpretation.
This was a big problem with text-to-image models. Good for brainstorming. Not so good if you have a specific vision in mind:
creativetechnologydigest.substack.com/p/controlling-…
GPT-4 can "perceive" the contents of imagery, leaving less to interpretation.
You can augment your text query by providing photos, diagrams or even screenshots to better express your intent.
You could create a slide, diagram or UX flow and GPT-4 will understand the context:
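As a concrete sketch, a multimodal prompt looks roughly like this with OpenAI's Python SDK (image inputs shipped after this thread was written; the model name and image URL are placeholder assumptions):

```python
# Hypothetical sketch: augment a text query with an image of a UX flow.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o",  # assumed vision-capable model
    messages=[{
        "role": "user",
        "content": [
            {"type": "text",
             "text": "Turn this hand-drawn UX flow into a list of screens and the actions on each."},
            {"type": "image_url",
             "image_url": {"url": "https://example.com/ux-flow-sketch.png"}},  # placeholder
        ],
    }],
)
print(response.choices[0].message.content)
```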
Read 5 tweets
Mar 4
🧠 AI experiment comparing #ControlNet and #Gen1. Video goes in ➡ Minecraft comes out.

Results are wild, and it's only a matter of time till this tech runs at 60fps. Then it'll transform 3D and AR.

How soon until we're channel surfing realities layered on top of the world?🧵
First ControlNet. Wow, this tool makes it very easy to get photorealistic results. I used the HED method for this scene and got some amazing results. I used EbSynth for smoother interpolation between ControlNet keyframes. Check out my prior posts for the end-to-end workflow.
Next up Gen 1: impressive is the word. The star of the show is the temporal consistency. Getting photoreal results is harder than ControlNet IMO. #Gen1 is almost its own stylized thing, so I advise leaning into that. But why does it matter - can't we just type text to get video?
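For reference, here's a rough sketch of the ControlNet HED keyframe pass described above, using Hugging Face diffusers; the model IDs, prompt and file paths are assumptions, not the exact settings used in the experiment:

```python
# Sketch: extract a HED (soft edge) map from a video frame, then generate a
# restyled keyframe that follows the frame's structure.
import torch
from PIL import Image
from controlnet_aux import HEDdetector
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline

hed = HEDdetector.from_pretrained("lllyasviel/Annotators")  # assumed repo
controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-hed", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    controlnet=controlnet,
    torch_dtype=torch.float16,
).to("cuda")

frame = Image.open("video_frame_0001.png").convert("RGB")  # placeholder path
edges = hed(frame)  # soft edge map that preserves the frame's structure

keyframe = pipe(
    prompt="minecraft world, blocky voxel style",
    image=edges,
    num_inference_steps=30,
).images[0]
keyframe.save("keyframe_0001.png")
# Keyframes like this can then be interpolated across the clip (e.g. with EbSynth).
```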
Read 8 tweets
