Bilawal Sidhu · Mar 25 · 14 tweets · 9 min read
Midjourney v5 has pushed into photorealism, a goal which has eluded the computer graphics industry for decades (!) 🤯

Insane progression, and all that by 11 people with a shared dream.

🧵 Let's explore what these breakthroughs in Generative AI mean for 3D & VFX as we know it...
First off, Midjourney v5 is far more photorealistic out of the box, whereas its predecessor had a more painterly, stylized bent.

Here's a thorough comparison of v5 vs v4 in case you want to go deeper. But let's keep going...
The key point is that @midjourney has crossed the chasm of uncanniness, and is well into photorealistic territory.

And this sentiment is resounding among professionals. Some might even say it's one for the history books!
@midjourney I mean, how can generative AI *not* absolutely disrupt 3D engines like Unreal & Unity, or even renderers like Octane & Redshift?

Just look at the quality of this Midjourney generation by Linus 🤯
@midjourney Like, who knew it'd take generative AI to cross the uncanny valley, particularly for digital humans? No sub-surface scattering required!

You've got all you need to realize your ambitious Bollywood dreams:
@midjourney Virtual sets? Not a problem. These Midjourney generations easily surpass the quality of an Unreal Engine or Octane render.

I mean, just look at the high-frequency detail in the chair, the knitting, the windows -- and good lord (!) the dynamic range is immaculate.
@midjourney Obviously, the meme potential is exceedingly high too :)

Especially given MJ's new approach to prompting, which allows us to compose complex scenes with multiple characters.

Fancy a Hogwarts rave circa 1998? No problemo:
@midjourney Product photography gets a huge boost too. Imagine products before you create them, or fine-tune models with actual product photography to stage virtual shoots on demand.

In the past, doing this required scanning assets or modelling them from scratch, plus hours in 3D tools:
@midjourney It's no surprise that Jensen Huang, CEO of @nvidia, said “Every single pixel will be generated soon. Not rendered: generated”

Obviously, I've been saying this for a hot minute now, especially after playing with ControlNet:
@midjourney @nvidia In the near term, hybrid approaches that fuse the best of classical 3D + generative AI will reign supreme.

Run a lightweight 3D engine for the first pass, then run a generative filter on top to convert it into AAA quality.

Think NVIDIA's "DLSS" on steroids:
@midjourney @nvidia Explicitly modelling reality seems overrated for visualization. A hybrid approach absolutely crushes it!

E.g. throw in an uncanny Unreal model, and get out a much more photorealistic result. Minor temporal inconsistencies aside (which'll be solved!), the result is beyond Unreal:
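Here's a minimal sketch of that render-then-refine idea, assuming Stable Diffusion img2img (via Hugging Face diffusers) stands in for the "generative filter"; the checkpoint, file paths and strength value are illustrative assumptions, not a production pipeline:

```python
# Hypothetical sketch: take a rough first-pass frame from a 3D engine and
# run a generative "filter" over it to add photoreal detail.
import torch
from PIL import Image
from diffusers import StableDiffusionImg2ImgPipeline

pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",  # assumed checkpoint
    torch_dtype=torch.float16,
).to("cuda")

rough_render = Image.open("unreal_first_pass.png").convert("RGB")  # placeholder path

refined = pipe(
    prompt="photorealistic interior, natural light, high dynamic range",
    image=rough_render,
    strength=0.45,       # low strength preserves the engine's layout and camera
    guidance_scale=7.5,
).images[0]

refined.save("refined_frame.png")
```

The low strength value is the key design choice: it keeps the 3D engine's composition intact while letting the model hallucinate photoreal detail on top -- run per frame, that's the "DLSS on steroids" pass.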
@midjourney @nvidia Video is in its infancy, but clearly the next target. Jon made this short film with a freaking iPhone + Midjourney + @runwayml Gen-1

And it's all filmed in his apartment! This is James Cameron-style virtual production ($$$) democratized.

Imagine where we'll be in +6 months...
Obviously, Midjourney founder @DavidSHolz's goal has always been to transform interactive content.

First it'll transform ideation, then asset creation, then 3D engine embellishment, but eventually -- we'll be playing dreams in the cloud 🌥

And I, for one, can't wait!
@DavidSHolz That's a wrap! If you enjoyed this deep dive on AI's impact on real-time 3D & offline VFX:
- RT the thread below to share with your audience
- Follow @bilawalsidhu for more creative tech magic
- Sign up to get these sent neatly into your inbox: creativetechnologydigest.substack.com

More from @bilawalsidhu

Mar 23
If you thought reskinning 2D videos was fun, how about reskinning 3D captures of the world?

That's exactly what you get when you combine NeRFs with InstructPix2Pix in this new paper by @ayaanzhaque et al.

Mini-thread🧵
InstructPix2Pix is applied to the input 2D views used to train the NeRF, in an iterative fashion.

Notice below that the edits are gradually becoming more consistent over time.

I'm impressed with how well it works!
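For intuition, here's a rough pseudocode sketch of that iterative dataset-update loop; the helpers (render_from_pose, edit_with_instruct_pix2pix, train_nerf_step) are placeholders for illustration, not the paper's actual API:

```python
# Pseudocode sketch of iterative dataset updates (Instruct-NeRF2NeRF style).
def edit_nerf(nerf, dataset, instruction, steps=30_000, edit_every=10):
    for step in range(steps):
        if step % edit_every == 0:
            # Re-render one training view from the current NeRF, then replace
            # that view's image with an InstructPix2Pix edit of the render.
            view = dataset.sample_view()
            rendered = render_from_pose(nerf, view.camera_pose)
            view.image = edit_with_instruct_pix2pix(
                image=rendered,
                original=view.original_image,   # keeps edits anchored to the capture
                instruction=instruction,        # e.g. "make it look like autumn"
            )
        # Standard NeRF optimization on the gradually edited dataset.
        train_nerf_step(nerf, dataset)
    return nerf
```

Because only one view is swapped at a time while training continues, the 3D representation keeps pulling all the edited views toward a single consistent scene -- which is why the edits become more consistent over time.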
Of course, it's not just places and objects -- you can totally reskin people to your heart's content.

No makeup or costumes required :)
Read 7 tweets
Mar 21
Been hands-on with the beta of Adobe's cutting-edge Generative AI tool, and I'm impressed! 🤯

Here's a taste of the power of #AdobeFirefly 🎇 and what sets it apart in the increasingly crowded world of #AI art.

Thread 🧵🎨
For starters, Adobe Firefly isn't one thing. It encompasses multiple AI models. It's a portal for testing new capabilities with creators, and eventually graduating them into products like Photoshop & Premiere that creators know and love. Meeting users where they are, if you will.
If you've used any text-to-image product (e.g. Stable Diffusion or DALL-E), Adobe Firefly will be immediately familiar at first glance.
But there are a few unique takes in Adobe's product experience.
Let's dig in...
Read 14 tweets
Mar 19
3D capture is moving so fast - I scanned & animated this completely on an iPhone.

Last summer you'd need to wrangle COLMAP, Instant NGP, and FFmpeg to make NeRFs.

Now you can do it all inside Luma AI's mobile app. Capture anything and reframe infinitely in post!

Thread 🧵
Last summer when NVIDIA's Instant NGP dropped, I went through my entire photogrammetry catalog and re-processed everything. This should give you a teaser for the possibilities of ML-based reality capture:
What's cool about NeRFs is that you don't actually need a bucketload of input images to get great results. Case in point: ~150 images results in:
Read 15 tweets
Mar 16
Midjourney v5 is wildly photorealistic out of the box 🎥

It's starting to hit that threshold for cinematic realism, yet many other attributes also got a boost 🤯

Let's compare the new (v5) & old (v4) model using the same prompts, and explore the visual differences.

🧵 A thread
v5: when generating volumetric effects, the default v5 result (pictured below) feels *less* like stylized digital art, and *more* like a complex multi-layer composite you might make in After Effects or Nuke (pun intended).
v4: the older model (pictured below) feels a bit more cartoony in comparison. v4 has a more painterly bent out of the box. Screams digital art made in Photoshop or Procreate. Still pretty cool, but definitely not photorealistic.
Read 22 tweets
Mar 15
GPT-4 is multi-modal. But why does that matter? 🧵

🖼 Simply put: a picture is worth a thousand words. GPT-4 bridges the communication gap with AI by embracing visual imagery to better express our intent.

TL;DR What ControlNet did for image generation, GPT-4 will do for LLMs...
Conveying intent purely in text form requires an exacting command of the language, but still leaves too much to interpretation.
This was a big problem with text-to-image models. Good for brainstorming. Not so good if you have a specific vision in mind:
creativetechnologydigest.substack.com/p/controlling-…
GPT-4 can "perceive" the contents of imagery, leaving less to interpretation.
You can augment your text query by providing photos, diagrams or even screenshots to better express your intent.
You could create a slide, diagram or UX flow and GPT-4 will understand the context:
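As a concrete sketch, a multimodal prompt looks roughly like this with OpenAI's Python SDK (image inputs shipped after this thread was written; the model name and image URL are placeholder assumptions):

```python
# Hypothetical sketch: augment a text query with an image of a UX flow.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o",  # assumed vision-capable model
    messages=[{
        "role": "user",
        "content": [
            {"type": "text",
             "text": "Turn this hand-drawn UX flow into a list of screens and the actions on each."},
            {"type": "image_url",
             "image_url": {"url": "https://example.com/ux-flow-sketch.png"}},  # placeholder
        ],
    }],
)
print(response.choices[0].message.content)
```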
Read 5 tweets
Mar 4
🧠 AI experiment comparing #ControlNet and #Gen1. Video goes in ➡ Minecraft comes out.

Results are wild, and it's only a matter of time till this tech runs at 60fps. Then it'll transform 3D and AR.

How soon until we're channel surfing realities layered on top of the world?🧵
First ControlNet. Wow, this tool makes it very easy to get photorealistic results. I used the HED method for this scene and got some amazing results. I used EbSynth for smoother interpolation between ControlNet keyframes. Check out my prior posts for the end-to-end workflow.
Next up Gen 1: impressive is the word. The star of the show is the temporal consistency. Getting photoreal results is harder than ControlNet IMO. #Gen1 is almost its own stylized thing, so I advise leaning into that. But why does it matter - can't we just type text to get video?
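For reference, here's a rough sketch of the ControlNet HED keyframe pass described above, using Hugging Face diffusers; the model IDs, prompt and file paths are assumptions, not the exact settings used in the experiment:

```python
# Sketch: extract a HED (soft edge) map from a video frame, then generate a
# restyled keyframe that follows the frame's structure.
import torch
from PIL import Image
from controlnet_aux import HEDdetector
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline

hed = HEDdetector.from_pretrained("lllyasviel/Annotators")  # assumed repo
controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-hed", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    controlnet=controlnet,
    torch_dtype=torch.float16,
).to("cuda")

frame = Image.open("video_frame_0001.png").convert("RGB")  # placeholder path
edges = hed(frame)  # soft edge map that preserves the frame's structure

keyframe = pipe(
    prompt="minecraft world, blocky voxel style",
    image=edges,
    num_inference_steps=30,
).images[0]
keyframe.save("keyframe_0001.png")
# Keyframes like this can then be interpolated across the clip (e.g. with EbSynth).
```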
Read 8 tweets
