@midjourney Product photography gets a huge boost too. Imagine products before you create them, or fine-tune models on actual product photography to stage virtual shoots on demand.
In the past, doing this required scanning assets or modelling them from scratch, plus hours in 3D tools:
@midjourney @nvidia Explicitly modelling reality seems overrated for visualization. A hybrid approach absolutely crushes it!
E.g. throw in an uncanny Unreal model and get back a much more photorealistic result. Minor temporal inconsistencies aside (which'll be solved!), the output is beyond Unreal:
@midjourney @nvidia Video is in its infancy, but it's clearly the next target. Jon made this short film with a freaking iPhone + Midjourney + @runwayml Gen-1
And it's all filmed in his apartment! This is James Cameron-style virtual production ($$$) democratized.
Obviously, the goal of Midjourney's @DavidSHolz has always been to transform interactive content.
First it'll transform ideation, then asset creation, then 3D engine embellishment, but eventually we'll be playing dreams in the cloud 🌥
And I, for one, can't wait!
@DavidSHolz That's a wrap! If you enjoyed this deep dive on AI's impact on real-time 3D & offline VFX:
- RT the thread below to share with your audience
- Follow @bilawalsidhu for more creative tech magic
- Sign up to get these sent neatly into your inbox: creativetechnologydigest.substack.com
Been hands-on with the beta of Adobe's cutting-edge Generative AI tool, and I'm impressed! 🤯
Here's a taste of the power of #AdobeFirefly 🎇 and what sets it apart in the increasingly crowded world of #AI art.
Thread 🧵🎨
For starters, Adobe Firefly isn't one thing. It encompasses multiple AI models. It's a portal for testing new capabilities with creators, and eventually graduating them into products like Photoshop & Premiere that creators know and love. Meeting users where they are, if you will:
If you've used any text-to-image product (e.g. Stable Diffusion or DALL-E), Adobe Firefly will feel immediately familiar at first glance.
But there are a few unique takes in Adobe's product experience.
Let's dig in...
3D capture is moving so fast - I scanned & animated this completely on an iPhone.
Last summer you'd need to wrangle COLMAP, Instant NGP, and FFmpeg to make NeRFs.
Now you can do it all inside Luma AI's mobile app. Capture anything and reframe infinitely in post!
Thread 🧵
Last summer when NVIDIA's Instant NGP dropped, I went through my entire photogrammetry catalog and re-processed everything. This should give you a teaser for the possibilities of ML-based reality capture:
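For the curious, the old manual pipeline looked roughly like this. A minimal sketch in Python, assuming the public instant-ngp repo layout; the exact paths, flags, and values are illustrative, not my precise recipe:

```python
# Rough sketch of the pre-Luma NeRF workflow: video -> frames -> camera poses -> NeRF.
# Script names and flags follow the public instant-ngp repo, but treat the exact
# paths and settings as illustrative assumptions.
import subprocess
from pathlib import Path

scene = Path("data/my_scene")
(scene / "images").mkdir(parents=True, exist_ok=True)

# 1. Extract frames from the capture video with FFmpeg (~2 fps keeps COLMAP tractable).
subprocess.run(
    ["ffmpeg", "-i", "capture.mp4", "-vf", "fps=2",
     str(scene / "images" / "frame_%04d.jpg")],
    check=True,
)

# 2. Recover camera poses with COLMAP and convert them into instant-ngp's
#    transforms.json using the helper script that ships with the repo.
subprocess.run(
    ["python", "instant-ngp/scripts/colmap2nerf.py",
     "--images", str(scene / "images"),
     "--run_colmap", "--aabb_scale", "16",
     "--out", str(scene / "transforms.json")],
    check=True,
)

# 3. Train and fly through the NeRF in the instant-ngp testbed GUI.
subprocess.run(["instant-ngp/build/instant-ngp", str(scene)], check=True)
```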
Midjourney v5 is wildly photorealistic out of the box 🎥
It's starting to hit that threshold for cinematic realism, and many other attributes got a boost too 🤯
Let's compare the new (v5) & old (v4) model using the same prompts, and explore the visual differences.
🧵A thread
v5: when generating volumetric effects, the default v5 result (pictured below) feels *less* like stylized digital art and *more* like a complex multi-layer composite you might build in After Effects or Nuke (pun intended).
v4: the older model (pictured below) feels a bit more cartoony in comparison. v4 has a more painterly bent out of the box. It screams digital art made in Photoshop or Procreate. Still pretty cool, but definitely not photorealistic.
🖼 Simply put: a picture is worth a thousand words. GPT-4 bridges the communication gap with AI by embracing visual imagery to better express our intent.
TL;DR What ControlNet did for image generation, GPT-4 will do for LLMs...
Conveying intent purely in text form requires an exacting command of the language, but still leaves too much to interpretation.
This was a big problem with text-to-image models. Good for brainstorming. Not so good if you have a specific vision in mind: creativetechnologydigest.substack.com/p/controlling-…
GPT-4 can "perceive" the contents of imagery, leaving less to interpretation.
You can augment your text query by providing photos, diagrams or even screenshots to better express your intent.
You could create a slide, diagram or UX flow and GPT-4 will understand the context:
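Here's a minimal sketch of what that looks like in code, assuming the newer OpenAI chat completions API with image input; the model name and message format are assumptions based on the preview docs, not something I've verified end to end:

```python
# Minimal sketch: augment a text prompt with an image (e.g. a UX flow screenshot).
# Model name and image-input message format are assumptions from OpenAI's preview
# docs; image input was still a limited preview when this thread was written.
import base64
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

with open("ux_flow.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode()

response = client.chat.completions.create(
    model="gpt-4-vision-preview",  # assumed preview model name
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Critique this UX flow and suggest improvements."},
            {"type": "image_url",
             "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
        ],
    }],
)
print(response.choices[0].message.content)
```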
🧠 AI experiment comparing #ControlNet and #Gen1. Video goes in ➡ Minecraft comes out.
Results are wild, and it's only a matter of time till this tech runs at 60fps. Then it'll transform 3D and AR.
How soon until we're channel surfing realities layered on top of the world?🧵
First up: ControlNet. Wow, this tool makes it remarkably easy to get photorealistic results. I used the HED method for this scene, then EbSynth for smoother interpolation between ControlNet keyframes. Check out my prior posts for the end-to-end workflow.
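If you want to try the ControlNet half yourself, here's a minimal sketch of stylizing a single keyframe with the open-source diffusers + controlnet_aux stack. Model IDs, prompt, and settings are illustrative rather than my exact setup, and the EbSynth pass happens afterwards in a separate tool:

```python
# Minimal sketch: restyle one video keyframe with ControlNet's HED (soft edge)
# conditioning via Hugging Face diffusers. Model IDs and settings are illustrative.
import torch
from PIL import Image
from controlnet_aux import HEDdetector
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline

frame = Image.open("keyframe_0001.png")

# 1. Extract a HED soft-edge map from the source frame; this locks the composition.
hed = HEDdetector.from_pretrained("lllyasviel/Annotators")
edge_map = hed(frame)

# 2. Load a ControlNet trained on HED maps plus a base Stable Diffusion checkpoint.
controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-hed", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
).to("cuda")

# 3. Re-render the frame in the target style; the edge map keeps the structure intact.
result = pipe(
    prompt="minecraft world, blocky voxel style, sunny day",  # illustrative prompt
    image=edge_map,
    num_inference_steps=20,
    generator=torch.Generator("cuda").manual_seed(42),  # fixed seed helps consistency
).images[0]
result.save("keyframe_0001_styled.png")
```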
Next up, Gen-1: impressive is the word. The star of the show is the temporal consistency. Getting photoreal results is harder than with ControlNet IMO; #Gen1 is almost its own stylized thing, so I advise leaning into that. But why does it matter? Can't we just type text to get video?