Been hands-on with the beta of Adobe's cutting-edge Generative AI tool, and I'm impressed! 🤯
Here's a taste of the power of #AdobeFirefly 🎇 and what sets it apart in the increasingly crowded world of #AI art.
Thread 🧵🎨
For starters, Adobe Firefly isn't one thing. It encompasses multiple AI models. It's a portal for testing new capabilities with creators, and eventually graduating them into products like Photoshop & Premiere that creators know and love. Meeting users where they are, if you will:
If you've used any text-to-image product (e.g. Stable Diffusion or DALL-E), Adobe Firefly will feel immediately familiar at first glance.
But there are a few unique takes in Adobe's product experience.
Let's dig in...
Adobe is using a diffusion-based model (not GigaGAN as many of us suspected!), so needless to say you can get some pretty photorealistic results.
Adobe's trained this model using Adobe Stock, which means the provenance of the data is rock solid.
Adobe can't afford to alienate creators, so they have *not* trained models on Behance imagery yet, despite it being a treasure trove 💎
Will these moves woo AI art naysayers? 🤔
With Firefly you can also generate text effects!
Pick a font, type in some text, describe your style and voila - a new logo for my creator brand.
I can totally see how this will be super useful inside Photoshop or Illustrator. No more complex layer effects to wrangle :)
Adobe's Firefly UX is unique in that you can provide a prompt (which describes the contents of your scene), and then augment it with a bunch of parameters like style, color and tone, lighting and composition. This makes it super easy to iterate:
So let's say I like the overall result, but I'm looking for a different camera angle and a slightly different aesthetic (e.g. low lighting, shot from below, cool tone). You can really dial in a look without futzing around with prompts. Pretty nice!
Stylized not your jam, and want to go back to a photorealistic result? As easy as clicking a button, and bam:
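Adobe hasn't exposed an API for any of this during the beta, so here's a purely hypothetical Python sketch of the underlying idea: keep the scene prompt fixed and iterate by swapping structured style parameters. None of these names come from Adobe; they're placeholders for illustration.

```python
# Toy illustration of prompt + structured style parameters (NOT Adobe's API).
def build_prompt(scene: str, **style_params: str) -> str:
    """Append comma-separated style modifiers (style, lighting, camera, tone) to a fixed scene prompt."""
    modifiers = ", ".join(v for v in style_params.values() if v)
    return f"{scene}, {modifiers}" if modifiers else scene

scene = "robot that toasts your bread and applies butter to it"
# Iterate on the look by changing parameters, not by rewriting the scene prompt:
print(build_prompt(scene, style="photorealistic", lighting="low lighting",
                   camera="shot from below", tone="cool tone"))
```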
"Robot that toasts your bread and applies butter to it, in the style of rick and morty" produced some impressive results in Firefly:
You're probably wondering how the hands look? Pretty coherent!
Even with a prompt like this:
Punjabi man in flannel shirt using AI voice dictation to create the client pitch deck while drinking espresso in a cozy cabin, while wearing an Oculus VR headset, with a laptop on the table
@ericsnowden made an awesome analogy about ingredients and taking decades of Adobe tech combined with these newer models to make amazing recipes. And I have to say, the dishes do look good! Case in point:
Adobe will be expanding access gradually -- so it won't exactly be a free-for-all. During the beta period, there are some limitations worth being aware of -- critically, commercial use is not allowed.
So what do you think of Adobe's entry? Share your thoughts below.
That's a wrap! If you enjoyed this deep dive on Adobe Firefly (adobe.com/firefly):
- RT the thread below to share with your audience
- Follow @bilawalsidhu to stay tuned for more creative tech magic
- Subscribe to get these right to your inbox: creativetechnologydigest.substack.com
3D capture is moving so fast - I scanned & animated this completely on an iPhone.
Last summer you'd need to wrangle COLMAP, Instant NGP, and FFmpeg to make NeRFs.
Now you can do it all inside Luma AI's mobile app. Capture anything and reframe infinitely in post!
Thread 🧵
Last summer when NVIDIA's Instant NGP dropped, I went through my entire photogrammetry catalog and re-processed everything. This should give you a teaser for the possibilities of ML-based reality capture:
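For reference, here's roughly what that "manual" COLMAP + Instant NGP + FFmpeg route looked like, sketched as a Python driver script. It assumes ffmpeg and colmap are on your PATH plus a local instant-ngp checkout; exact script names and flags vary by repo version, so treat this as a sketch rather than a recipe.

```python
# Rough sketch of the pre-Luma NeRF pipeline: frames -> camera poses -> Instant NGP.
import os
import subprocess

VIDEO = "capture.mp4"  # placeholder input video
os.makedirs("frames", exist_ok=True)
os.makedirs("colmap_out", exist_ok=True)

# 1. Extract frames with FFmpeg (2 fps here; tune to your capture length).
subprocess.run(["ffmpeg", "-i", VIDEO, "-vf", "fps=2", "frames/%04d.png"], check=True)

# 2. Recover camera poses with COLMAP's automatic reconstructor.
subprocess.run(["colmap", "automatic_reconstructor",
                "--image_path", "frames",
                "--workspace_path", "colmap_out"], check=True)

# 3. Convert poses to Instant NGP's transforms.json with the repo's helper script
#    (it can also run COLMAP for you; flags differ across instant-ngp versions).
subprocess.run(["python", "instant-ngp/scripts/colmap2nerf.py",
                "--images", "frames",
                "--out", "transforms.json"], check=True)

# 4. Train and view the NeRF in the instant-ngp GUI (or its run script).
```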
Midjourney v5 is wildly photorealistic out of the box 🎥
It's starting to hit that threshold for cinematic realism, yet many other attributes also got a boost 🤯
Let's compare the new (v5) & old (v4) model using the same prompts, and explore the visual differences.
🧵A thread
v5: when generating volumetric effects - the default v5 result (pictured below) feels *less* like stylized digital art, and *more* like a complex multi-layer composite you might make in After Effects or Nuke (pun intended).
v4: the older model (pictured below) feels a bit more cartoony in comparison. v4 has a more painterly bent out of the box. Screams digital art made in Photoshop or Procreate. Still pretty cool, but definitely not photorealistic.
🖼 Simply put: a picture is worth a thousand words. GPT-4 bridges the communication gap with AI by embracing visual imagery to better express our intent.
TL;DR What ControlNet did for image generation, GPT-4 will do for LLMs...
Conveying intent purely in text form requires an exacting command of the language, but still leaves too much to interpretation.
This was a big problem with text-to-image models. Good for brainstorming. Not so good if you have a specific vision in mind: creativetechnologydigest.substack.com/p/controlling-…
GPT-4 can "perceive" the contents of imagery, leaving less to interpretation.
You can augment your text query by providing photos, diagrams or even screenshots to better express your intent.
You could create a slide, diagram or UX flow and GPT-4 will understand the context:
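To make that concrete, here's a hedged sketch of what an image-augmented query could look like through OpenAI's chat API once image inputs are broadly available. The model name, availability, and file path are assumptions for illustration, not something the GPT-4 announcement confirms.

```python
# Hypothetical sketch: send a UX-flow screenshot alongside a text instruction.
import base64
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

with open("ux_flow.png", "rb") as f:  # placeholder screenshot of a slide/diagram
    image_b64 = base64.b64encode(f.read()).decode()

response = client.chat.completions.create(
    model="gpt-4-vision-preview",  # assumed model name; check current docs
    messages=[{
        "role": "user",
        "content": [
            {"type": "text",
             "text": "Turn this rough UX flow into a numbered implementation spec."},
            {"type": "image_url",
             "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
        ],
    }],
)
print(response.choices[0].message.content)
```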
🧠 AI experiment comparing #ControlNet and #Gen1. Video goes in ➡ Minecraft comes out.
Results are wild, and it's only a matter of time till this tech runs at 60fps. Then it'll transform 3D and AR.
How soon until we're channel surfing realities layered on top of the world?🧵
First up, ControlNet. Wow, this tool makes it very easy to get photorealistic results. I used the HED method for this scene and the output was amazing. I used EbSynth for smoother interpolation between ControlNet keyframes. Check out my prior posts for the end-to-end workflow.
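If you want to try the ControlNet HED step yourself, here's a minimal sketch using the open-source diffusers + controlnet_aux libraries. The model IDs, prompt, and seed are generic placeholders, not my exact settings from this experiment.

```python
# Minimal ControlNet (HED edges) keyframe stylization with diffusers.
import torch
from PIL import Image
from controlnet_aux import HEDdetector
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline

hed = HEDdetector.from_pretrained("lllyasviel/Annotators")
controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-hed", torch_dtype=torch.float16)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet,
    torch_dtype=torch.float16).to("cuda")

frame = Image.open("keyframe_0001.png").convert("RGB")  # placeholder keyframe
hed_map = hed(frame)  # soft edge map that guides the composition

out = pipe(
    "photorealistic minecraft world, blocky terrain, volumetric light",
    image=hed_map,
    num_inference_steps=25,
    generator=torch.Generator("cuda").manual_seed(1234),  # fixed seed across keyframes
).images[0]
out.save("keyframe_0001_stylized.png")  # feed stylized keyframes into EbSynth
```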
Next up, Gen-1: impressive is the word. The star of the show is the temporal consistency. Getting photoreal results is harder than with ControlNet IMO. #Gen1 is almost its own stylized thing, so I advise leaning into that. But why does it matter - can't we just type text to get video?
Before/after of Corridor's latest AI video is wild. They shot video on greenscreen, made virtual sets in Unreal, then reskinned it to anime by finetuning Stable Diffusion. Net result? 120 VFX shots done by a team of 3 on a dime. Bravo! This is a milestone in creative technology🧵
⚙ Corridor basically made an open source video2anime workflow to pull off this video. Key tools they used:
- Stable Diffusion model + DreamBooth fine-tuning
- Unreal Engine + asset store 3D models
- Img2Img + DeFlickering effect
- Heaps of good ol' fashioned VFX compositing
Now let's deconstruct their creation workflow:
1. Train a model to replicate a specific style
2. Train a model to know a character 🔄
3. Run green screen video through img2img (sketched below)
4. Reduce flicker with Deflicker plugin
5. Add 3D elements in Unreal 5
6. Final VFX comp/edit in Resolve
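To make step 3 concrete, here's a minimal per-frame img2img sketch with diffusers. In Corridor's case the model would be their DreamBooth fine-tune; the checkpoint, prompt, strength, and seed below are placeholders, not their actual settings.

```python
# Sketch of step 3: restyle each green-screen frame via img2img with a fixed seed.
import os
import torch
from PIL import Image
from diffusers import StableDiffusionImg2ImgPipeline

pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",  # swap in your DreamBooth fine-tuned checkpoint
    torch_dtype=torch.float16).to("cuda")

os.makedirs("styled", exist_ok=True)
for i in range(1, 121):  # placeholder frame count
    frame = Image.open(f"frames/{i:04d}.png").convert("RGB")
    styled = pipe(
        prompt="anime style, <character-token> in a dark alley",  # hypothetical token
        image=frame,
        strength=0.5,            # low enough to preserve the actor's pose/performance
        guidance_scale=7.5,
        generator=torch.Generator("cuda").manual_seed(42),  # same seed = less flicker
    ).images[0]
    styled.save(f"styled/{i:04d}.png")  # then deflicker + comp in Resolve
```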
Multi ControlNet is a game changer for making an open source video2video pipeline. I spent some time hacking this NeRF2Depth2Image workflow using a combination of ControlNet methods + SD 1.5 + EbSynth.
🧵 Full breakdown of my workflow & detailed tips shared in the thread below ⬇
Here's an overview of the workflow we're going to deconstruct! At a high level:
Capture video (used my iPhone) ➡️ Train NeRF (used Luma AI) ➡️ Animate & Render RGB + Depth ➡️ Multi-Control Net (Depth + HED) ➡️ EbSynth ➡️ Blending & Compositing. Now let's break it down step by step:
For the input, I wanted to see if I could exploit the crispy depth maps you can get out of a Neural Radiance Field (NeRF) 3D scan.
- Left: 3D flythrough rendered from a NeRF (iPhone video ➡️ trained w/ Luma AI)
- Right: The corresponding depth map (notice the immaculate detail!)
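Here's roughly what that Multi-ControlNet step can look like in diffusers, assuming the NeRF RGB frame and its matching depth render already exist and an HED map is extracted from the RGB. Model IDs, file paths, prompt, and conditioning weights are placeholders, not my exact values.

```python
# Sketch: condition SD 1.5 on NeRF depth + HED edges simultaneously (Multi-ControlNet).
import torch
from PIL import Image
from controlnet_aux import HEDdetector
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline

controlnets = [
    ControlNetModel.from_pretrained("lllyasviel/sd-controlnet-depth", torch_dtype=torch.float16),
    ControlNetModel.from_pretrained("lllyasviel/sd-controlnet-hed", torch_dtype=torch.float16),
]
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnets,
    torch_dtype=torch.float16).to("cuda")

rgb = Image.open("nerf_rgb_0001.png").convert("RGB")      # NeRF flythrough frame (placeholder path)
depth = Image.open("nerf_depth_0001.png").convert("RGB")  # matching NeRF depth render
hed_map = HEDdetector.from_pretrained("lllyasviel/Annotators")(rgb)

out = pipe(
    "retro-futuristic interior, cinematic lighting",  # placeholder prompt
    image=[depth, hed_map],                    # one conditioning image per ControlNet
    controlnet_conditioning_scale=[1.0, 0.7],  # lean harder on depth than on edges
    num_inference_steps=25,
    generator=torch.Generator("cuda").manual_seed(7),
).images[0]
out.save("stylized_keyframe_0001.png")  # run stylized keyframes through EbSynth afterwards
```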