Bilawal Sidhu
Mar 21 • 14 tweets • 8 min read
Been hands-on with the beta of Adobe's cutting-edge Generative AI tool, and I'm impressed! 🤯

Here's a taste of the power of #AdobeFirefly 🎇 and what sets it apart in the increasingly crowded world of #AI art.

Thread 🧵🎨
For starters, Adobe Firefly isn't one thing. It encompasses multiple AI models. It's a portal for testing new capabilities with creators, and eventually graduating them into products like Photoshop & Premiere that creators already know and love. Meeting users where they are, if you will.
If you've used any text-to-image product (e.g. Stable Diffusion or DALL-E), Adobe Firefly will feel immediately familiar at first glance.
But there are a few unique takes in Adobe's product experience.
Let's dig in...
Adobe is using a diffusion-based model (not GigaGAN as many of us suspected!), so needless to say you can get some pretty photorealistic results.
Adobe's trained this model using Adobe Stock, which means the provenance of the data is rock solid.

Adobe can't afford to alienate creators, so they have *not* trained models on Behance imagery yet, despite it being a treasure trove 💎

Will these moves woo AI art naysayers? 🤔
With Firefly you can also generate text effects!
Pick a font, type in some text, describe your style and voila: a new logo for my creator brand.
I can totally see how this will be super useful inside Photoshop or Illustrator. No more complex layer effects to wrangle :)
Adobe's Firefly UX is unique in that you provide a prompt (which describes the contents of your scene), and then augment it with a bunch of parameters like style, color and tone, lighting, and composition. This makes it super easy to iterate:
So let's say I like the overall result, but I'm looking for a different camera angle and a slightly different aesthetic (e.g. low lighting, shot from below, cool tone). You can really dial in a look without futzing around with prompts. Pretty nice!
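To make that prompt-plus-parameters pattern concrete, here's a tiny, purely hypothetical Python sketch of the same idea. Firefly's beta is a web UI with no public API, so every name below (build_prompt, the parameter keys) is my own illustration, not anything Adobe exposes.

```python
# Hypothetical sketch: model "scene prompt + style parameters" in plain Python.
# Firefly's beta has no public API; these names and keys are illustrative only.

BASE_PROMPT = "a lighthouse on a rocky coast at dusk"

def build_prompt(base: str, **params: str) -> str:
    """Append style/lighting/composition modifiers to a base scene description."""
    modifiers = ", ".join(value for value in params.values() if value)
    return f"{base}, {modifiers}" if modifiers else base

# Iterate on the look by swapping parameters instead of rewriting the prompt:
variant_a = build_prompt(BASE_PROMPT, style="digital art", lighting="golden hour")
variant_b = build_prompt(BASE_PROMPT, style="photo", lighting="low lighting",
                         composition="shot from below", tone="cool tone")

print(variant_a)
print(variant_b)
```

The point is the separation of concerns: the scene description stays fixed while the look gets dialed in through a small set of structured knobs.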
Stylized not your jam, and want to go back to a photorealistic result? As easy as clicking a button, and bam:
"Robot that toasts your bread and applies butter to it, in the style of Rick and Morty" produced some impressive results in Firefly:
You're probably wondering how hands look? Pretty coherent!

Even with a prompt like this:

Punjabi man in flannel shirt using AI voice dictation to create the client pitch deck while drinking espresso a cozy cabin, while wearing an Oculus VR headset, with a laptop on the table
@ericsnowden made an awesome analogy about ingredients: combining decades of Adobe tech with these newer models to make amazing recipes. And I have to say, the dishes do look good! Case in point:
Adobe will be expanding access gradually, so it won't exactly be a free-for-all. During the beta period there are some noteworthy limitations to be aware of; critically, commercial use is not allowed.

So what do you think of Adobe's entry? Share your thoughts below.
That's a wrap! If you enjoyed this deep dive on Adobe Firefly (adobe.com/firefly):
- RT the thread to share it with your audience
- Follow @bilawalsidhu to stay tuned for more creative tech magic
- Subscribe to get these right in your inbox: creativetechnologydigest.substack.com

More from @bilawalsidhu

Mar 19
3D capture is moving so fast - I scanned & animated this completely on an iPhone.

Last summer you'd need to wrangle COLMAP, Instant NGP, and FFmpeg to make NeRFs.

Now you can do it all inside Luma AI's mobile app. Capture anything and reframe infinitely in post!

Thread 🧵
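For context on how much friction that older route had, here's a rough sketch of the COLMAP + Instant NGP + FFmpeg pipeline, written as Python subprocess calls. Paths, flags, and the entry-point name are from memory and vary by tool version, so treat this as an outline rather than a recipe.

```python
# Rough sketch of the DIY NeRF pipeline that Luma's app now replaces.
# Assumes ffmpeg and COLMAP are installed and NVlabs/instant-ngp is cloned
# and built locally; flags are from memory, so check each tool's docs.
import subprocess

VIDEO = "capture.mp4"
FRAMES_DIR = "frames"
NGP_REPO = "instant-ngp"  # path to a local clone of NVlabs/instant-ngp

# 1. Extract frames from the walkthrough video.
subprocess.run(["ffmpeg", "-i", VIDEO, "-vf", "fps=2",
                f"{FRAMES_DIR}/%04d.jpg"], check=True)

# 2. Recover camera poses. instant-ngp ships a helper that wraps COLMAP and
#    writes the transforms.json the trainer expects.
subprocess.run(["python", f"{NGP_REPO}/scripts/colmap2nerf.py",
                "--images", FRAMES_DIR, "--run_colmap",
                "--aabb_scale", "16"], check=True)

# 3. Train and fly through the NeRF in the GUI (the binary name differs
#    between instant-ngp versions).
subprocess.run([f"{NGP_REPO}/build/testbed", "--scene", FRAMES_DIR], check=True)
```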
Last summer when NVIDIA's Instant NGP dropped, I went through my entire photogrammetry catalog and re-processed everything. This should give you a teaser for the possibilities of ML-based reality capture:
What's cool about NeRFs is that you don't actually need a bucket load of input images to get great results. Case in point, ~150 images yields this:
Mar 16
Midjourney v5 is wildly photorealistic out of the box 🎥

It's starting to hit that threshold for cinematic realism, yet many other attributes also got a boost 🤯

Let's compare the new (v5) & old (v4) model using the same prompts, and explore the visual differences.

🧵 A thread
v5: when generating volumetric effects, the default v5 result (pictured below) feels *less* like stylized digital art and *more* like a complex multi-layer composite you might make in After Effects or Nuke (pun intended).
v4: the older model (pictured below) feels a bit more cartoony in comparison. v4 has a more painterly bent out of the box; it screams digital art made in Photoshop or Procreate. Still pretty cool, but definitely not photorealistic.
Mar 15
GPT-4 is multi-modal. But why does that matter? 🧵

🖼 Simply put: a picture is worth a thousand words. GPT-4 bridges the communication gap with AI by embracing visual imagery to better express our intent.

TL;DR: What ControlNet did for image generation, GPT-4 will do for LLMs...
Conveying intent purely in text form requires an exacting command of the language, but still leaves too much to interpretation.
This was a big problem with text-to-image models. Good for brainstorming. Not so good if you have a specific vision in mind:
creativetechnologydigest.substack.com/p/controlling-…
GPT-4 can "perceive" the contents of imagery, leaving less to interpretation.
You can augment your text query by providing photos, diagrams or even screenshots to better express your intent.
You could create a slide, diagram or UX flow and GPT-4 will understand the context:
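As a concrete (and hypothetical) illustration of what "text plus image" prompting looks like, the sketch below just bundles a question and a screenshot into one message. GPT-4's image input wasn't publicly available when this thread was written, and the field names loosely mirror the image-input chat format OpenAI exposed later, so treat the exact shape as an assumption.

```python
# Hypothetical sketch of a multimodal "text + image" user message. The field
# names loosely mirror OpenAI's later image-input chat format and are not the
# interface that existed when this thread was written -- check current docs.
import base64

def build_multimodal_message(question: str, image_path: str) -> dict:
    """Bundle a text question and a screenshot/diagram into one user message."""
    with open(image_path, "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode("utf-8")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": question},
            {"type": "image_url",
             "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
        ],
    }

message = build_multimodal_message(
    "Turn this whiteboard sketch of a UX flow into a step-by-step spec.",
    "ux_flow_sketch.png",  # placeholder path
)
```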
Mar 4
🧠 AI experiment comparing #ControlNet and #Gen1. Video goes in ➡ Minecraft comes out.

Results are wild, and it's only a matter of time till this tech runs at 60fps. Then it'll transform 3D and AR.

How soon until we're channel surfing realities layered on top of the world?🧵
First ControlNet. Wow, this tool makes it very easy to get photorealistic results. I used the HED method for this scene and got some amazing results. I used EbSynth for smoother interpolation between ControlNet keyframes. Check out my prior posts for the end-to-end workflow.
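For anyone who wants to try the HED-conditioned step themselves, here's a minimal sketch using Hugging Face diffusers + controlnet_aux on a single keyframe. The model IDs, prompt, and step count are my assumptions, not the exact settings used in this experiment.

```python
# Minimal sketch: restyle one video keyframe with an HED-conditioned ControlNet.
# Model IDs and settings are assumptions, not the exact ones from this thread.
import torch
from controlnet_aux import HEDdetector
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline
from diffusers.utils import load_image

hed = HEDdetector.from_pretrained("lllyasviel/Annotators")
frame = load_image("keyframe_0001.png")  # one extracted video frame (placeholder path)
hed_map = hed(frame)                      # soft-edge map used as conditioning

controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-hed", torch_dtype=torch.float16)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet,
    torch_dtype=torch.float16).to("cuda")

# The HED edges preserve the original structure while the prompt restyles it;
# EbSynth then propagates the styled keyframe to the in-between frames.
styled = pipe("minecraft world, blocky voxel textures, bright daylight",
              image=hed_map, num_inference_steps=20).images[0]
styled.save("keyframe_0001_styled.png")
```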
Next up Gen 1: impressive is the word. The star of the show is the temporal consistency. Getting photoreal results is harder than with ControlNet IMO. #Gen1 is almost its own stylized thing, so I advise leaning into that. But why does it matter - can't we just type text to get video?
Mar 1
Before/after of Corridor's latest AI video is wild. They shot video on greenscreen, made virtual sets in Unreal, then reskinned it to anime by finetuning Stable Diffusion. Net result? 120 VFX shots done by a team of 3 on a dime. Bravo! This is a milestone in creative technology🧵
⚙ Corridor basically made an open source video2anime workflow to pull off this video. Key tools they used:
- Stable Diffusion model + DreamBooth fine-tuning
- Unreal Engine + asset store 3D models
- Img2Img + DeFlickering effect
- Heaps of good ol' fashioned VFX compositing
Now let's deconstruct their creation workflow (a rough code sketch of step 3 follows the list):
1. Train a model to replicate a specific style
2. Train a model to know a character 🔄
3. Run green screen video through img2img
4. Reduce flicker with Deflicker plugin
5. Add 3D elements in Unreal 5
6. Final VFX comp/edit in Resolve
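Here's what step 3 might look like in code: a minimal sketch of pushing greenscreen frames through img2img with a DreamBooth-fine-tuned Stable Diffusion checkpoint, via Hugging Face diffusers. The checkpoint path, the "sks" style token, and the strength value are placeholders, not Corridor's actual settings.

```python
# Sketch of step 3 only: run greenscreen frames through img2img with a
# DreamBooth-fine-tuned SD checkpoint. Paths, the "sks" style token, and the
# strength value are placeholders, not Corridor's actual settings.
from pathlib import Path

import torch
from diffusers import StableDiffusionImg2ImgPipeline
from diffusers.utils import load_image

pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "./dreambooth-anime-checkpoint",   # hypothetical fine-tuned model directory
    torch_dtype=torch.float16).to("cuda")

prompt = "sks anime style, warrior on a soundstage, dramatic rim lighting"

Path("anime_frames").mkdir(exist_ok=True)
for frame_path in sorted(Path("greenscreen_frames").glob("*.png")):
    frame = load_image(str(frame_path)).resize((768, 512))
    # Lower strength keeps more of the actor's pose; higher strength pushes
    # harder toward the fine-tuned anime look (at the cost of more flicker).
    out = pipe(prompt, image=frame, strength=0.45,
               guidance_scale=7.5).images[0]
    out.save(f"anime_frames/{frame_path.name}")
```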
Feb 25
Multi ControlNet is a game changer for making an open source video2video pipeline. I spent some time hacking this NeRF2Depth2Image workflow using a combination of ControlNet methods + SD 1.5 + EbSynth.
🧵 Full breakdown of my workflow & detailed tips shared in the thread below ⬇
Here's an overview of the workflow we're going to deconstruct. At a high level:
Capture video (used my iPhone) ➡️ Train NeRF (used Luma AI) ➡️ Animate & Render RGB + Depth ➡️ Multi-Control Net (Depth + HED) ➡️ EbSynth ➡️ Blending & Compositing. Now let's break it down step by step:
For the input, I wanted to see if I could exploit the crispy depth maps you can get out of a Neural Radiance Field (NeRF) 3D scan.
- Left: 3D flythrough rendered from a NeRF (iPhone video ➡️ trained w/ Luma AI)
- Right: The corresponding depth map (notice the immaculate detail!)
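To show what the Multi-ControlNet step could look like in practice, here's a minimal diffusers sketch that conditions a single SD 1.5 generation on both a depth map and HED edges at once. Model IDs, prompt, and the per-ControlNet weights are my assumptions, not the exact settings from this workflow.

```python
# Sketch of the Multi-ControlNet step: condition one SD 1.5 generation on
# both the NeRF depth render and HED edges. diffusers accepts lists for
# `controlnet`, `image`, and `controlnet_conditioning_scale`; model IDs,
# prompt, and weights below are assumptions.
import torch
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline
from diffusers.utils import load_image

depth_cn = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-depth", torch_dtype=torch.float16)
hed_cn = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-hed", torch_dtype=torch.float16)

pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=[depth_cn, hed_cn],
    torch_dtype=torch.float16).to("cuda")

depth_map = load_image("nerf_depth_0001.png")  # depth pass rendered from the NeRF
hed_map = load_image("nerf_hed_0001.png")      # HED edges of the matching RGB frame

styled = pipe(
    "ancient stone temple overgrown with moss, cinematic lighting",
    image=[depth_map, hed_map],
    controlnet_conditioning_scale=[1.0, 0.7],  # lean on depth, soften the edge guide
    num_inference_steps=25,
).images[0]
styled.save("stylized_keyframe_0001.png")
```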
