Been hands-on with the beta of Adobe's cutting-edge Generative AI tool, and I'm impressed! 🤯
Here's a taste of the power of #AdobeFirefly and what sets it apart in the increasingly crowded world of #AI art.
Thread 🧵🎨
For starters, Adobe Firefly isn't one thing. It encompasses multiple AI models. It's a portal for testing new capabilities with creators, and eventually graduating them into products like Photoshop & Premiere that creators know and love. Meeting users where they are, if you will:
If you've used any text-to-image product (e.g. Stable Diffusion or DALL-E), Adobe Firefly will feel immediately familiar at first glance.
But there are a few unique takes in Adobe's product experience.
Let's dig in...
Adobe is using a diffusion-based model (not GigaGAN as many of us suspected!), so needless to say you can get some pretty photorealistic results.
Adobe's trained this model using Adobe Stock, which means the provenance of the data is rock solid.
Adobe can't afford to alienate creators, so they have *not* trained models on Behance imagery yet, despite it being a treasure trove
Will these moves woo AI art naysayers? 🤔
With Firefly you can also generate text effects!
Pick a font, type in some text, describe your style and voila - a new logo for my creator brand.
I can totally see how this will be super useful inside Photoshop or Illustrator. No more complex layer effects to wrangle :)
Adobe's Firefly UX is unique in that you can provide a prompt (which describes the contents of your scene), and then augment it with a bunch of parameters like style, color and tone, lighting, and composition. This makes it super easy to iterate:
So let's say I like the overall result, but I'm looking for a different camera angle and a slightly different aesthetic (e.g. low lighting, shot from below, cool tone). You can really dial in a look without futzing around with prompts. Pretty nice!
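To make the "prompt + parameters" idea concrete, here's a tiny hypothetical sketch in Python. This is *not* Adobe's API -- just an illustration of how a base scene description plus structured modifiers (style, lighting, tone, composition) lets you iterate on a look without rewriting the prompt itself. All names and values below are made up.

```python
# Hypothetical sketch (not Adobe's actual API): a base prompt plus structured
# modifiers for style, lighting, tone, and composition. Iterating on a look
# means swapping modifiers, not rewriting the whole prompt.

def build_prompt(base: str, **modifiers: str) -> str:
    """Append comma-separated modifiers to a base prompt."""
    extras = ", ".join(v for v in modifiers.values() if v)
    return f"{base}, {extras}" if extras else base

base = "a robot barista pouring latte art in a sunlit cafe"

# First pass: stylized look
v1 = build_prompt(base, style="digital art", lighting="golden hour", composition="wide shot")

# Second pass: same scene, different camera angle and tone
v2 = build_prompt(base, style="photo", lighting="low lighting", tone="cool tone", composition="shot from below")

print(v1)
print(v2)
```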
Stylized not your jam, and want to go back to a photorealistic result? As easy as clicking a button, and bam:
"Robot that toasts your bread and applies butter to it, in the style of rick and morty" produced some impressive results in Firefly:
You're probably wondering how hands look? Pretty coherent!
Even with a prompt like this:
Punjabi man in flannel shirt using AI voice dictation to create the client pitch deck while drinking espresso in a cozy cabin, while wearing an Oculus VR headset, with a laptop on the table
@ericsnowden made an awesome analogy about ingredients and taking decades of Adobe tech combined with these newer models to make amazing recipes. And I have to say, the dishes do look good! Case in point:
Adobe will be expanding access gradually -- so it won't exactly be a free-for-all. During the beta period there are some limitations worth being aware of -- critically, commercial use is not allowed.
So what do you think of Adobe's entry? Share your thoughts below.
That's a wrap! If you enjoyed this deep dive on Adobe Firefly (adobe.com/firefly):
- RT the thread below to share with your audience
- Follow @bilawalsidhu to stay tuned for more creative tech magic
- Subscribe to get these right to your inbox: creativetechnologydigest.substack.com
BREAKING: Google just dropped Veo 2 and Imagen 3 -- their next gen video and image generation models.
Turns out Google's been closing the gap quietly -- not just on LLMs, but on visual creation too.
Here's everything you need to know w/o the hype 🧵
1/ First, let's get the Veo 2 updates out of the way:
• Up to 4K resolution (woot!)
• Increased detail & realism
• Improved human movement & expressions
• Better physics modeling & temporal coherence
On Meta's Movie Gen Bench, Veo holds it down against top video models:
2/ Veo 2 now speaks cinematographer. Instead of wrestling w/ technical params or guessing how Gemini captioned stuff, you can just say what you want using terms you're used to. Legit useful for production workflows.
E.g. here's a prompt to generate a classic car chase scene:
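The car chase prompt below is my own illustration of "speaking cinematographer" (lens, camera movement, lighting terms instead of model-speak), not the one from the original demo. The API call is a rough sketch assuming the google-genai Python SDK and that your account has Veo access; the model id and availability may differ from the initial rollout.

```python
# Illustrative sketch only: a cinematography-style prompt plus a hedged Veo call.
# Assumes the google-genai SDK (pip install google-genai) and an API key in
# GOOGLE_API_KEY; Veo access and model names may vary by account and over time.
from google import genai

prompt = (
    "Classic 1970s car chase through narrow city streets: low-angle tracking shot, "
    "35mm anamorphic lens, fast dolly alongside the hero car, handheld shake on impacts, "
    "golden hour backlight, film grain, smoke and dust kicked up on hard turns"
)

client = genai.Client()  # reads the API key from the environment
operation = client.models.generate_videos(
    model="veo-2.0-generate-001",  # assumed model id; check the current docs
    prompt=prompt,
)
# generate_videos returns a long-running operation; poll it per the SDK docs
# and download the video once it completes.
```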
BREAKING: Here are the coolest things Google announced today. I got the press briefing yesterday; these are my favorites w/o the hype.
TL;DR Gemini 2.0 brings multimodal creation, research agents, browser control, and massive compute upgrades. Plus dope research.
🧵 Let's dive in
1/ Let's talk Gemini 2.0 Flash:
• 2x faster than 1.5 Pro while outperforming it on key benchmarks
• Native tool use (Search + custom functions) -- quick sketch after this list
• New Multimodal Live API for realtime audio/video streaming w/ smart interrupt detection
• Available today; more model sizes in Jan
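Here's that minimal sketch of calling 2.0 Flash with native Google Search grounding, assuming the google-genai Python SDK and an API key in GOOGLE_API_KEY. The experimental model id reflects launch-day naming and may have changed since.

```python
# Minimal sketch: Gemini 2.0 Flash with native Google Search as a tool.
# Assumes the google-genai SDK (pip install google-genai); model id may have changed.
from google import genai
from google.genai import types

client = genai.Client()  # reads the API key from the environment

response = client.models.generate_content(
    model="gemini-2.0-flash-exp",
    contents="What did Google announce alongside Gemini 2.0 Flash?",
    config=types.GenerateContentConfig(
        tools=[types.Tool(google_search=types.GoogleSearch())],  # native tool use
    ),
)
print(response.text)
```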
2/ Gemini 2.0 *finally* gets native multimodal output
• Can generate images + text combined naturally (rough sketch below)
• Steerable text-to-speech in multiple languages/accents
• Alas, early access only for now; wider rollout in Jan
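And the rough sketch of what native multimodal output looks like from the google-genai SDK. Since this was early-access at announcement, the call may not work on your key yet; treat the model id and response handling as the general shape, not gospel.

```python
# Rough sketch of native multimodal output (interleaved text + image parts).
# Assumes the google-genai SDK and early access on your key; details may differ.
from google import genai
from google.genai import types

client = genai.Client()
response = client.models.generate_content(
    model="gemini-2.0-flash-exp",
    contents="Write a short haiku about origami birds and generate a matching image.",
    config=types.GenerateContentConfig(response_modalities=["TEXT", "IMAGE"]),
)

for part in response.candidates[0].content.parts:
    if part.text:                # interleaved text
        print(part.text)
    elif part.inline_data:       # image bytes (e.g. PNG)
        with open("out.png", "wb") as f:
            f.write(part.inline_data.data)
```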
Viggle's doing some cool stuff with it: ai.google.dev/showcase/viggle
Heads up! Mosaic dropped a pretty wild dataset of 1.26 million 360° images of Prague 🤯
If you're a researcher, creator or developer into 3D/AI/Geo, I think you're gonna wanna play with this
Here's the scoop on this 15 TERAPIXEL dataset & the crazy things you can do with it 🧵
The specs are nuts:
• 210,469 panos in 13K
• 1,262,814 source images (6 x 12MP)
• 1 image every meter
• 2cm pose accuracy
Not quite Google-scale coverage, but the pano density is WAY higher. An image every meter means it's perfect for all sorts of spatial 3D stuff.
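Quick sanity check on those headline numbers, using only the specs above (nothing downloaded, just arithmetic):

```python
# Sanity check that the published specs hang together.
panos = 210_469
lenses = 6                      # the Mosaic X is a 6-camera rig
source_images = panos * lenses
print(source_images)            # 1,262,814 -- matches the published count

megapixels_per_image = 12
total_pixels = source_images * megapixels_per_image * 1_000_000
print(f"{total_pixels / 1e12:.1f} terapixels")  # ~15.2 -- the "15 TERAPIXEL" claim
```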
The dataset is global shutter too, captured with the 6-lens Mosaic X camera, a rig built for serious real-world data collection.
And it captured Prague's architecture, streetscapes, and urban environment in incredible detail.
With Gaussian Splatting you get 3D editing support! You can select, move, and delete stuff, and apply shader fx. That kind of editing has been tedious with NeRFs and their implicit, black-box representations.
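Before the examples below, a toy sketch of *why* splats are so editable: each Gaussian is just a row of attributes (position, scale, rotation, color, opacity), so "select and delete" is plain array filtering. The .ply layout here is an assumption (it varies by splatting tool); the point is that there's a per-primitive handle to grab, which a NeRF's implicit MLP doesn't give you.

```python
# Toy example: "select and delete" splats by filtering rows of a .ply file.
# Assumes a simple Gaussian-splat .ply with x/y/z vertex fields; layouts vary by tool.
import numpy as np
from plyfile import PlyData, PlyElement  # pip install plyfile

ply = PlyData.read("scene_splats.ply")
verts = ply["vertex"].data  # structured array, one row per Gaussian

xyz = np.stack([verts["x"], verts["y"], verts["z"]], axis=1)

# "Select" everything inside a box around an unwanted object...
box_min, box_max = np.array([-1.0, 0.0, -1.0]), np.array([1.0, 2.0, 1.0])
inside = np.all((xyz >= box_min) & (xyz <= box_max), axis=1)

# ...and "delete" it by keeping the complement.
kept = verts[~inside]
PlyData([PlyElement.describe(kept, "vertex")]).write("scene_splats_edited.ply")
print(f"Removed {inside.sum()} of {len(verts)} splats")
```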
Case in point (1/3) by @hybridherbst:
Case in point (2/3): repurpose your point cloud shaders to make something unreal like @Ruben_Fro