Been hands-on with the beta of Adobe's cutting-edge Generative AI tool, and I'm impressed! 🤯
Here's a taste of the power of #AdobeFirefly and what sets it apart in the increasingly crowded world of #AI art.
Thread 🧵🎨
For starters, Adobe Firefly isn't one thing. It encompasses multiple AI models. It's a portal for testing new capabilities with creators, and eventually graduating them into products like Photoshop & Premiere that creators know and love. Meeting users where they are, if you will:
If you've used any text-to-image product (e.g. Stable Diffusion or DALL-E), Adobe Firefly will feel immediately familiar at first glance.
But there are a few unique takes in Adobe's product experience.
Let's dig in...
Adobe is using a diffusion-based model (not GigaGAN as many of us suspected!), so needless to say you can get some pretty photorealistic results.
Adobe's trained this model using Adobe Stock, which means the provenance of the data is rock solid.
Adobe can't afford to alienate creators, so they have *not* trained models on Behance imagery yet, despite it being a treasure trove
Will these moves woo AI art naysayers? 🤔
With Firefly you can also generate text effects!
Pick a font, type in some text, describe your style and voila - a new logo for my creator brand.
I can totally see how this will be super useful inside Photoshop or Illustrator. No more complex layer effects to wrangle :)
Adobe's Firefly UX is unique in that you can provide a prompt (which describes the contents of your scene), and then augment it with a bunch of parameters like style, color and tone, lighting, and composition. This makes it super easy to iterate:
So let's say I like the overall result, but I'm looking for a different camera angle and a slightly different aesthetic (e.g. low lighting, shot from below, cool tone). You can really dial in a look without futzing around with prompts. Pretty nice!
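To make the "prompt + parameters" idea concrete, here's a tiny hypothetical sketch in Python. This is *not* Adobe's API -- just an illustration of how a base scene description plus structured modifiers (style, lighting, tone, composition) lets you iterate on a look without rewriting the prompt itself. All names and values below are made up.

```python
# Hypothetical sketch (not Adobe's actual API): a base prompt plus structured
# modifiers for style, lighting, tone, and composition. Iterating on a look
# means swapping modifiers, not rewriting the whole prompt.

def build_prompt(base: str, **modifiers: str) -> str:
    """Append comma-separated modifiers to a base prompt."""
    extras = ", ".join(v for v in modifiers.values() if v)
    return f"{base}, {extras}" if extras else base

base = "a robot barista pouring latte art in a sunlit cafe"

# First pass: stylized look
v1 = build_prompt(base, style="digital art", lighting="golden hour", composition="wide shot")

# Second pass: same scene, different camera angle and tone
v2 = build_prompt(base, style="photo", lighting="low lighting", tone="cool tone", composition="shot from below")

print(v1)
print(v2)
```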
Stylized not your jam, and want to go back to a photorealistic result? As easy as clicking a button, and bam:
"Robot that toasts your bread and applies butter to it, in the style of rick and morty" produced some impressive results in Firefly:
You're probably wondering how hands look? Pretty coherent!
Even with a prompt like this:
Punjabi man in flannel shirt using AI voice dictation to create the client pitch deck while drinking espresso in a cozy cabin, while wearing an Oculus VR headset, with a laptop on the table
@ericsnowden made an awesome analogy about ingredients and taking decades of Adobe tech combined with these newer models to make amazing recipes. And I have to say, the dishes do look good! Case in point:
Adobe will be expanding access gradually -- so it won't exactly be a free-for-all. During the beta period there are some limitations worth being aware of -- critically, commercial use is not allowed.
So what do you think of Adobe's entry? Share your thoughts below.
That's a wrap! If you enjoyed this deep dive on Adobe Firefly (adobe.com/firefly):
- RT the thread below to share with your audience
- Follow @bilawalsidhu to stay tuned for more creative tech magic
- Subscribe to get these right to your inbox: creativetechnologydigest.substack.com
BREAKING: Google just dropped Veo 2 and Imagen 3 -- their next gen video and image generation models.
Turns out Google's been closing the gap quietly -- not just on LLMs, but on visual creation too.
Here's everything you need to know w/o the hype 🧵
1/ First, let's get the Veo 2 updates out of the way:
• Up to 4K resolution (woot!)
• Increased detail & realism
• Improved human movement & expressions
• Better physics modeling & temporal coherence
On Meta's Movie Gen Bench, Veo holds it down against top video models:
2/ Veo 2 now speaks cinematographer. Instead of wrestling w/ technical params or guessing how Gemini captioned stuff, you can just say what you want using terms you're used to. Legit useful for production workflows.
E.g. here's a prompt to generate a classic car chase scene:
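The car chase prompt below is my own illustration of "speaking cinematographer" (lens, camera movement, lighting terms instead of model-speak), not the one from the original demo. The API call is a rough sketch assuming the google-genai Python SDK and that your account has Veo access; the model id and availability may differ from the initial rollout.

```python
# Illustrative sketch only: a cinematography-style prompt plus a hedged Veo call.
# Assumes the google-genai SDK (pip install google-genai) and an API key in
# GOOGLE_API_KEY; Veo access and model names may vary by account and over time.
from google import genai

prompt = (
    "Classic 1970s car chase through narrow city streets: low-angle tracking shot, "
    "35mm anamorphic lens, fast dolly alongside the hero car, handheld shake on impacts, "
    "golden hour backlight, film grain, smoke and dust kicked up on hard turns"
)

client = genai.Client()  # reads the API key from the environment
operation = client.models.generate_videos(
    model="veo-2.0-generate-001",  # assumed model id; check the current docs
    prompt=prompt,
)
# generate_videos returns a long-running operation; poll it per the SDK docs
# and download the video once it completes.
```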
BREAKING: Here are the coolest things Google announced today. I got the press briefing yesterday; these are my favorites w/o the hype.
TL;DR Gemini 2.0 brings multimodal creation, research agents, browser control, and massive compute upgrades. Plus dope research.
🧵 Let's dive in
1/ Let's talk Gemini 2.0 Flash:
• 2x faster than 1.5 Pro while outperforming it on key benchmarks
• Native tool use (Search + custom functions) -- quick sketch after this list
• New Multimodal Live API for realtime audio/video streaming w/ smart interrupt detection
• Available today; more model sizes in Jan
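Here's that minimal sketch of calling 2.0 Flash with native Google Search grounding, assuming the google-genai Python SDK and an API key in GOOGLE_API_KEY. The experimental model id reflects launch-day naming and may have changed since.

```python
# Minimal sketch: Gemini 2.0 Flash with native Google Search as a tool.
# Assumes the google-genai SDK (pip install google-genai); model id may have changed.
from google import genai
from google.genai import types

client = genai.Client()  # reads the API key from the environment

response = client.models.generate_content(
    model="gemini-2.0-flash-exp",
    contents="What did Google announce alongside Gemini 2.0 Flash?",
    config=types.GenerateContentConfig(
        tools=[types.Tool(google_search=types.GoogleSearch())],  # native tool use
    ),
)
print(response.text)
```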
2/ Gemini 2.0 *finally* gets native multimodal output
• Can generate images + text combined naturally (rough sketch below)
• Steerable text-to-speech in multiple languages/accents
• Alas, early access only for now; wider rollout in Jan
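And the rough sketch of what native multimodal output looks like from the google-genai SDK. Since this was early-access at announcement, the call may not work on your key yet; treat the model id and response handling as the general shape, not gospel.

```python
# Rough sketch of native multimodal output (interleaved text + image parts).
# Assumes the google-genai SDK and early access on your key; details may differ.
from google import genai
from google.genai import types

client = genai.Client()
response = client.models.generate_content(
    model="gemini-2.0-flash-exp",
    contents="Write a short haiku about origami birds and generate a matching image.",
    config=types.GenerateContentConfig(response_modalities=["TEXT", "IMAGE"]),
)

for part in response.candidates[0].content.parts:
    if part.text:                # interleaved text
        print(part.text)
    elif part.inline_data:       # image bytes (e.g. PNG)
        with open("out.png", "wb") as f:
            f.write(part.inline_data.data)
```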
Viggle's doing some cool stuff with it: ai.google.dev/showcase/viggle
Heads up! Mosaic dropped a pretty wild dataset of 1.26 million 360° images of Prague 🤯
If you're a researcher, creator or developer into 3D/AI/Geo, I think you're gonna wanna play with this
Here's the scoop on this 15 TERAPIXEL dataset & the crazy things you can do with it 🧵
The specs are nuts:
• 210,469 panos in 13K
• 1,262,814 source images (6 x 12MP)
• 1 image every meter
• 2cm pose accuracy
Not quite Google-scale coverage, but the pano density is WAY higher. An image every meter means it's perfect for all sorts of spatial 3D stuff.
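Quick sanity check on those headline numbers, using only the specs above (nothing downloaded, just arithmetic):

```python
# Sanity check that the published specs hang together.
panos = 210_469
lenses = 6                      # the Mosaic X is a 6-camera rig
source_images = panos * lenses
print(source_images)            # 1,262,814 -- matches the published count

megapixels_per_image = 12
total_pixels = source_images * megapixels_per_image * 1_000_000
print(f"{total_pixels / 1e12:.1f} terapixels")  # ~15.2 -- the "15 TERAPIXEL" claim
```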
The dataset is global shutter too, captured with the 6-lens Mosaic X camera, a rig built for serious real-world data collection.
And it captured Prague's architecture, streetscapes, and urban environment in incredible detail.
With Gaussian Splatting you get 3D editing support! You can select, move, and delete stuff, and apply shader fx. That kind of editing has been tedious with NeRFs and their implicit, black-box representations.
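Before the examples below, a toy sketch of *why* splats are so editable: each Gaussian is just a row of attributes (position, scale, rotation, color, opacity), so "select and delete" is plain array filtering. The .ply layout here is an assumption (it varies by splatting tool); the point is that there's a per-primitive handle to grab, which a NeRF's implicit MLP doesn't give you.

```python
# Toy example: "select and delete" splats by filtering rows of a .ply file.
# Assumes a simple Gaussian-splat .ply with x/y/z vertex fields; layouts vary by tool.
import numpy as np
from plyfile import PlyData, PlyElement  # pip install plyfile

ply = PlyData.read("scene_splats.ply")
verts = ply["vertex"].data  # structured array, one row per Gaussian

xyz = np.stack([verts["x"], verts["y"], verts["z"]], axis=1)

# "Select" everything inside a box around an unwanted object...
box_min, box_max = np.array([-1.0, 0.0, -1.0]), np.array([1.0, 2.0, 1.0])
inside = np.all((xyz >= box_min) & (xyz <= box_max), axis=1)

# ...and "delete" it by keeping the complement.
kept = verts[~inside]
PlyData([PlyElement.describe(kept, "vertex")]).write("scene_splats_edited.ply")
print(f"Removed {inside.sum()} of {len(verts)} splats")
```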
Case in point (1/3) by @hybridherbst:
Case in point (2/3): repurpose your point cloud shaders to make something unreal like @Ruben_Fro