Note: these prompts are heavily optimized for MidJourney, which may give it an unfair advantage. I did try a few variations with Firefly but still couldn't get better results. I'm not a prompt ninja, so your mileage may vary.
Still, I'm grateful for Adobe's early beta access! /🧵
Note 2: Firefly is only trained on Adobe Stock and fully licensed images. The data curation is very conservative, which may cripple its performance.
I also included examples without copyrighted characters in the thread.
Note 3: Adobe research scientist @vdeschaintre has a good point: it may be a significant plus for companies who must ensure the IP copyright of the output image. They may be more than willing to sacrifice quality for legality, which makes MJ a less appealing option.
Thanks for all your feedback. I wrote a summary note to give Firefly's approach fair and proper credit:
Reading @MetaAI's Segment-Anything, and I believe today is one of the "GPT-3 moments" in computer vision. It has learned the *general* concept of what an "object" is, even for unknown objects, unfamiliar scenes (e.g. underwater & cell microscopy), and ambiguous cases.
Team: Alexander Kirillov, Eric Mintun, Nikhila Ravi, Hanzi Mao, Chloe Rolland, Laura Gustafson, Tete Xiao, Spencer Whitehead, Alex Berg, Wan-Yen Lo, Piotr Dollar, Ross Girshick.
Ross (@inkynumbers) was the inventor of Fast R-CNN 7 years ago, which kickstarted CNN-based image segmentation. He co-invented Faster R-CNN and Mask R-CNN. All these years of deep research culminated in Segment-Anything.
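To make the "promptable segmentation" idea concrete, here's a minimal sketch of how a single click becomes a SAM prompt. The predictor calls follow Meta's released `segment-anything` package (shown commented out, since they need the multi-GB checkpoint); the small helper and coordinates are my own illustration.

```python
import numpy as np

def point_prompt(x, y, foreground=True):
    """Package one click as SAM-style prompt arrays:
    coords of shape (N, 2), labels of shape (N,) with 1=foreground, 0=background."""
    coords = np.array([[x, y]], dtype=np.float32)
    labels = np.array([1 if foreground else 0], dtype=np.int32)
    return coords, labels

# Actual predictor usage, per Meta's `segment-anything` library
# (requires the package plus the ViT-H checkpoint, so commented out here):
# from segment_anything import SamPredictor, sam_model_registry
# sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")
# predictor = SamPredictor(sam)
# predictor.set_image(image_rgb)  # HxWx3 uint8 image
# coords, labels = point_prompt(250, 187)
# masks, scores, logits = predictor.predict(point_coords=coords, point_labels=labels)

coords, labels = point_prompt(250, 187)
print(coords.shape, labels.tolist())  # (1, 2) [1]
```

The key design point: the model is conditioned on lightweight prompts (points, boxes, masks) instead of a fixed category list, which is why it generalizes to objects it was never labeled on.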
HuggingGPT is the most interesting paper I read this week. It gets very close to the "Everything App" vision that I described a while ago.
ChatGPT acts as a controller over the *AI model space*, picks the right model (app) given the human specification, and assembles them…
Vision of an "Everything App" based on ChatGPT app store:
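The controller pattern above can be sketched in a few lines. This is my own toy mock, not the paper's code: a stand-in "planner" maps a request to tasks, a registry maps tasks to expert models, and the controller assembles the results.

```python
from typing import Callable, Dict, List

# Registry of expert "models" (stubs here; HuggingGPT dispatches to real
# Hugging Face models).
MODEL_ZOO: Dict[str, Callable[[str], str]] = {
    "image-captioning": lambda x: f"caption({x})",
    "text-to-speech":   lambda x: f"audio({x})",
}

def plan(request: str) -> List[str]:
    # Stand-in for the ChatGPT planning step: pick tasks by keyword.
    tasks = []
    if "describe" in request:
        tasks.append("image-captioning")
    if "read" in request:
        tasks.append("text-to-speech")
    return tasks

def controller(request: str, payload: str) -> List[str]:
    # Plan -> select models -> execute -> return results to assemble a reply.
    return [MODEL_ZOO[task](payload) for task in plan(request)]

print(controller("describe this photo and read it aloud", "cat.jpg"))
# → ['caption(cat.jpg)', 'audio(cat.jpg)']
```

In the real system the planner, selector, and response-assembly steps are all LLM calls; the point is that the LLM never runs the expert models itself, it only orchestrates them.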
First, GPT-4 acknowledges Judea's statement and humbly admits that there are limitations. Then it gives a bullet list of 5 points.
>>>
1) Pretraining data contains many examples of causality and counterfactuals, so GPT's answer may simply interpolate human judgments from similar scenarios. 2) Inductive reasoning: making educated guesses based on common sense.
Enough with LLMs - exciting things are happening in the world of atoms.
This is Stanford ALOHA, a low-cost and agile robot platform. The whole system is open-source (!!): hardware design, CAD models for 3D printing, simulator, and training code. Time to trossenrobotics.com/aloha.aspx…
If you want to learn more, here's the original tweet from Tony @tonyzzhao:
LangChain supports OpenAI's API, as well as other providers and models. If @StabilityAI succeeds in reproducing an open GPT-4, then LangChain will become the "Android" of the LLM stack: truly open and end-to-end!
Can't wait.
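The provider-swapping idea is the crux. Here's my own toy illustration (not LangChain's actual API): code written against a minimal LLM interface runs unchanged whether the backend is OpenAI, an open model, or a stub.

```python
from typing import Protocol

class LLM(Protocol):
    """Minimal provider-agnostic interface: prompt in, completion out."""
    def complete(self, prompt: str) -> str: ...

class EchoLLM:
    """Stand-in for any concrete provider (OpenAI, an open model, etc.)."""
    def complete(self, prompt: str) -> str:
        return f"[echo] {prompt}"

def summarize(llm: LLM, text: str) -> str:
    # Application code depends only on the interface, not the provider.
    return llm.complete(f"Summarize: {text}")

print(summarize(EchoLLM(), "LangChain abstracts over providers."))
```

Swapping in an open-weights backend means replacing one class, while every chain built on top keeps working — that's the "Android" appeal.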
I cross-post on LinkedIn if you find Twitter threads a bit hard to navigate. You're welcome to follow me there as well!
The 10x engineer is a myth. The 100x AI-powered engineer is more real than ever. As OpenAI winds down Codex, Microsoft announces GitHub Copilot X. I think it's almost as exciting as GPT-4 itself:
- Copilot Chat: any piece of text database will be "chattable", and codebase is no…
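A toy sketch of what "chattable" means (my own illustration, not Copilot Chat's internals): retrieve the most relevant snippet for a question, then hand it to an LLM as context. The LLM call is elided; real systems retrieve with embeddings rather than keywords.

```python
def retrieve(corpus: dict, question: str) -> str:
    """Return the key of the corpus entry sharing the most words
    with the question. Naive keyword overlap; embeddings in practice."""
    def score(text: str) -> int:
        return len(set(question.lower().split()) & set(text.lower().split()))
    return max(corpus, key=lambda k: score(corpus[k]))

# Hypothetical mini-codebase standing in for "any text database".
corpus = {
    "auth.py":    "def login(user, password): check credentials against db",
    "billing.py": "def charge(card, amount): call payment gateway",
}
print(retrieve(corpus, "where do we check the password credentials?"))  # → auth.py
```

The retrieved snippet would then be stuffed into the chat prompt, which is how an arbitrary corpus — docs, tickets, or a codebase — becomes conversational.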