Note: these prompts are heavily optimized for MidJourney, which may give it an unfair advantage. I did try a few variations with Firefly but still couldn't get better results. I'm not a prompt ninja, so your mileage may vary.
Still, I'm grateful for Adobe's early beta access! /🧵
Note 2: Firefly is only trained on Adobe Stock and fully licensed images. The data curation is very conservative, which may cripple its performance.
I also included examples without copyrighted characters in the thread.
Note 3: Adobe research scientist @vdeschaintre has a good point: it may be a significant plus for companies who must ensure the IP copyright of the output image. They may be more than willing to sacrifice quality for legality, which makes MJ a less appealing option.
Thanks for all your feedback. I wrote a summary note to give Firefly's approach fair and proper credit:
Reading @MetaAI's Segment-Anything, and I believe today is one of the "GPT-3 moments" in computer vision. It has learned the *general* concept of what an "object" is, even for unknown objects, unfamiliar scenes (e.g. underwater & cell microscopy), and ambiguous cases.
Team: Alexander Kirillov, Eric Mintun, Nikhila Ravi, Hanzi Mao, Chloe Rolland, Laura Gustafson, Tete Xiao, Spencer Whitehead, Alex Berg, Wan-Yen Lo, Piotr Dollar, Ross Girshick.
Ross (@inkynumbers) was the inventor of Fast R-CNN 7 years ago, which kickstarted CNN-based image segmentation. He co-invented Faster R-CNN and Mask R-CNN. All these years of deep research culminated in Segment-Anything.
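To make the "promptable segmentation" idea concrete, here's a minimal sketch of how a single click becomes a SAM prompt. The predictor calls follow Meta's released `segment-anything` package (shown commented out, since they need the multi-GB checkpoint); the small helper and coordinates are my own illustration.

```python
import numpy as np

def point_prompt(x, y, foreground=True):
    """Package one click as SAM-style prompt arrays:
    coords of shape (N, 2), labels of shape (N,) with 1=foreground, 0=background."""
    coords = np.array([[x, y]], dtype=np.float32)
    labels = np.array([1 if foreground else 0], dtype=np.int32)
    return coords, labels

# Actual predictor usage, per Meta's `segment-anything` library
# (requires the package plus the ViT-H checkpoint, so commented out here):
# from segment_anything import SamPredictor, sam_model_registry
# sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")
# predictor = SamPredictor(sam)
# predictor.set_image(image_rgb)  # HxWx3 uint8 image
# coords, labels = point_prompt(250, 187)
# masks, scores, logits = predictor.predict(point_coords=coords, point_labels=labels)

coords, labels = point_prompt(250, 187)
print(coords.shape, labels.tolist())  # (1, 2) [1]
```

The key design point: the model is conditioned on lightweight prompts (points, boxes, masks) instead of a fixed category list, which is why it generalizes to objects it was never labeled on.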
HuggingGPT is the most interesting paper I read this week. It gets very close to the "Everything App" vision that I described a while ago.
ChatGPT acts as a controller over the *AI model space*, picks the right model (app) given the human specification, and assembles them…
Vision of an "Everything App" based on ChatGPT app store:
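The controller pattern above can be sketched in a few lines. This is my own toy mock, not the paper's code: a stand-in "planner" maps a request to tasks, a registry maps tasks to expert models, and the controller assembles the results.

```python
from typing import Callable, Dict, List

# Registry of expert "models" (stubs here; HuggingGPT dispatches to real
# Hugging Face models).
MODEL_ZOO: Dict[str, Callable[[str], str]] = {
    "image-captioning": lambda x: f"caption({x})",
    "text-to-speech":   lambda x: f"audio({x})",
}

def plan(request: str) -> List[str]:
    # Stand-in for the ChatGPT planning step: pick tasks by keyword.
    tasks = []
    if "describe" in request:
        tasks.append("image-captioning")
    if "read" in request:
        tasks.append("text-to-speech")
    return tasks

def controller(request: str, payload: str) -> List[str]:
    # Plan -> select models -> execute -> return results to assemble a reply.
    return [MODEL_ZOO[task](payload) for task in plan(request)]

print(controller("describe this photo and read it aloud", "cat.jpg"))
# → ['caption(cat.jpg)', 'audio(cat.jpg)']
```

In the real system the planner, selector, and response-assembly steps are all LLM calls; the point is that the LLM never runs the expert models itself, it only orchestrates them.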
First, GPT-4 acknowledges Judea's statement and humbly admits that there are limitations. Then it gives a bullet list of 5 points.
>>>
1) Pretraining data contains many examples of causality and counterfactuals, so GPT's answer may simply interpolate human judgments from similar scenarios. 2) Inductive reasoning: making educated guesses based on common sense.
Enough with LLMs - exciting things are happening in the world of atoms.
This is Stanford ALOHA, a low-cost and agile robot platform. The whole system is open-source (!!): hardware design, CAD models for 3D printing, simulator, and training code. Time to trossenrobotics.com/aloha.aspx…
If you want to learn more, here's the original tweet from Tony @tonyzzhao:
LangChain supports OpenAI's API, as well as other providers and models. If @StabilityAI succeeds in reproducing an open GPT-4, then LangChain will become the "Android" of the LLM stack: truly open and end-to-end!
Can't wait.
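The provider-swapping idea is the crux. Here's my own toy illustration (not LangChain's actual API): code written against a minimal LLM interface runs unchanged whether the backend is OpenAI, an open model, or a stub.

```python
from typing import Protocol

class LLM(Protocol):
    """Minimal provider-agnostic interface: prompt in, completion out."""
    def complete(self, prompt: str) -> str: ...

class EchoLLM:
    """Stand-in for any concrete provider (OpenAI, an open model, etc.)."""
    def complete(self, prompt: str) -> str:
        return f"[echo] {prompt}"

def summarize(llm: LLM, text: str) -> str:
    # Application code depends only on the interface, not the provider.
    return llm.complete(f"Summarize: {text}")

print(summarize(EchoLLM(), "LangChain abstracts over providers."))
```

Swapping in an open-weights backend means replacing one class, while every chain built on top keeps working — that's the "Android" appeal.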
I cross-post on LinkedIn if you find Twitter threads a bit hard to navigate. You're welcome to follow me there as well!
The 10x engineer is a myth. The 100x AI-powered engineer is more real than ever. As OpenAI winds down Codex, Microsoft announces GitHub Copilot X. I think it's almost as exciting as GPT-4 itself:
- Copilot Chat: any piece of text database will be "chattable", and codebase is no…
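A toy sketch of what "chattable" means (my own illustration, not Copilot Chat's internals): retrieve the most relevant snippet for a question, then hand it to an LLM as context. The LLM call is elided; real systems retrieve with embeddings rather than keywords.

```python
def retrieve(corpus: dict, question: str) -> str:
    """Return the key of the corpus entry sharing the most words
    with the question. Naive keyword overlap; embeddings in practice."""
    def score(text: str) -> int:
        return len(set(question.lower().split()) & set(text.lower().split()))
    return max(corpus, key=lambda k: score(corpus[k]))

# Hypothetical mini-codebase standing in for "any text database".
corpus = {
    "auth.py":    "def login(user, password): check credentials against db",
    "billing.py": "def charge(card, amount): call payment gateway",
}
print(retrieve(corpus, "where do we check the password credentials?"))  # → auth.py
```

The retrieved snippet would then be stuffed into the chat prompt, which is how an arbitrary corpus — docs, tickets, or a codebase — becomes conversational.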