With so many different labs rushing to research and deploy this kind of technology, the field will quickly turn into a race for efficiency as providers compete on cost too.
The paper is a bit evasive on the dataset (LAION?) — I presume for legal reasons. But the good news is that it's "only" 1B text/image pairs... although they are highly filtered.
IMHO there's much more room to improve quality with the current datasets.
Note that in the first tweet, the "Trending On ArtStation" prompt engineering hack that's equivalent for photos is "4K DSLR"!
It still can't get fingers right though! (eDiffi on the right, DALL-E 2 in the middle, Stable Diffusion on the left.)
"The Right To Read Is The Right To Mine" was a campaign from ~2012-2015 to convince the public & legislators that machines should bypass copyright for data-mining.
IMHO we're at the next stage of this campaign, now for generative systems — should they act outside copyright?
Articles like this one are at the tail end of the first pro-mining campaign and precursors to this new generative campaign?
It tries to establish that "reading by robots doesn’t count" and "infringement is for humans only".
If you're working at a generative company, and worried about the lawsuit against GitHub for their generative model, please take some comfort in the fact that I think they made *many* missteps — with either a serious lack of due care, or the intent to break the law.
For instance, Google announced they had a similar code model but didn't release it. They used it internally and measured a 6% improvement in productivity while they work through the legal and ethical implications.
(Could also be that Google wanted to see others get sued first?)
I will compile my best advice for companies who, understandably, want to continue their work in a promising/competitive field, but also don't want to spend all their money on lawyers!
Reading through the GitHub Copilot litigation as submitted: although it was put together quickly, it's a solid piece of work!
My assessment is that the defendants (GitHub, Microsoft, and OpenAI) are in a very bad position... githubcopilotlitigation.com
The documents show how Codex and Copilot act like databases; they include three different examples of JS code that is recited verbatim, mistakes included, from licensed sources.
Including this debug code below isPrime(n):
The documents then proceed to cast doubt on the fair-use claim: even if it were applicable here, it wouldn't help circumvent (a) the breach of contract, (b) the privacy issues, and (c) the DMCA.
You know how hands & fingers are particularly difficult to generate?
Wouldn't it be funny if people having important conversations online (in the near future) used hand gestures in front of their faces, so both sides know it's not a #DeepFake.
OpenAI engineers probably did this a few months ago, and are now frantically trying to make sure their Python sandboxed environments are sufficiently safe...
Thread predicting this is the best next direction for LLMs, and why it's important (e.g. you don't need to retrain models on new information; they can just query a database through an API):
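A minimal sketch of that retrieval idea, assuming a toy in-memory "database" and keyword matching (a real system would use an LLM call and a vector or SQL store; everything here is illustrative):

```python
# Retrieval-augmented generation in miniature: instead of retraining the
# model on new facts, look them up at query time and put them in the prompt.
# FACTS, retrieve(), and the prompt format are all toy stand-ins.

FACTS = {
    "eiffel tower height": "The Eiffel Tower is 330 m tall.",
    "python release": "Python 3.11 was released in October 2022.",
}

def retrieve(query: str) -> list[str]:
    """Return stored facts whose key shares a word with the query."""
    words = set(query.lower().split())
    return [fact for key, fact in FACTS.items() if words & set(key.split())]

def build_prompt(query: str) -> str:
    """Prepend retrieved facts so the model can answer without retraining."""
    context = "\n".join(retrieve(query))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"

prompt = build_prompt("How tall is the eiffel tower?")
```

The resulting `prompt` would then be sent to the model; updating `FACTS` updates the model's effective knowledge with no retraining.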
Prompt-To-Prompt editing allows you to easily change your input text without completely regenerating the image. This makes it much easier to control the diffusion process!
Example from bloc97's GitHub, four seasons of the same scene:
Prompt-To-Prompt falls into the category of UX improvements for Stable Diffusion, and speed of iteration is a major competitive factor.
Platforms able to deliver that speed, e.g. by caching temporary data about the generation (not just the random seed), have a big advantage! [1/3]