This current period of prompt engineering is going to be short-lived. Most people are not here for the new niche of prompt design; they use these tools because there are things they want to see and experience in high fidelity, right now.
The minute someone can provide a better experience than today's sprawling, keyword-stuffed prompt incantations, with truly excellent results, people will come running.
What then can AI devs do to provide a more consistent, high-quality experience and reduce prompting complexity for people engaging with these AIs?
1. Few-Shot Learning (fine-tuning) — already being embraced by creators like @rainisto and @Nitrosocke to create artistically consistent outputs. A curated set of 20-40ish images teaches the model unified styles, consistent characters, composition structures, etc. (see the sketch after this list).
2. Reinforcement Learning — feed these consistent model outputs back into training as formal feedback, thus retraining and *reweighting* the model's output priorities.
3. Retrain the model on an order-of-magnitude larger dataset — this is unlikely to happen because there just aren't that many quality images lying around that haven't already been used to train these models.
Side note: If you somehow happen to be sitting on such a dataset, you're starting to realize that it has incredible NEW value, e.g. if you're Hollywood right now, you're spinning up new legal teams, new contracts, maybe your own models, everything.
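To make option 1 concrete, here's a minimal sketch of a fine-tuning loop with Hugging Face diffusers, condensed from their published training examples. The paths, caption, and hyperparameters are my own illustrative assumptions, and device placement is omitted for brevity; this is a sketch, not a production trainer.

```python
# Minimal sketch: fine-tune Stable Diffusion's UNet on a small curated image set.
# Condensed from the diffusers training examples; paths, caption, and
# hyperparameters are illustrative assumptions.
from pathlib import Path

import torch
import torch.nn.functional as F
from PIL import Image
from torch.utils.data import DataLoader, Dataset
from torchvision import transforms
from diffusers import AutoencoderKL, DDPMScheduler, UNet2DConditionModel
from transformers import CLIPTextModel, CLIPTokenizer

model_id = "runwayml/stable-diffusion-v1-5"
tokenizer = CLIPTokenizer.from_pretrained(model_id, subfolder="tokenizer")
text_encoder = CLIPTextModel.from_pretrained(model_id, subfolder="text_encoder")
vae = AutoencoderKL.from_pretrained(model_id, subfolder="vae")
unet = UNet2DConditionModel.from_pretrained(model_id, subfolder="unet")
noise_scheduler = DDPMScheduler.from_pretrained(model_id, subfolder="scheduler")

vae.requires_grad_(False)           # only the UNet is trained in this sketch
text_encoder.requires_grad_(False)
optimizer = torch.optim.AdamW(unet.parameters(), lr=5e-6)

class StyleDataset(Dataset):
    """20-40 curated images, one shared caption describing the target style."""
    def __init__(self, image_dir, caption):
        self.paths = sorted(Path(image_dir).glob("*.png"))
        self.input_ids = tokenizer(
            caption, padding="max_length", truncation=True,
            max_length=tokenizer.model_max_length, return_tensors="pt",
        ).input_ids[0]
        self.tf = transforms.Compose([
            transforms.Resize(512), transforms.CenterCrop(512),
            transforms.ToTensor(), transforms.Normalize([0.5], [0.5]),
        ])
    def __len__(self):
        return len(self.paths)
    def __getitem__(self, i):
        return self.tf(Image.open(self.paths[i]).convert("RGB")), self.input_ids

loader = DataLoader(StyleDataset("./style_images", "a photo in my-style"),
                    batch_size=1, shuffle=True)

for epoch in range(50):
    for pixel_values, input_ids in loader:
        # encode images to latents, then add noise at a random timestep
        latents = vae.encode(pixel_values).latent_dist.sample() * 0.18215
        noise = torch.randn_like(latents)
        t = torch.randint(0, noise_scheduler.config.num_train_timesteps,
                          (latents.shape[0],))
        noisy = noise_scheduler.add_noise(latents, noise, t)
        # predict the noise and regress toward it
        pred = unet(noisy, t, text_encoder(input_ids)[0]).sample
        loss = F.mse_loss(pred, noise)
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()
```

Frameworks differ in the details (prior-preservation loss, text-encoder training, learning-rate schedules), but the core loop is this: noise the latents, predict the noise, regress.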
Current devs are therefore likely to stick with option 2, bypassing a new war over the most exclusive data and instead creating their own datasets. After all, they have image generators; why not use them?
Unfortunately, most of these developers are not image and art experts. They will instead think like product managers and look toward the desired outputs of their current core audiences.
Who is the current core audience and what are they producing right now? Well, to be fair, lots of people and lots of stuff. But IMO the majority are producing the type of things people fixate on at the beginning of their artistic journey, not the end.
So product managers at the big model companies are busy building for their current audience of early art enthusiasts, rather than the total potential audience, who would benefit most from art precision and art expertise.
We can already see it in Stable Diffusion v2.0 and Midjourney 4. The models now prioritize the fetish of hyper-reality, deep saturation, concept art, graphics, etc., over true photographic representation. To get a true photographic look, you need long lists of negative prompts and hacks (sketch below).
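For illustration, here's roughly what that looks like with the diffusers library; the model ID and both prompt lists are my own illustrative picks:

```python
# Sketch: pulling Stable Diffusion back toward a plain photographic look by
# negatively prompting away the trendy "hyper-real" aesthetic.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-1", torch_dtype=torch.float16
).to("cuda")

image = pipe(
    prompt="candid 35mm photograph of a street market at dusk",
    negative_prompt=(
        "oversaturated, hyperreal, concept art, illustration, 3d render, "
        "hdr, vivid colors, digital painting, artstation"
    ),
).images[0]
image.save("market.png")
```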
"So what? Not everybody's into cinematic portraits!"
That's fine. But these model changes live in everything: the lighting, the sculpts, the color grade, the environment, the depictions, the actions, etc.
If you are someone who DOESN'T want these aesthetic priorities in your outputs, then bad news for you.
These are all indications that these early general-purpose models are susceptible to trends and are not the total solve. And that the AI-gen arms race won't necessarily be won by the largest dataset, the fastest build, or the most marketing videos — because those alone don't help.
The smartest developers will partner with dataset owners, creatives, and other experts to produce specific models that deliver high-quality results. They will promote the strengths of these models, be experts on them, and be honest about what they can and cannot do well.
This will allow for the participation of individual creators and larger rights holders within an AI-centric model economy that is sustainable and beneficial to all. It's a massive stream of potential new revenue that is largely being ignored in current AI-gen convos.
How could it work?
AI developers could reach out to individual creatives, taste makers, editors, producers, rights houses, etc, to begin work on authoring fine-tunings. These fine-tunings would then be marketable by both parties to the end user.
How could it work (cont.):
End users could select from licensed fine-tunings, make their own fine-tunings, and mix and match them, both in isolated scenarios and alongside other prompt types (sketch below).
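One plausible mechanic for the mix-and-match part is plain checkpoint merging, already popular in the Stable Diffusion community. A minimal sketch, assuming two fine-tunes of the same base model saved as raw state dicts (the file names are hypothetical):

```python
# Sketch: blend two fine-tuned checkpoints of the same base model by linear
# interpolation of their weights. Assumes raw state dicts with matching keys.
import torch

def merge_state_dicts(path_a: str, path_b: str, alpha: float = 0.5) -> dict:
    """Return alpha * A + (1 - alpha) * B for every shared weight tensor."""
    a = torch.load(path_a, map_location="cpu")
    b = torch.load(path_b, map_location="cpu")
    return {k: alpha * a[k] + (1.0 - alpha) * b[k] for k in a.keys() & b.keys()}

# e.g. 70% of a licensed portrait fine-tune, 30% of your own landscape fine-tune
merged = merge_state_dicts("portrait_finetune.pt", "landscape_finetune.pt", alpha=0.7)
torch.save(merged, "merged_style.pt")
```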
Thanks for reading. To anybody who made it this far, I hope this thread is useful :)
Wanna use this thread to also acknowledge the major innovations that have taken place around prompting, one each from the big 3:
Today I would like to share this series of #dalle2 AI-generated images. (thread for detail and process)
The series uses an initial prompt:
"Brilliant 8K portraiture from the film [film title], featuring people wrapped in faded blue linen, standing in the desert, directed by Annie Leibovitz"
to generate starter portraits in Leibovitz style.
The prompt is not a secret incantation or anything; it's just a rough guess at a good starting point (after generating tens of thousands of images with DALL-E).
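If you want to try a variant yourself, the same kind of prompt can be sent through OpenAI's images endpoint. A minimal sketch using the pre-1.0 openai Python library (newer versions use client.images.generate; [film title] is the author's placeholder, so substitute a real title):

```python
# Sketch: generating starter portraits from the thread's prompt via the
# DALL-E API (openai Python library < 1.0).
import openai

openai.api_key = "sk-..."  # your key here

response = openai.Image.create(
    prompt=(
        "Brilliant 8K portraiture from the film [film title], featuring people "
        "wrapped in faded blue linen, standing in the desert, "
        "directed by Annie Leibovitz"  # substitute a real title for [film title]
    ),
    n=4,
    size="1024x1024",
)
for i, item in enumerate(response["data"]):
    print(i, item["url"])
```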