If you last checked in on AI image makers a month ago & thought “that is a fun toy, but it is far from useful…” Well, in just the last week or so, two of the major AI systems updated.
You can now generate a solid image in one try. For example, “otter on a plane using wifi” 1st try:
This is what you got a month ago with the same prompt. (MidJourney v3 vs. v4)
This is a classic case of disruptive technology, in the original Clay Christensen sense 👇
A less capable technology (AI image generation) is developing faster than a stable dominant technology (human illustration), and is starting to be able to handle more use cases. Except here it is happening very quickly
Seriously, everyone whose job touches on writing, images, video, or music should realize that the pace of improvement here is very fast. And, unlike other areas of AI such as robotics, there are no obvious barriers to improvement.
Also worth looking at the details in the admittedly goofy otter pictures: the lighting looks correct (even streaming through the windows), everything is placed correctly, including the drink, the composition is varied, etc.
And this is without any attempts to refine the prompts.
Some more, again all first attempts with no effort to revise:
🦦 Otters fighting a medieval duel
🦦 Otter physicist lamenting the invention of the atomic bomb
🦦 Otter inventing the airplane in 1905
🦦 Otters playing chess in the fall
(These AIs came out just a few months ago)
AI image generation can now beat the Lovelace Test, a Turing Test for creativity: it challenges an AI to equal humans under constrained creative tasks.
Illustrating “an otter making pizza in Ancient Rome” in a novel, interesting way & as well as an average human is a clear pass!
And I picked otters randomly for fun
But since some comments are pointing out that nonhuman scenes may be easier, here are some results for the prompt “doctor on a plane using wifi” - we are good at picking out flaws in illustrations of people, but these are impressive & improving fast.
People keep asking what system I was using: it is MidJourney (I mentioned this in the thread)
If you want to try it, you get 25 uses for free & a guide is below. Be sure to add --v 4 at the end of your prompt to use the latest version, which is the one I use throughout the thread.
Here👇 is a thread with more comparisons between MidJourney a month or so ago, compared to MidJourney now. The pace is fast!
If you are trying MidJourney, the way to use the new version is to add --v 4 to the end of your prompt (I have no association with it or any AI company)
Reminder: if you want to use the new MidJourney version 4, rather than the old (from a month ago!) version add “ --v 4” to the end of the prompt. The spaces are vital
Interestingly, version 4 “just works” making it easier for everyone but power users who learned to craft prompts
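For reference, a full prompt entered in MidJourney’s Discord bot looks like this (using the otter example from earlier in the thread; /imagine is MidJourney’s standard entry point):

```
/imagine prompt: otter on a plane using wifi --v 4
```

Note the space between --v and 4, as mentioned above.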
Microsoft keeps launching Copilot tools that seem interesting but which I can't ever seem to locate. I can't find them in my institution's enterprise account, nor my personal account, nor in the many Copilot apps, copilots for apps, or Agents for copilots
Each has its own UI. 🤷‍♂️
For a while in 2023, Microsoft, with its GPT-4-powered Bing, was the absolute leader in making LLMs accessible and easy to use.
Even Amazon made Nova accessible through a simple URL.
Make your products easy to experiment with and people will discover use cases. Make them impossible without some sort of elaborate IT intervention and nobody will notice and they will just go back to ChatGPT or Gemini.
As someone who has spent a lot of time thinking and building in AI education, and who sees huge potential, I have been shown this headline a lot
I am sure Alpha School is doing interesting things, but there is no deployed AI tutor yet that drives up test scores as this implies.
I am not doubting their test results, but I would want to learn more about the role AI is playing, and what they mean by AI tutor, before attributing their success to AI as opposed to the other dials they are turning.
Google has been doing a lot of work on fine-tuning Gemini for learning, and you can see a good overview of the issues and approaches in their paper (which also tests some of our work on tutor prompts). arxiv.org/abs/2412.16429
I suspect that a lot of "AI training" in companies and schools has become obsolete in the last few months
As models get larger, the prompting tricks that used to be useful no longer help; reasoners don't play well with Chain-of-Thought prompting; hallucination rates have dropped, etc.
I think caution is warranted when teaching prompting approaches for individual use or if training is trying to define clear lines about tasks where AI is bad/good. Those areas are changing very rapidly.
None of this is the fault of trainers - I have taught my students how to do Chain-of-Thought prompting, etc. But we need to start thinking about how to teach people to use AI in a world that is changing quite rapidly, focusing on exploration and use rather than a set of defined rules.
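For readers who haven’t seen it, Chain-of-Thought prompting is just manual scaffolding: you append an instruction asking the model to reason step by step before answering. A minimal sketch in Python (the function name and exact phrasing are illustrative, not from any particular library):

```python
def with_chain_of_thought(question: str) -> str:
    """Wrap a question in a classic Chain-of-Thought instruction.

    This is the kind of manual scaffolding that reasoning models now
    make redundant: they plan their own intermediate steps internally,
    so appending this instruction no longer reliably helps.
    """
    return (
        f"{question}\n\n"
        "Let's think step by step, showing your reasoning "
        "before giving the final answer."
    )

prompt = with_chain_of_thought("A train leaves at 3pm traveling 60 mph...")
```

The point of the thread is that this kind of fixed recipe is exactly what goes stale as models change.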
“GPT-4.5, give me a secret history à la Borges. Tie together the steel at Scapa Flow, the return of Napoleon from exile, Betamax versus VHS, and the fact that Kafka wanted his manuscripts burned. There should be deep meanings and connections”
“Make it better” a few times…
It should have integrated the scuttling of the High Seas Fleet better, but it knocked the Betamax thing out of the park
🚨Our Generative AI Lab at Wharton is releasing its first Prompt Engineering Report, empirically testing prompting approaches. This time we find:
1) Prompting “tricks” like saying “please” do not help consistently or predictably
2) How you measure against benchmarks matters a lot
Using social science methodologies for measuring prompting results helped give us some useful insights, I think. Here’s the report, the first of hopefully many to come. papers.ssrn.com/sol3/papers.cf…
This is what complicates things. Making a polite request ("please") had huge positive effects in some cases and negative ones in others. Similarly being rude ("I order you") helped in some cases and not others.
There was no clear way to predict in advance which would work when.
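To make the methodological point concrete: a study like this has to run the same benchmark question under systematically varied framings and score each separately. A toy sketch of generating such variants (the exact wording and structure here are illustrative, not the report’s actual protocol):

```python
# Hypothetical sketch: produce neutral / polite / commanding framings of
# one benchmark question, mirroring the report's comparison of "please"
# vs. "I order you" style prompts. Each variant would then be sent to the
# model many times and scored against the benchmark answer key.
FRAMINGS = {
    "neutral": "{q}",
    "polite": "Please answer the following question. {q}",
    "commanding": "I order you to answer the following question. {q}",
}

def prompt_variants(question: str) -> dict[str, str]:
    """Return each framing of the question, keyed by framing name."""
    return {name: tmpl.format(q=question) for name, tmpl in FRAMINGS.items()}

variants = prompt_variants("What is 17 * 24?")
```

Since the report found the effect of a framing flips sign across questions, only this kind of per-question, repeated-trial measurement reveals that “please” has no consistent, predictable benefit.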