If you last checked in on AI image makers a month ago & thought “that is a fun toy, but is far from useful…” Well, in just the last week or so two of the major AI systems updated.
You can now generate a solid image in one try. For example, “otter on a plane using wifi” 1st try:
This is what you got a month ago with the same prompt. (MidJourney v3 vs. v4)
This is a classic case of disruptive technology, in the original Clay Christensen sense 👇
A less capable technology is developing faster than a stable dominant technology (human illustration), and starting to be able to handle more use cases. Except it is happening very quickly.
Seriously, everyone whose job touches on writing, images, video, or music should realize that the pace of improvement here is very fast & also, unlike other areas of AI, like robotics, there are not any obvious barriers to improvement.
Also worth looking at the details in the admittedly goofy otter pictures: the lighting looks correct (even streaming through the windows), everything is placed correctly, including the drink, the composition is varied, etc.
And this is without any attempts to refine the prompts.
Some more, again all first attempts with no effort to revise:
🦦 Otters fighting a medieval duel
🦦Otter physicist lamenting the invention of the atomic bomb
🦦Otter inventing the airplane in 1905
🦦Otters playing chess in the fall
(These AIs came out just a few months ago)
AI image generation can now beat the Lovelace Test, a Turing Test, but for creativity. It challenges AI to equal humans under constrained creativity.
Illustrating “an otter making pizza in Ancient Rome” in a novel, interesting way & as well as an average human is a clear pass!
And I picked otters randomly for fun
But since some comments are pointing out that nonhuman scenes may be easier, here are some results for the prompt “doctor on a plane using wifi” - we are good at picking out flaws in illustrations of people, but they are impressive & improving fast.
People keep asking what system I was using: it is MidJourney (I mentioned this in the thread)
If you want to try it, you get 25 uses for free & a guide is below. Be sure to add --v 4 at the end of your prompt to use the latest version, which is the one I use throughout the thread.
Here👇 is a thread with more comparisons between MidJourney a month or so ago and MidJourney now. The pace is fast!
If you are trying MidJourney, the way to use the new version is to add --v 4 to the end of your prompt (I have no association with it or any AI company)
Reminder: if you want to use the new MidJourney version 4, rather than the old (from a month ago!) version add “ --v 4” to the end of the prompt. The spaces are vital
Interestingly, version 4 “just works” making it easier for everyone but power users who learned to craft prompts
• • •
So, OpenAI Deep Research can connect directly to Dropbox, Sharepoint, etc.
Early experiments only, but it feels like what every "talk to our documents" RAG system has been aiming for, but with o3 smarts and easy use. I haven't done robust testing yet, but very impressive so far.
I think it is going to be a shock to the market, since "talk to our documents" is one of the most popular implementations of AI in large organizations, and this version seems to work quite well and costs very little.
I am sure the other Deep Research products will be able to do the same soon, and, while I am sure there are hallucinations (haven't spotted any yet, though), this seems like an example of how the LLM makers can sometimes move upstream to the application space and take a market.
Very big impact: The final version of a randomized, controlled World Bank study finds using a GPT-4 tutor with teacher guidance in a six-week after-school program in Nigeria had "more than twice the effect of some of the most effective interventions in education" at very low costs
Microsoft keeps launching Copilot tools that seem interesting but which I can't ever seem to locate. Can't find them in my institution's enterprise account, nor my personal account, nor the many Copilot apps or copilots to apps or Agents for copilots
Each has its own UI. 🤷‍♂️
For a while in 2023, Microsoft, with its GPT-4-powered Bing, was the absolute leader in making LLMs accessible and easy to use.
Even Amazon made Nova accessible through a simple URL.
Make your products easy to experiment with and people will discover use cases. Make them impossible without some sort of elaborate IT intervention and nobody will notice and they will just go back to ChatGPT or Gemini.
As someone who has spent a lot of time thinking and building in AI education, and sees huge potential, I have been shown this headline a lot
I am sure Alpha School is doing interesting things, but there is no deployed AI tutor yet that drives up test scores like this implies.
I am not doubting their test results, but I would want to learn more about the role AI is playing, and what they mean by AI tutor, before attributing their success to AI as opposed to the other dials they are turning.
Google has been doing a lot of work on fine-tuning Gemini for learning, and you can see a good overview of the issues and approaches in their paper (which also tests some of our work on tutor prompts). arxiv.org/abs/2412.16429
I suspect that a lot of "AI training" in companies and schools has become obsolete in the last few months
As models get larger, the prompting tricks that used to be useful are no longer good; reasoners don't play well with Chain-of-Thought; hallucination rates have dropped, etc.
I think caution is warranted when teaching prompting approaches for individual use or if training is trying to define clear lines about tasks where AI is bad/good. Those areas are changing very rapidly.
None of this is the fault of trainers - I have taught my students how to do Chain-of-thought, etc. But we need to start to think about how to teach people to use AI in a world that is changing quite rapidly. Focusing on exploration and use, rather than a set of defined rules.
“GPT-4.5, Give me a secret history ala Borges. Tie together the steel at Scapa Flow, the return of Napoleon from exile, betamax versus VHS, and the fact that Kafka wanted his manuscripts burned. There should be deep meanings and connections”
“Make it better” a few times…
It should have integrated the scuttling of the High Seas Fleet better but it knocked the Betamax thing out of the park