Building @rosebud_ai: Vibe code games. Creation is Play🌹 https://t.co/XWU5lvuHwW PhD math/AI Berkeley 👼 Angel investor: https://t.co/VMlOAnTC1j 🇨🇦
Feb 22 • 10 tweets • 4 min read
My feed has been inundated with posts of Grok 3 making basic arcade games. But LLMs from years ago could already make decent arcade games, so that's not news. I ran a one-shot test to see how well it fared against other frontier models at creating a 3D game, with room for it to come up with its own gameplay and aesthetics. I tested Grok 3, O1, Sonnet 3.5, Llama4, DeepSeek, and Gemini using the following prompt.
Make Dune x Minecraft 🏜️
Imagine a sandbox survival game set on a desert planet. Players mine ‘spice’ and must build defenses against roaming sandworms. Design the main gameplay loop, crafting system, and survival challenges in one complete description.
✏️tldr O1, Grok 3, and Sonnet 3.5 were the most impressive. Aesthetically, Grok nailed the best vibes (it even produced a surprisingly cool-looking spice mining truck), but the game lacked functionality. O1 took the top spot imo with a functional and visually appealing experience, and Sonnet 3.5 followed closely. This is obviously just one test, but you can see the generated code and games in the thread (and even try forking them on Rosebud).
Longer summary:
@OpenAI O1: Best vibes-to-function balance. Looked good, working controls, and I could mine spice.
@xai #Grok3: Excelled at generating Dune vibes. I especially liked the Dune-inspired spice mining car, though it wasn’t a complete game. I could move around, but none of the crafting mechanics worked.
@AnthropicAI Sonnet3.5: Produced something set in space that had Dune vibes. More functional than Grok because I could mine spice. However, the vibes were worse than the first two.
@deepseek_ai: Managed to generate code that worked, but the game was so hard it always ended seconds after it started, and despite requests for better visuals, it looked VERY ugly.
@GoogleDeepMind Gemini 2.0 Flash and @metaai LLaMA: Sadly landed at the bottom of the list. Even after multiple prompts (this was supposed to be one shot, and none of the others failed on the first shot), I couldn’t get them to produce working code for this prompt.
All of these were tested on @Rosebud_AI. An obvious limitation of these frontier models in their chat interfaces is that you can only get them to regenerate code from scratch each time you prompt them, making it tough to refine or extend a single project. Rosebud, on the other hand, lets you iterate on one project (we do diffs), deploy with one click, share your project, and even let others remix it.
This was just a single test, so it’s obviously not scientific. I ran it to see how these frontier models handle more complex game prompts, rather than retrying the same arcade games that earlier generations of LLMs have already mastered.
Grok's @xai Dune Minecraft. "The Desert Beckons" play.rosebud.ai/p/61385e4b-0ce…
Oct 31, 2022 • 4 tweets • 4 min read
🎃 Happy Halloween 🎃
Maybe these will come in handy today. We have put together the finest collection of spooky AI-generated images. They are 😱 pure nightmare fuel 😱 OddVibe.com #halloween #ai #stablediffusion
We've been working on an AI-generated stock photo site called PixelVibe, and in the process of improving the image quality we have seen our fair share of the darker side of Stable Diffusion. 🌚 So we created OddVibe.com