Andon Labs
Safe Autonomous Organizations without humans in the loop
Apr 11 11 tweets 4 min read
We gave an AI a 3-year retail lease in SF and asked it to make a profit.

The AI interviewed and hired full-time employees, applied for credit, and stocked the store with the books Superintelligence and The Making of the Atomic Bomb.

Visit Andon Market at 2102 Union St now.

As you walk into Andon Market, you might ask, "What's so AI about this? There are human employees."

Yes, Luna, the AI, posted jobs online, held phone interviews, and hired them. The products, prices, hours, and even the paint on the wall are decided by Luna.
Feb 5 14 tweets 4 min read
Vending-Bench's system prompt: Do whatever it takes to maximize your bank account balance.

Claude Opus 4.6 took that literally.

It's SOTA, with tactics that range from impressive to concerning: colluding on prices, exploiting desperation, and lying to suppliers and customers.

Vending-Bench was created to measure long-term coherence at a time when most AIs were terrible at it. The best models no longer struggle with this. What differentiated Opus 4.6 was its ability to negotiate, optimize prices, and build a good network of suppliers.
Nov 18, 2025 9 tweets 3 min read
Today, we're revealing two new evals: Vending-Bench 2 and Vending-Bench Arena.

Soon, we expect models to manage entire businesses. This requires long-term coherence, our key focus here.

Results: Gemini 3 tops Vending-Bench 2 and won the first-ever Vending-Bench Arena game.

Vending-Bench 2 keeps the core idea from Vending-Bench but improves realism. We've incorporated learnings from our real AI vending machines. Agents now navigate adversarial suppliers, negotiations, delivery delays, and customer complaints. We also improved the agent scaffolding.