Karpathy's Autoresearch pushed my vibecoded Rust chess engine AI from "expert" to a top 50 grandmaster, a #311 chess engine.
It ran over 70 experiments on its own and tried to hill climb to the top ELO score it could, landing at 2718!
The engine is fundamentally a negamax alpha-beta tree search with pruning and iterative deepening. I tested everything with a 500ms-per-move limit. The main way to improve it would be to replace the static evaluation at the nodes with efficiently updatable neural networks (NNUEs). It also uses standard opening books and a transposition table to cache results for positions it has already searched.
There's no offline computation or training element, so each run is like the last.
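For anyone who wants the shape of that search, here's a minimal sketch of negamax with alpha-beta pruning, a transposition table, and iterative deepening under a time budget. It is not the engine's code: it plays a toy take-1-to-3-stones game instead of chess, and every name in it (`legal_moves`, `evaluate`, `best_move`, the `WIN` constant) is made up for illustration.

```python
import time

WIN = 1000                      # score for a proven win (hypothetical constant)
EXACT, LOWER, UPPER = 0, 1, 2   # kind of bound stored in the transposition table

def legal_moves(stones):
    # Toy game: take 1, 2 or 3 stones; whoever takes the last stone wins.
    return [take for take in (1, 2, 3) if take <= stones]

def evaluate(stones):
    # Static evaluation at the depth cutoff. The toy version knows nothing,
    # so it returns a neutral score; a chess engine would score the position.
    return 0

def negamax(stones, depth, alpha, beta, table):
    if stones == 0:
        return -WIN             # opponent took the last stone: side to move lost
    if depth == 0:
        return evaluate(stones)
    alpha_orig = alpha
    entry = table.get(stones)
    if entry is not None and entry[0] >= depth:
        _, flag, value = entry
        if flag == EXACT:
            return value
        if flag == LOWER:
            alpha = max(alpha, value)
        else:
            beta = min(beta, value)
        if alpha >= beta:
            return value
    best = -float("inf")
    for take in legal_moves(stones):
        score = -negamax(stones - take, depth - 1, -beta, -alpha, table)
        best = max(best, score)
        alpha = max(alpha, best)
        if alpha >= beta:       # opponent already has a better option: prune
            break
    if best <= alpha_orig:
        flag = UPPER
    elif best >= beta:
        flag = LOWER
    else:
        flag = EXACT
    table[stones] = (depth, flag, best)
    return best

def best_move(stones, time_limit_s=0.5, max_depth=64):
    # Iterative deepening: search depth 1, 2, 3, ... and keep the last answer.
    # The clock is only checked between depths here; a real engine would also
    # check it inside the search.
    deadline = time.monotonic() + time_limit_s
    table = {}
    choice = legal_moves(stones)[0]
    for depth in range(1, max_depth + 1):
        best_score, best_take = -float("inf"), None
        alpha, beta = -float("inf"), float("inf")
        for take in legal_moves(stones):
            score = -negamax(stones - take, depth - 1, -beta, -alpha, table)
            if score > best_score:
                best_score, best_take = score, take
            alpha = max(alpha, score)
        choice = best_take
        if abs(best_score) == WIN or time.monotonic() >= deadline:
            break
    return choice

# In this toy game a multiple of 4 is lost for the side to move,
# so from 10 stones the winning move is to take 2.
print(best_move(10))
```

The table stores a bound type with each score because a pruned search only proves a lower or upper bound, not an exact value.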
Thanks @navvye for offering to test it (he was #28 in India and rated 2300-2400 at some point) and @parimarjan, one of the goats, for being my chess hero growing up and offering to give it a go.
We just dropped a 12 page AI report on how ~500 execs at US enterprises use generative AI.
I read it all so you don't have to. Top 8 takeaways:
Anthropic is the #1 model provider in the enterprise, with 40% of ~$37B spend, with OpenAI dropping to #2.
1/8
On overall AI spend.
Generative AI has captured ~6% of software spend at $37B, growing ~3.2x YoY. Investments are coming to fruition and buyers are seeing results.
2/8
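A quick back-of-envelope on those numbers, taking the three figures in the post at face value (all approximate). The derived values are just implied by the arithmetic; the report doesn't state them here.

```python
genai_spend_b = 37.0       # ~$37B generative AI spend, from the post
share_of_software = 0.06   # ~6% of software spend, from the post
yoy_multiple = 3.2         # ~3.2x year-over-year growth, from the post

# Implied by the figures above, not quoted from the report.
implied_prior_year_b = genai_spend_b / yoy_multiple
implied_software_total_b = genai_spend_b / share_of_software

print(f"implied prior-year spend: ~${implied_prior_year_b:.1f}B")
print(f"implied total software spend: ~${implied_software_total_b:.0f}B")
```

So the figures imply roughly $11.6B of spend the year before and a total software market a bit over $600B.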
On where the spend goes.
Companies are using off-the-shelf models more than they're training their own. Horizontal AI tools like ChatGPT Enterprise, Claude for Work, Microsoft Copilot and Glean have exploded. "Departmental" AI like Cursor and GitHub Copilot also sees a huge boost.
This new DeepMind research shows just how broken vector search is.
Turns out some docs in your index are theoretically incapable of being retrieved by vector search, given a certain dimension count of the embedding.
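To get a feel for why dimension matters, here's a toy illustration. It is my own construction, not the paper's: it assumes dot-product scoring and 1-dimensional embeddings, and the names (`top_k`, `docs_1d`) are made up. With one dimension, a query can only rank docs in ascending or descending order, so most top-2 combinations can never come back together.

```python
from itertools import combinations

def top_k(query, docs, k):
    # Rank docs by dot product with the query and return the top-k indices.
    scores = [sum(q * x for q, x in zip(query, doc)) for doc in docs]
    ranked = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
    return frozenset(ranked[:k])

# Four docs embedded in a single dimension (hypothetical values).
docs_1d = [(-2.0,), (-1.0,), (1.0,), (2.0,)]

# Sweep a grid of nonzero 1-D queries and record every top-2 set seen.
queries = [(q / 10,) for q in range(-30, 31) if q != 0]
reachable = {top_k(q, docs_1d, k=2) for q in queries}

all_pairs = {frozenset(p) for p in combinations(range(len(docs_1d)), 2)}
unreachable = all_pairs - reachable

print(f"{len(reachable)} of {len(all_pairs)} top-2 pairs are retrievable")
```

Only the two largest or the two smallest docs ever show up together. Adding dimensions enlarges the set of reachable combinations, but the same kind of ceiling applies.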
Plain old BM25 from 1994 outperforms it on recall.
1/4
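For reference, BM25 fits in a few lines. This is a minimal sketch, assuming whitespace tokenization, the common defaults `k1=1.5` and `b=0.75`, and the Lucene-style IDF with `+1` inside the log. The example docs and query are made up, not taken from LIMIT.

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.5, b=0.75):
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(d) for d in tokenized) / n_docs
    # Document frequency: number of docs containing each term.
    df = Counter(term for d in tokenized for term in set(d))
    scores = []
    for d in tokenized:
        tf = Counter(d)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term frequency saturates via k1; b normalizes for doc length.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores

docs = [
    "jon likes quokkas and apples",
    "leslie likes apples",
    "ovid likes candy",
]
scores = bm25_scores("quokkas", docs)
print(scores)
```

Because scoring is exact term matching, a doc containing a rare query term always scores above docs that lack it, whatever else is in the index.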
This result gives me a lot of joy as a search nerd for more than a decade.
Haters will say that the dataset the authors created, LIMIT, is synthetic and unrealistic, but it matches what I've observed building search systems at Google / Glean.
Vector search was popularized as an approachable drop-in search as OpenAI embeddings grew in popularity, but it has clear limitations in production settings.
Even aside from this result, which shows it constantly misses certain docs, it
– doesn't search for concepts well
– often retrieves similar but unrelated results
– doesn't account for non-content signals of similarity (recency, popularity)
3/4
I'm using GPT-5 Pro to find me the best stocks and startup investments.
Asked it to use modern portfolio theory and size investments.
—Top Privates [+9.7%]: Databricks, Stripe, Anthropic, SpaceX
—Top Publics [+14.2%]: Nvidia, TSMC, Microsoft, Meta
Just put $1000 into the stocks!
Prompt: "Check all public / private stock market companies and tell me what I should invest in from first principles reasoning. You have $1000.
Please do deep research and present rationale for each investment. Each one should have a target price and expected value. Use advanced math for trading. Draw research from authoritative sources like research and unbiased pundits. Size my bets properly and use everything you know about portfolio theory. Corroborate each decision with a list of predictions about those companies.
Your goal is to maximize expected value. Make minimum 5 investments. Write it in a table."
This follows my previous experiment on Polymarket, which seemingly had ~2-4x the expected returns!
And yes, I know they’ve always reported on the 477 denominator, but that’s NOT “SWE-Bench Verified”. It’s an entirely different metric, “OpenAI’s subset of SWE-Bench Verified”, and that number can’t be compared.
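To make the denominator point concrete, here's a small sketch of what a pass rate on the 477-task subset can and can't tell you about the full 500-task SWE-bench Verified set. The function name and the 70% input are hypothetical. The only assumption is that nothing is known about the 23 excluded tasks.

```python
def full_set_bounds(pass_rate_subset, subset_size=477, full_size=500):
    # Tasks solved on the subset, as implied by the reported pass rate.
    solved = pass_rate_subset * subset_size
    missing = full_size - subset_size
    low = solved / full_size                # none of the excluded tasks solved
    high = (solved + missing) / full_size   # all of the excluded tasks solved
    return low, high

low, high = full_set_bounds(0.70)
print(f"70% on 477 tasks -> between {low:.1%} and {high:.1%} on 500 tasks")
```

A reported 70% on the subset is compatible with anything from about 66.8% to 71.4% on the full set, which is why the two numbers don't belong in the same column.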