More from @deedydas

Deedy

@deedydas

Apr 22

ChatGPT’s new image model can take in a picture of a house and “generate an entire floor plan”

It blows every single other image model out of the water.

I’ve been trying it all day and here are 10 unbelievable things it can do:

1/11

“Make this graph prettier”

Easily beautifies simple charts (esp. matplotlib charts) to make them professional-grade and presentable.

2/11

“Technical diagram for Photosynthesis.”

This is a complete game changer for education. The accuracy and depth are unparalleled and far better than Nano Banana 2.

3/11

Read 17 tweets

Deedy

@deedydas

Mar 22

Karpathy's Autoresearch pushed my vibecoded Rust chess engine AI from "expert" to a top 50 grandmaster, a #311 chess engine.

It ran over 70 experiments on its own and tried to hill climb to the top ELO score it could, landing at 2718!

This approach fundamentally uses a negamax alpha-beta tree search with pruning and iterative deepening. I tested everything with a 500ms per move limit. The main way to improve it would be to replace the static evaluation at the nodes with efficiently updatable neural networks (NNUEs). It also uses standard opening books and a transposition table to cache already-searched positions.

There's no offline computation or training element, so each run is like the last.
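To make the search terms above concrete, here is a minimal sketch of negamax with alpha-beta pruning, iterative deepening and a transposition table. It is not the engine's Rust code: it is written in Python for brevity and plays a toy take-away game (remove 1 to 3 stones, whoever takes the last stone wins) instead of chess. The names `negamax`, `best_move` and `legal_moves`, and the placeholder static evaluation of 0, are assumptions made for illustration.

```python
from typing import Dict, List, Optional, Tuple

EXACT, LOWER, UPPER = 0, 1, 2
# Transposition table: position -> (searched depth, value, bound type)
TTable = Dict[int, Tuple[int, int, int]]


def legal_moves(stones: int) -> List[int]:
    """Toy game: a move removes 1, 2 or 3 stones."""
    return [take for take in (1, 2, 3) if take <= stones]


def negamax(stones: int, depth: int, alpha: int, beta: int, table: TTable) -> int:
    """Value of the position for the side to move: +1 win, -1 loss, 0 unknown."""
    if stones == 0:
        return -1  # the opponent took the last stone, so the side to move lost
    if depth == 0:
        return 0  # placeholder static evaluation (a chess engine scores the board here)

    alpha_orig = alpha
    entry = table.get(stones)
    if entry is not None and entry[0] >= depth:
        _, value, flag = entry
        if flag == EXACT:
            return value
        if flag == LOWER:
            alpha = max(alpha, value)
        else:  # UPPER
            beta = min(beta, value)
        if alpha >= beta:
            return value

    best = -2  # lower than any reachable value
    for take in legal_moves(stones):
        score = -negamax(stones - take, depth - 1, -beta, -alpha, table)
        best = max(best, score)
        alpha = max(alpha, best)
        if alpha >= beta:
            break  # pruning: the opponent will never allow this line

    if best <= alpha_orig:
        flag = UPPER
    elif best >= beta:
        flag = LOWER
    else:
        flag = EXACT
    table[stones] = (depth, best, flag)
    return best


def best_move(stones: int, max_depth: int) -> Optional[int]:
    """Iterative deepening: search depth 1, 2, ... and keep the deepest answer."""
    table: TTable = {}
    choice: Optional[int] = None
    for depth in range(1, max_depth + 1):
        best_score = -2
        for take in legal_moves(stones):
            score = -negamax(stones - take, depth - 1, -1, 1, table)
            if score > best_score:
                best_score, choice = score, take
    return choice
```

A real engine would stop deepening when its time budget (500ms per move here) runs out rather than at a fixed `max_depth`, and would key the table on a hash of the board rather than a stone count.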

Lichess bot link: lichess.org/@/deedybot
Github repo: github.com/deedy/chess
Chess AI ranking (CCRL): computerchess.org.uk/ccrl/4040/
Bayesian ELO: remi-coulom.fr
Stash measurement: dannyhammer.github.io/engine-testing…

Thanks @navvye for offering to test it (he was #28 in India and rated 2300-2400 at some point) and @parimarjan, one of the goats, for being my chess hero growing up and offering to give it a go.

Read 4 tweets

Deedy

@deedydas

Dec 9, 2025

We just dropped a 12 page AI report on how ~500 execs at US enterprises use generative AI.

I read it all so you don't have to. Top 8 takeaways:

Anthropic is the #1 model provider in the enterprise, with 40% of ~$37B spend, with OpenAI dropping to #2.

1/8

On overall AI spend.

Generative AI has captured ~6% of software spend at $37B, growing ~3.2x YoY. Investments are coming to fruition and buyers are seeing results.

2/8

On where the spend goes.

Companies are using off-the-shelf models more than they're training their own. Horizontal AI tools like ChatGPT Enterprise, Claude for Work, Msft Copilot and Glean have exploded. "Departmental" AI like Cursor and Github Copilot also sees a huge boost.

3/8

Read 9 tweets

Deedy

@deedydas

Aug 31, 2025

This new DeepMind research shows just how broken vector search is.

Turns out some docs in your index are theoretically incapable of being retrieved by vector search, given a certain dimension count of the embedding.

Plain old BM25 from 1994 outperforms it on recall.

1/4
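For readers who have not seen BM25 written out, here is a minimal sketch of the standard Okapi BM25 scoring formula. It is not taken from the paper or from any production system: the whitespace tokenizer, the function name `bm25_scores`, and the defaults `k1=1.5` and `b=0.75` are assumptions chosen for illustration.

```python
import math
from collections import Counter
from typing import List


def bm25_scores(query: str, docs: List[str], k1: float = 1.5, b: float = 0.75) -> List[float]:
    """Score every document against the query with Okapi BM25."""
    tokenized = [doc.lower().split() for doc in docs]  # naive whitespace tokenizer
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))

    scores = []
    for tokens in tokenized:
        term_freq = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            tf = term_freq.get(term, 0)
            if tf == 0:
                continue
            df = doc_freq[term]
            # Rare terms get a higher weight; the +1 keeps the log positive.
            idf = math.log((n_docs - df + 0.5) / (df + 0.5) + 1.0)
            # Term-frequency saturation with document-length normalization.
            length_norm = 1 - b + b * len(tokens) / avg_len
            score += idf * tf * (k1 + 1) / (tf + k1 * length_norm)
        scores.append(score)
    return scores
```

Because the score is built from exact term matches, any document that contains a query term gets a non-zero score no matter how many documents are in the index. There is no fixed embedding dimension that bounds which combinations of documents can be returned.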

This result gives me a lot of joy as a search nerd for more than a decade.

Haters will say that the dataset the authors created, LIMIT, is synthetic and unrealistic, but this has been my observation building search systems at Google / Glean.

Source: alphaxiv.org/pdf/2508.21038

2/4

Vector Search was popularized as an approachable drop-in search since OpenAI embeddings grew in popularity, but has clear limitations in production settings.

Even aside from this result, showing it just misses certain docs constantly, it
– doesn't search for concepts well
– often retrieves similar but unrelated results
– doesn't account for non-content signals of similarity (recency, popularity)

3/4

Read 4 tweets

Deedy

@deedydas

Aug 10, 2025

I'm using GPT5 Pro to find me the best stocks and startup investments.

Asked it to use modern portfolio theory and size investments.
—Top Privates [+9.7%]: Databricks, Stripe, Anthropic, SpaceX
—Top Publics [+14.2%]: Nvidia, TSMC, Microsoft, Meta

Just put $1000 into the stocks!

Prompt: "Check all public / private stock market companies and tell me what I should invest in from first principles reasoning. You have $1000.

Please do deep research and present rationale for each investment. Each one should have a target price and expected value. Use advanced math for trading. Draw research from authoritative sources like research and unbiased pundits. Size my bets properly and use everything you know about portfolio theory. Corroborate each decision with a list of predictions about those companies.

Your goal is to maximize expected value. Make minimum 5 investments. Write it in a table."

https://twitter.com/deedydas/status/1944227802867470563

This follows my previous experiment on Polymarket, which seemingly had ~2-4x the expected returns!

https://twitter.com/deedydas/status/1944227802867470563

Read 4 tweets

Deedy

@deedydas

Aug 8, 2025

Ridiculous that OpenAI claimed 74.9% on SWE-Bench just to prove they were above Opus 4.1’s 74.5%…

By running it on 477 problems instead of the full 500.

Their system card only says 74% too.

Source: cdn.openai.com/pdf/8124a3ce-a… cdn.openai.com/pdf/8124a3ce-a…

And yes, I know they’ve always reported on the 477 denominator, but that’s NOT “SWE-Bench Verified”. It’s an entirely different metric, “OpenAI’s subset of SWE-Bench Verified”, and that number can’t be compared.
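A quick worked example of why the denominator matters. This sketch assumes the reported 74.9% is the solved count on 477 problems rounded to one decimal. How the 23 excluded problems would have gone is unknown, so it only computes the two extremes; the function name `full_set_bounds` is made up for illustration.

```python
from typing import Tuple


def full_set_bounds(reported_pct: float, subset_size: int, full_size: int) -> Tuple[float, float]:
    """Lowest and highest possible accuracy on the full set, given a subset score."""
    solved = round(reported_pct / 100 * subset_size)  # 74.9% of 477 is about 357
    skipped = full_size - subset_size                 # 23 problems not run
    worst = 100 * solved / full_size                  # every skipped problem fails
    best = 100 * (solved + skipped) / full_size       # every skipped problem passes
    return worst, best


low, high = full_set_bounds(74.9, 477, 500)
print(f"Full 500-problem score is somewhere between {low:.1f}% and {high:.1f}%")
```

Under those assumptions the score on all 500 problems lies anywhere from 71.4% to 76.0%, a range that straddles Opus 4.1’s 74.5%.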

Read 4 tweets
