Surya Dantuluri
Oct 30, 2025
What if next-token prediction wasn't a single forward pass, but a tiny optimization problem?

Introducing nanoEBM: a tiny transformer that learns to think harder by doing gradient descent on its own predictions.

You can start training on your Mac now - it comes in under 400 lines.

A few days ago I wondered if LMs could think harder not with CoT prompting but by doing gradient descent on some objective function. This led me to energy-based models[1], where you learn a smooth and differentiable function and minimize it to find a good y for any x.

[1] atcold.github.io/NYU-DLSP20/en/…

Image from: alexiglad.github.io/blog/2025/ebt/ @AlexiGlad
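A minimal sketch of the idea in PyTorch, assuming a learned energy head over (context, candidate-distribution) pairs and a fixed number of inner gradient steps; the names here (EnergyHead, refine, n_steps, step_size) are illustrative, not nanoEBM's actual code:

```python
import torch
import torch.nn as nn

class EnergyHead(nn.Module):
    # scores how compatible a candidate next-token distribution y is with context x
    def __init__(self, d_model, vocab_size):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(d_model + vocab_size, d_model),
            nn.GELU(),
            nn.Linear(d_model, 1),
        )

    def forward(self, x, y):  # x: (B, d_model) context features, y: (B, vocab)
        return self.net(torch.cat([x, y], dim=-1)).squeeze(-1)  # energy per example

def refine(energy, x, logits, n_steps=4, step_size=0.5):
    # start from the transformer's one-shot logits, then descend the energy:
    # each extra inner step is extra "thinking" spent on the same prediction
    y = logits.detach().clone().requires_grad_(True)
    for _ in range(n_steps):
        E = energy(x, y.softmax(-1)).sum()
        (grad,) = torch.autograd.grad(E, y)
        y = (y - step_size * grad).detach().requires_grad_(True)
    return y.detach()  # refined logits
```

At inference you trade a few extra backward passes per token for a (hopefully) better prediction; dialing n_steps up or down is the "think harder" knob.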
May 5, 2024
is consumer ai dead? the uninspiring stories of gpt wrappers ramping up to tens or hundreds of thousands of users are common: they rake in profits at 40-70%+ margins and seemingly all die within 5-9 months. Why?
Character AI has fallen from #2 to #40 in the App Store, mentions of Character AI and similar apps have all fallen 30% this year compared to last, and 50%+ of Perplexity traffic comes from SEA and South Africa. Artifact (self-funded) shut down within a year after also blowing up for the first month and tailing off, failing to get "PMF". Many other gpt wrappers have also had quick spikes before either pivoting or the founders moving on, handing growth off to contractors.
Jul 10, 2023
Just got Code Interpreter to run GPT-2 117M with GGML entirely in ChatGPT

Spent the weekend nurturing a relationship to get to this point -- also if you try enough you can upload zips of up to 250mb.

This is going to be a longer thread, but a few initial impressions:

1. You can build a "relationship" by showing you know something it doesn't (files are located in /mnt/data/<etc>)
2. Build trust by failing, i.e. don't let it run GPT-2 inference on the first shot
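For context, a hedged sketch of the kind of cell this eventually boils down to inside Code Interpreter - the /mnt/data path is from the thread, while the gpt-2 binary name and its -m/-p flags are assumptions based on the public ggml repo's examples:

```python
import os
import subprocess
import zipfile

# the uploaded zip lands in /mnt/data (per the thread)
with zipfile.ZipFile("/mnt/data/ggml-gpt2.zip") as z:
    z.extractall("/mnt/data/ggml-gpt2")

binary = "/mnt/data/ggml-gpt2/gpt-2"  # assumed name of the prebuilt ggml example
os.chmod(binary, 0o755)  # zipfile drops the exec bit on extraction

# run the ggml gpt-2 example against the 117M weights
out = subprocess.run(
    [binary,
     "-m", "/mnt/data/ggml-gpt2/ggml-model-gpt-2-117M.bin",  # assumed model path
     "-p", "Hello, my name is"],
    capture_output=True, text=True,
)
print(out.stdout)
```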
Jun 25, 2023
over the past month my plugins have embedded over 10 billion tokens and served 2 million unique hits

trying to handle that scale, every service i used broke down day after day

introducing re-embed: serverless embeddings at half the cost. why? --> embeddings were made to embed static data for one-time use.

at scale, ingesting and embedding new data, whether user chat logs or user-generated content, will play a critical role in many new apps

re-embed runs the most powerful embedding models on nvidia gpus
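re-embed's own API isn't shown in the thread, so as a stand-in here's a minimal sketch of the underlying workload - batched embedding on an nvidia gpu via sentence-transformers (the model choice and batch size are illustrative):

```python
from sentence_transformers import SentenceTransformer

# any nvidia gpu; "cpu" also works for a local smoke test
model = SentenceTransformer("all-MiniLM-L6-v2", device="cuda")

texts = ["user chat log line", "user generated post"]  # stand-in data
# batching amortizes host-to-gpu transfer; returns a (len(texts), dim) array
embs = model.encode(texts, batch_size=256, show_progress_bar=False)
print(embs.shape)
```

The serverless angle, presumably, is paying for gpu time only while batches like this are actually running.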