There’s a very real possibility local dev may be dead in 10 years.
- @isamlambert “PlanetScale doesn't believe in localhost”
- @ericsimons40 StackBlitz runs Node.js fast in the browser
- @github runs entirely on Codespaces
This would be the biggest shift in dev workflow since git.
writing about “The Death of Localhost” in my next @DXTipsHQ piece, pls offer any relevant data/notable technologies. will acknowledge in writeup!
big thanks to everyone who contributed opinions and datapoints (shoutouts in the article)! I ended up compiling a nice big list of all the bigco cloud dev environments, and collecting all the arguments for/against
I've updated this post with great takes from this fantastic thread from Kelsey!
"in the long run, fully utilizing the cloud will make engineers more productive. Imagine taking a large test suite and running 1000 tests in parallel on a FaaS platform"
wonderful meetup event tonight. can feel the energy building and love @jolandgraf and @svenefftinge’s incredible passion for solving this problem. super encouraged that they really get how important it is to make dev environments *ephemeral*
was fun to work on this @latentspacepod post w/ @benhylak
inspired by the "pivot token" literature, one gpt->o1 mental model shift i've made is treating self-evaluation and self-correction as an ESSENTIAL part of planning/reasoning.
with o1, you move the LLM-as-judge *INTO THE PROMPT*, so you can let it handle the self-eval and replanning. this is the next incremental "agentic" step, which openai consistently does well, to the frustration of more hyperbolic but snake-oily alternatives.
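(a toy contrast of the two patterns, assuming the standard openai chat completions SDK; the loop structure and prompts are mine, not anything from the post:)

```python
# Toy contrast, not OpenAI's implementation: the judge moves from an outer
# loop in application code (gpt-4 era) into the model's own reasoning (o1 era).
# Assumes the openai>=1.0 Python SDK and OPENAI_API_KEY in the environment.
from openai import OpenAI

client = OpenAI()

def call_model(prompt: str, model: str) -> str:
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

# gpt-4-era pattern: LLM-as-judge lives OUTSIDE the model, in your code.
def solve_with_external_judge(task: str, max_rounds: int = 3) -> str:
    answer = call_model(f"Solve this task:\n{task}", model="gpt-4o")
    for _ in range(max_rounds):
        verdict = call_model(
            f"You are a strict judge. Task:\n{task}\nAnswer:\n{answer}\n"
            "Reply PASS if correct, otherwise explain what is wrong.",
            model="gpt-4o",
        )
        if verdict.strip().startswith("PASS"):
            break
        answer = call_model(
            f"Task:\n{task}\nPrevious answer:\n{answer}\n"
            f"Judge feedback:\n{verdict}\nProduce a corrected answer.",
            model="gpt-4o",
        )
    return answer

# o1-era pattern: self-eval and replanning happen inside one call;
# the reasoning model critiques and revises its own plan before answering.
def solve_with_internal_judge(task: str) -> str:
    return call_model(f"Solve this task:\n{task}", model="o1")
```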
this neurips is really going to be remembered as the "end of pretraining" neurips
notes from Dr. @polynoamial's talk on scaling test-time compute today
(thank you @oh_that_hat for organizing)
all gains to date have been from scaling data and pretrain compute, and yet LLMs can't solve simple problems like tic-tac-toe
however inference costs have scaled much less.
goes back to the Libratus/Pluribus work
poker model scaling from 2012-2015 - scaled 5x each year, but still lost dramatically (9 big bets per hundred) to poker pros in 80k hands
recalls familiar insight about humans taking longer to think for harder problems.
adding 20s of search reduced the distance from Nash equilibrium by a factor of 7 - roughly the equivalent of scaling up model size by 100,000x
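(a toy best-of-N sketch of the general idea - spend more compute at inference time, get a better answer - not the CFR-style search Libratus/Pluribus actually used; the target and verifier here are made up purely to show the scaling knob:)

```python
import random

# Toy best-of-N illustration of test-time compute scaling (NOT the CFR-style
# search used in Libratus/Pluribus). The "model" makes noisy guesses at a
# target; a verifier scores them; sampling more candidates at inference time
# improves the answer without touching the "model" at all.
TARGET = 42.0

def generate() -> float:
    return TARGET + random.gauss(0, 10)   # one noisy "model" sample

def score(candidate: float) -> float:
    return -abs(candidate - TARGET)       # verifier: closer is better

def best_of_n(n: int) -> float:
    return max((generate() for _ in range(n)), key=score)

if __name__ == "__main__":
    for n in (1, 10, 100, 1000):          # n = the test-time compute knob
        print(f"n={n:4d}  error={abs(best_of_n(n) - TARGET):.3f}")
```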
Here’s my @OpenAIDevs day thread for those following along. everyone else gotchu with videos and stuff so i will just give personal notes and aha moments thru the day
after some nice screenshots of CoCounsel, time for @romainhuet's legendary live demos. o1 one-shots an iOS app and does the frontend/backend to control a drone.
ai controlled drones, what could go wrong?
@romainhuet Realtime API announced!
starting with speech to speech support
all 6 adv voice mode voices supported
just realized NotebookLM is @GoogleDeepMind's ChatGPT moment
- "low key research preview"/"experimental"
- not monetized
- GPUs/TPUs immediately on fire
- SOTA proprietary new model buried in there, with upgrades that weren't previously announced
- new AI UX that cleverly embeds LLM usage natively within the product features
in this case NBLM nailed multimodal RAG and I/O in a way that @ChatGPTapp never did (or, for that matter, @GeminiApp). The multiple rounds of preprocessing described by @stevenbjohnson also raise the quality of the audio conversation dramatically, at the cost of extreme latency (they took an efficient model advertised as generating 30s of audio in 0.5s, and slapped on something like 200s of LLM latency haha)
@GoogleDeepMind like, i put my podcast into it and it made a podcast of my podcast and... it was good.
do u guys know we spend 1-2 hrs writing up the show notes and now it's a button press in NBLM
Gemini really took pride in topping @lmsysorg for a hot second, and then @OpenAI said "oh no u dont" and put out 4 straight bangers, pounding everyone into the dust by 50 Elo points
V high bar set for Gemini 2, Grok 2.5, and Claude 4 this fall.
They'll have to compete on multiple fronts - reasoning, multiturn chat tuning, instruction following, and coding.
anyway we finally did a @latentspacepod paper club on STaR and friends, swim on by
i hastily sketched out a "paper stack" of what the "literature of reasoning" could look like, but this is amateur work - would love @teortaxesTex or @arattml to map out a full list of likely relevant papers for o1
holy shit @ideogram_ai thumbnails are untapped alpha