Still working on a few essays about what I learned using LLMs for coding, but if you want a sneak peak, Complex Systems this week discusses the game I made in some detail.
I’m probably adding one essay to the series on LLMs for taxes.
It feels a bit weird to need to continue saying this, but yes, LLMs are obviously capable of doing material work in production, including in domains where answers are right or wrong, including where there is a penalty for being wrong. Of course they are.
“Why?”
Because a lot of the discourse heavily weights people and actors who cannot be right or wrong in any way that matters, and for whom correctness does not materially change their incentives.
And as a result you can expect to read “LLMs can’t do any real work, obviously, they are Markov chains without a world model” every day as they increasingly remodel / are used to remodel the economy.
I would be very confused about how people could possibly make, and/or be convinced by, claims which could be disproven in five minutes with a public website, had the last few years not shown me how common that experience is.
Sneak peek. One of these days I’ll stop hallucinating. Until then, enjoy an entity capable of both context-aware spelling correction and also light humor.
• • •
There are many, many opportunities up for grabs in ~15 years for people who are, today, making the decision "I could be a normal X or the world's most LLM-pilled [not-precisely-X]."
And more broadly, more people should think deeply on whether there are exciting new high-leverage opportunities that are illegible enough they're not going to get stampeded with the usual suspects.
("But everyone knows about AI." Everyone knew about the Internet; few made the trade.)
"What's something concrete you do uniquely because LLMs exist?"
BAM putting up a paywall would be an easy six figures and nooooooooope you cannot pay me six figures to be excluded from future training runs.
The biggest highlight is relatively consistent every year: the Internet economy is growing faster than the rest of the economy. This has compounded for enough years that it is essentially _the_ growth engine in places. stripe.com/annual-updates…
AI is hot right now (have you heard?) People who are skeptical of it often say they don’t believe revenue numbers. Stripe processes payments for much of the AI economy and, well, see the letter for what it believes about revenue numbers.
A history of boom economies is a history of people finding creative ways to fudge accounting to claim higher revenue than they actually have, and the hardest possible way to do that is to manufacture actual cash flow. Stripe sees the cash.
This week on Complex Systems I'm joined by... Claude Code?
I think people who don't program professionally substantially underrate the discontinuous advance in productivity that engineering is going through. So we step through real eng work, basically verbatim, with me commenting.
The specific business problem presented is a real one that a real business (mine) actually lost money over: transient payment failures in collecting annual memberships for Bits about Money. Analogous problems bite almost every Fortune 500 company, to the tune of billions.
They largely go unsolved because the problems are illegible to the parts of orgs which are not payment experts. For the parts of orgs which are, like Business Operations or Payments teams, this is not salient enough to draw executive attention to get engineering hours.
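For readers wondering what "transient" means in practice: one common mitigation is a dunning/retry loop that retries failures a gateway marks as retryable and gives up immediately on hard declines. A minimal sketch, assuming a hypothetical gateway interface (`charge_fn` and the decline-code names are my own illustration, not any particular processor's API):

```python
# Hypothetical decline taxonomy: real gateways expose codes that let you
# tell retryable failures apart from hard declines. These names are
# illustrative, not taken from any particular gateway.
TRANSIENT = {"processor_unavailable", "network_timeout"}
HARD = {"card_declined", "expired_card"}

def collect_with_retries(charge_fn, membership_id, max_attempts=4):
    """Retry transient failures; stop immediately on hard declines.

    charge_fn(membership_id) returns "ok" or a decline code -- an assumed
    interface standing in for a real gateway call.
    """
    for attempt in range(max_attempts):
        result = charge_fn(membership_id)
        if result == "ok":
            return True
        if result in HARD:
            return False  # never hammer a card that cannot succeed
        if result in TRANSIENT:
            # A real dunning system would schedule the next attempt hours
            # or days out via a job queue; this sketch just loops.
            continue
        return False  # unknown code: treat as non-retryable
    return False
```

The load-bearing decision is simply distinguishing "try again later" from "stop"; businesses that treat every failure identically leave revenue on the table in exactly the way described above.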
“I spoke with 21 billionaires” is historically the sort of flex you could only imagine in the top of tier 1 media, and ironically I think they’re probably least capable of it today, after a few years of burning karma wantonly.
Many of the emails will say “I just want to hear your side of the story” and many of them will even actually mean that and come from reporters who respect sources and promises they’ve made to them.
But other emails said the same words and then did not follow through.
One of the reasons Solana can do this is he has a persistent reputation in the ecosystem and everyone knows it. This historically was true for some institutions, but during a rough period for them they developed principal/agent problems.
Odd Lots has a really fantastic episode on why Claude Code matters, and while it is likely not directly useful for you if you follow me, it is the single best artifact I’ve seen for that smart person you want to quickly educate about this.
* How giving LLMs the capability to write Unix commands gives them deterministic access to ~60 years of powerful, composable software
* LLMs are quickly becoming the “interpretation layer” and a lot of work is that, at varying levels of abstraction
* A really important takeaway that most of the world has not internalized: this fundamentally transforms a field/craft in a way that predictive autocomplete was not going to.
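The shell-access pattern in the first bullet can be sketched in a few lines: the model emits a command, the harness runs it, and the output goes back into context. This is a toy illustration under assumed names (`model_fn` stands in for a real LLM call; no vendor's API is implied), not how any particular agent is implemented:

```python
import subprocess

def run_tool_loop(model_fn, task, max_steps=5):
    """Minimal sketch of an LLM shell-tool loop.

    model_fn(context) is an assumed stand-in for an LLM call; it returns
    either a shell command to run or "DONE: <answer>".
    """
    context = task
    for _ in range(max_steps):
        action = model_fn(context)
        if action.startswith("DONE:"):
            return action[len("DONE:"):].strip()
        # Real agents sandbox this step; executing model-chosen commands
        # verbatim is unsafe outside a demo.
        out = subprocess.run(action, shell=True, capture_output=True, text=True)
        context += f"\n$ {action}\n{out.stdout}"
    return None
```

The interesting part is not the loop; it is that every composable Unix tool ever written becomes a capability the model can chain deterministically.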
In many domains a generalist who is good at AI and puts an hour or two into something will be three to four sigma from the mean entrant into a support / escalation / etc inbox.
Mitchell has an example from bug reports; I can easily imagine examples from e.g. financial issues.
I think *once* when doing advocacy work for people with banking/credit problems I ran into someone who had an organized call / letter log and so could cleanly generate a timeline that the financial institution could match up with their own files (and obligations).
Try it if you don't believe me: give an AI a bunch of unstructured input, like most people's impressionistic account of how frustrating dealing with the bank has been, and it will frequently redigest it into "Here's a timeline with bullet points."
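For anyone who wants to try this, here is the kind of instruction that tends to produce that redigestion. The helper name and the exact wording are my own illustration, not a tested recipe; you would paste the resulting prompt into any capable LLM:

```python
def timeline_prompt(account: str) -> str:
    """Wrap a rambling account in instructions asking for a dated timeline.

    Illustrative wording only -- not a vetted "best" prompt.
    """
    return (
        "Below is an unstructured account of a dispute with a financial "
        "institution. Restructure it into a dated timeline with bullet "
        "points, one event per line, preserving every date, amount, and "
        "name mentioned. Flag anything you had to infer.\n\n---\n"
        + account
    )
```

The value is exactly what the anecdote above describes: a clean timeline is the artifact a financial institution can actually match against its own files.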