Still working on a few essays about what I learned on using LLMs for coding but if you want a sneak peak, Complex Systems this week discusses the game I made in some detail.
I’m probably adding one essay to the series on LLMs for taxes.
It feels a bit weird to need to continue saying this, but yes, LLMs are obviously capable of doing material work in production, including in domains where answers are right or wrong, including where there is a penalty for being wrong. Of course they are.
“Why?”
Because a lot of the discourse heavily weights people and actors who cannot be right or wrong in any way that matters, and for whom correctness does not materially change their incentives.
And as a result you can expect to read “LLMs can’t do any real work, obviously, they are Markov chains without a world model” every day as they increasingly remodel / are used to remodel the economy.
I would be very confused about how people could possibly make and/or be convinced by claims which could be disproven in five minutes with a public website, had the last few years not made that experience a common one.
Sneak peek. One of these days I’ll stop hallucinating. Until then, enjoy an entity capable of both context-aware spelling correction and also light humor.
This week on Complex Systems I'm joined by... Claude Code?
I think people who don't program professionally substantially underrate the discontinuous advance in productivity that engineering is going through. So we step through real eng work, basically verbatim, with me commenting.
The specific business problem presented is a real one which a real business (mine) actually lost money over: transient payment failures in collecting annual memberships for Bits about Money. Analogous problems bite almost every Fortune 500 company, to the tune of billions.
They largely go unsolved because the problems are illegible to the parts of orgs which are not payment experts. For the parts of orgs which are, like Business Operations or Payments teams, the problem is not salient enough to draw the executive attention required to get engineering hours.
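For a flavor of the fix (my sketch, not the episode's code): the usual shape is a dunning loop that retries only plausibly transient failures on a schedule and gives up immediately on hard declines. Everything below, including the decline-code vocabulary and the `charge_membership` stub, is hypothetical rather than any particular processor's API:

```python
import time
from dataclasses import dataclass

# Hypothetical stand-ins for a real payment processor's API.
@dataclass
class ChargeResult:
    succeeded: bool
    decline_code: str | None = None

def charge_membership(customer_id: str, amount_cents: int) -> ChargeResult:
    # Placeholder: in reality this calls your payment processor.
    return ChargeResult(succeeded=False, decline_code="processor_unavailable")

# Which decline codes are worth retrying varies by processor; these
# strings are illustrative, not any provider's actual vocabulary.
PERMANENT_DECLINES = {"stolen_card", "card_expired"}

def collect_membership(customer_id: str, amount_cents: int,
                       schedule=(0, 3, 7)) -> bool:
    """Retry an annual-membership charge on a dunning schedule,
    bailing out immediately on hard declines."""
    for wait in schedule:
        time.sleep(wait)  # in production: a scheduled job spaced over days
        result = charge_membership(customer_id, amount_cents)
        if result.succeeded:
            return True
        if result.decline_code in PERMANENT_DECLINES:
            break  # retrying a stolen or expired card helps no one
    return False  # fall through to emailing the customer to update their card
```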
“I spoke with 21 billionaires” is historically the sort of flex you could only imagine at the top of tier 1 media, and ironically I think they’re probably the least capable of it today, after a few years of burning karma wantonly.
Many of the emails will say “I just want to hear your side of the story” and many of them will even actually mean that and come from reporters who respect sources and promises they’ve made to them.
But other emails said the same words and then did not follow through.
One of the reasons Solana can do this is he has a persistent reputation in the ecosystem and everyone knows it. This historically was true for some institutions, but during a rough period for them they developed principal/agent problems.
Odd Lots has a really fantastic episode on why Claude Code matters, and while it is likely not directly useful for you if you follow me, it is the single best artifact I’ve seen for that smart person you want to quickly educate about this.
* How giving LLMs the capability to write Unix commands gives them deterministic access to ~60 years of powerful, composable software capabilities (see the sketch after this list)
* LLMs are quickly becoming the “interpretation layer” and a lot of work is that, at varying levels of abstraction
* States a really important takeaway that most of the world has not internalized: this fundamentally transforms a field/craft in a way which predictive autocomplete was never going to.
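To make the first bullet concrete, here is a minimal sketch (mine, not the episode's) of the core move: expose a single "run a shell command" tool to the model, and it inherits grep, sed, git, and everything else on the box. The tool-schema shape follows the common JSON-schema convention; the names are illustrative:

```python
import subprocess

def run_shell(command: str, timeout: int = 30) -> str:
    """Run one Unix command and return its output for the model to read.
    This single tool is the bridge to ~60 years of composable software."""
    proc = subprocess.run(
        command, shell=True, capture_output=True, text=True, timeout=timeout
    )
    return proc.stdout + proc.stderr

# An agent loop hands the model a schema like this, then feeds
# run_shell's output back as the tool result on each turn.
TOOL_SPEC = {
    "name": "run_shell",
    "description": "Execute a Unix shell command and return its output.",
    "input_schema": {
        "type": "object",
        "properties": {"command": {"type": "string"}},
        "required": ["command"],
    },
}
```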
In many domains, a generalist who is good at AI and puts an hour or two into something will be three to four sigma above the mean entrant into a support / escalation / etc. inbox.
Mitchell has an example from bug reports; I can easily imagine examples from e.g. financial issues.
I think *once* when doing advocacy work for people with banking/credit problems I ran into someone who had an organized call / letter log and so could cleanly generate a timeline that the financial institution could match up with their own files (and obligations).
Try it if you don't believe me: if you give an AI a bunch of unstructured input, like most people's impressionistic account of how frustrating dealing with the bank has been, it will frequently redigest it into "Here's a timeline with bullet points."
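If you want to try it, the whole technique fits in a few lines; here's a sketch using Anthropic's Python SDK (the model ID and prompt wording are my assumptions, not a recipe anyone blessed):

```python
from anthropic import Anthropic

client = Anthropic()  # reads ANTHROPIC_API_KEY from the environment

complaint = """
Ugh, the bank AGAIN. I called in March and they said it was fixed, then
the fee showed up in April, I mailed a letter, no response, called May 2nd...
"""

message = client.messages.create(
    model="claude-sonnet-4-20250514",  # assumed model ID; any capable model works
    max_tokens=1024,
    messages=[{
        "role": "user",
        "content": "Turn this account of a banking dispute into a dated, "
                   "bulleted timeline a financial institution could match "
                   "against its own records:\n\n" + complaint,
    }],
)
print(message.content[0].text)
```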
Considering writing about non-coding LLM workflows a bit in December partially for personal interest and partially so people can see concrete examples of progress / usage.
The one easiest for me to talk about is just a geeky hobby: here's a plastic model and then here is ChatGPT producing a painting reference of ~that model, after a discussion on characterization, color scheme, etc.
I honestly love using it in my art projects. Hallucination rate is acceptable given ~wide acceptance criteria in art; like Bob Ross used to say, there are only happy accidents, even when e.g. its suggested recipe for mixing a teal paint does not actually result in teal immediately.
If I clipped every good Byrne Hobart or Matt Levine line I’d never get around to writing my own stuff, but this from Byrne is too good not to share:
An extraordinary fact about finance is that there are some firms which are financial service providers specifically for scams which sometimes, almost as an industrial accident, bafflingly end up in a contractual relationship with a legitimate, successful company.
These underwriters are not necessarily that; some overlevered highly “structured” IPOs of midmarket software businesses should have a non-zero price, and a capitalist should not say they are a scam just because he is not a buyer at that price.