Still working on a few essays about what I learned using LLMs for coding, but if you want a sneak peak, Complex Systems this week discusses the game I made in some detail.
I’m probably adding one essay to the series on LLMs for taxes.
It feels a bit weird to need to continue saying this, but yes, LLMs are obviously capable of doing material work in production, including in domains where answers are right or wrong, including where there is a penalty for being wrong. Of course they are.
“Why?”
Because a lot of discourse weights people and actors heavily where they cannot be right or wrong in any way that matters, and where correctness does not materially result in a different incentive for them.
And as a result you can expect to read “LLMs can’t do any real work, obviously, they are Markov chains without a world model” every day as they increasingly remodel / are used to remodel the economy.
I would be very confused about how people could possibly make and/or be convinced by claims which could be disproven in five minutes with a public website had I not had the experience of the last few years, during which that experience has not been rare.
Sneak peek. One of these days I’ll stop hallucinating. Until then, enjoy an entity capable of both context-aware spelling correction and also light humor.
• • •
In many domains a generalist who is good at AI and puts an hour or two into something will be three to four sigma from the mean entrant into a support / escalation / etc inbox.
Mitchell has an example from bug reports; I can easily imagine examples from e.g. financial issues.
I think *once* when doing advocacy work for people with banking/credit problems I ran into someone who had an organized call / letter log and so could cleanly generate a timeline that the financial institution could match up with their own files (and obligations).
Try it if you don't believe me, but if you give AI a bunch of unstructured input, like most people's impressionistic account of how frustrating it has been dealing with the bank, it will frequently redigest it into "Here's a timeline with bullet points."
Considering writing about non-coding LLM workflows a bit in December partially for personal interest and partially so people can see concrete examples of progress / usage.
The one easiest for me to talk about is just a geeky hobby: here's a plastic model and then here is ChatGPT producing a painting reference of ~that model, after a discussion on characterization, color scheme, etc.
I honestly love using it in my art projects. Hallucination rate is acceptable given ~wide acceptance criteria in art; like Bob Ross used to say, there are only happy accidents if e.g. its suggested recipe for mixing a teal paint does not actually result in teal immediately.
If I clipped every good Byrne Hobart or Matt Levine line I’d never get around to writing my own stuff but this from Byrne is too good to not share:
An extraordinary fact about finance is that there are some firms which are financial service providers specifically for scams which sometimes, almost as an industrial accident, bafflingly end up in a contractual relationship with a legitimate, successful company.
These underwriters are not necessarily that; some overlevered highly “structured” IPOs of midmarket software businesses should have a non-zero price, and a capitalist should not say they are a scam just because he is not a buyer at that price.
How much code would you write if you could one-shot 10-100 line shell scripts or similar almost all of the time, in 10 seconds? You would write a stupid amount of code. Who cares if it is disposable? Dispose of it; it's basically free.
Skill issue, code is free to you. Write a test suite too, designed to be thrown away in under a minute. Write three independent implementations and vote on the answer. etc, etc
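A minimal sketch of the "three implementations, vote on the answer" idea, in Python. The `vote` helper and the three toy summing functions are hypothetical illustrations, not anything from an actual workflow; it assumes the answers are hashable values that can be compared for equality.

```python
from collections import Counter

def vote(implementations, *args):
    """Run every implementation on the same input; return the strict-majority answer."""
    results = [impl(*args) for impl in implementations]
    answer, count = Counter(results).most_common(1)[0]
    if count <= len(results) // 2:
        raise ValueError(f"no majority among results: {results}")
    return answer

# Three hypothetical, independently written ways to sum 1..n.
def sum_loop(n):
    total = 0
    for i in range(1, n + 1):
        total += i
    return total

def sum_formula(n):
    return n * (n + 1) // 2

def sum_buggy(n):
    return sum(range(1, n))  # deliberate off-by-one: stops at n - 1

print(vote([sum_loop, sum_formula, sum_buggy], 10))
```

The buggy version gets outvoted by the other two; if all three disagree, the helper raises instead of guessing.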
"Have you actually done this?" Yeah, to a minor degree, and I'll recount a bit more when I do some writeups about my experience with LLM programming. After a few weeks of climbing the skill curve, instead of asking some direct questions I'd say "Goal: *direct question* You should..."
Me to financial firm: *address change form*
Financial firm: Is this five digit number a post code?
Me to financial firm: Oh you have asked exactly the right person for geeking out about post codes. Did you know...
Second thoughts: That was not the efficient way to answer.
"Why didn't they know what a post code looks like?"
Because a post code can look like so many things, like 100-0001, 20500, or SW1A 1AA, to use three codes from three nations that all correspond to a particular famous building/complex within them.
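To make that concrete, here is a rough Python sketch that checks a string against three deliberately simplified patterns. The `PATTERNS` table and `matching_countries` function are hypothetical names for illustration; real postal validation has many more edge cases than these regexes cover.

```python
import re

# Simplified shapes only; these are assumptions, not authoritative postal rules.
PATTERNS = {
    "JP": re.compile(r"\d{3}-\d{4}"),                      # e.g. 100-0001
    "US": re.compile(r"\d{5}(-\d{4})?"),                   # e.g. 20500 or ZIP+4
    "GB": re.compile(r"[A-Z]{1,2}\d[A-Z\d]? \d[A-Z]{2}"),  # e.g. SW1A 1AA
}

def matching_countries(code):
    """Return the country keys whose simplified pattern matches the whole string."""
    normalized = code.strip().upper()
    return [country for country, pattern in PATTERNS.items()
            if pattern.fullmatch(normalized)]

for example in ["100-0001", "20500", "SW1A 1AA"]:
    print(example, matching_countries(example))
```

Even this toy table shows the problem: a form has to know which nation it is in before it can say whether a string "looks like" a post code, and a bare five digit number fits several nations' conventions beyond the three listed here.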
A further fun fact: some nations don't customarily use post codes and others don't customarily use addresses, favoring a natural language description of the recipient which is sufficient to get a mail carrier to successfully route to them.
So October 15th, the extended US tax deadline, is just around the corner, and I have some observations which are more about LLM progress than taxes.
Background: many people professionally involved with LLMs estimate 2026-2028 as the window in which one can get an LLM to "do taxes."
I have a fairly complicated situation and have put more of my points into tax procedure than many AI researchers, and I did not previously expect to actually have this capability available in 2028.
On the basis of experience with review, but not full execution, rethinking that.
I think the most likely form factor for actually deploying this in the real world is a software company which integrates LLMs as a component but also has a lot of special sauce.
Be that as it may, what I actually had available yesterday was the standard chat interface.